Ensemble algorithms combine multiple classifiers and can thereby effectively improve the prediction accuracy of classification. They have also been widely applied in coal mining and forecasting. Weight-based ensemble algorithms improve on this further by assigning a weight to each classifier. However, because an ensemble algorithm must build a model for every one of its classifiers, traditional ensemble algorithms can no longer complete ensemble learning quickly and effectively as the data size grows. For large-scale data in the coal domain, this paper proposes a distributed weighted ensemble algorithm based on the MapReduce distributed framework. The algorithm carries out both the ensemble learning and the prediction work in a distributed manner. Extensive experimental results further show that the proposed algorithm is highly efficient and scales well.
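The abstract does not give the details of the proposed algorithm, so the following is only a minimal illustrative sketch of the general idea: a weighted vote over several classifiers, written as a map step and a reduce step in plain Python. All names (`map_votes`, `reduce_votes`, `weighted_ensemble_predict`), the toy threshold classifiers, and the weights are assumptions made for illustration. They are not taken from the paper, and no real MapReduce runtime is used.

```python
from collections import defaultdict

def map_votes(partition, classifiers, weights):
    """Map step (hypothetical): for every record in one data partition,
    emit (record_id, (predicted_label, classifier_weight)) per classifier."""
    for record_id, features in partition:
        for clf, w in zip(classifiers, weights):
            yield record_id, (clf(features), w)

def reduce_votes(pairs):
    """Reduce step (hypothetical): sum the weights per label for each record
    and keep the label with the largest total weight."""
    totals = defaultdict(lambda: defaultdict(float))
    for record_id, (label, w) in pairs:
        totals[record_id][label] += w
    return {rid: max(scores, key=scores.get) for rid, scores in totals.items()}

def weighted_ensemble_predict(partitions, classifiers, weights):
    """Run the map step over every partition, then reduce all emitted pairs.
    In a real framework the partitions would be processed on separate nodes."""
    pairs = (pair
             for part in partitions
             for pair in map_votes(part, classifiers, weights))
    return reduce_votes(pairs)

# Toy classifiers and weights, assumed purely for illustration.
classifiers = [
    lambda x: "high" if x[0] > 0.5 else "low",
    lambda x: "high" if x[1] > 0.5 else "low",
    lambda x: "high" if x[0] + x[1] > 1.0 else "low",
]
weights = [0.6, 0.25, 0.15]

# Two partitions standing in for data blocks stored on different nodes.
partitions = [
    [("a", (0.9, 0.05)), ("b", (0.2, 0.9))],
    [("c", (0.8, 0.7))],
]

predictions = weighted_ensemble_predict(partitions, classifiers, weights)
```

For record `a` in this toy data, two of the three classifiers vote "low", but the single "high" vote carries weight 0.6 against a combined 0.4, so the weighted ensemble predicts "high". This is the difference between a weighted vote and a simple majority vote.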