As power systems grow in scale and measurement technology becomes both more capable and cheaper, the volume of power system data is increasing rapidly and is taking on the characteristics of big data. Making full use of big data to improve power system planning, operation, and control is receiving increasing attention, and how to evaluate the quality of that data is an important question worth studying. Techniques for improving data quality, such as data cleaning, data integration, and similar-record detection, have been reported extensively, whereas research on data quality assessment is still in its infancy. Against this background, and taking into account the characteristics of power systems and the quality attributes of power big data, a comprehensive quality evaluation method for power big data is proposed. First, an index system for evaluating power big data quality is constructed. Next, to address the timeliness of big data processing, a K-means clustering algorithm parallelized with MapReduce is used to preprocess the big data sample set quickly. The entropy weight method is then used to compute an objective weight for each class of data set, and the grey evaluation method is used to determine the grade to which the data quality belongs; on this basis, a comprehensive evaluation of the sample data set is obtained. Finally, the proposed method is illustrated using customer load data collected by a municipal power company.
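As an illustration of the objective weighting step, the following is a minimal sketch of the standard entropy weight method, not the paper's exact formulation. It assumes a hypothetical score matrix whose rows are the items being compared (for example, the data-set classes produced by clustering) and whose columns are non-negative quality indicators; the function name `entropy_weights` and the sample values are assumptions made for illustration only.

```python
import math


def entropy_weights(matrix):
    """Return one entropy-based weight per column of a non-negative matrix.

    Rows are the items being compared, columns are indicators.
    Columns whose values vary more across rows get larger weights.
    """
    m = len(matrix)
    n = len(matrix[0])
    if m < 2:
        raise ValueError("need at least two rows to compute entropy")
    k = 1.0 / math.log(m)  # scales each column entropy into [0, 1]

    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total <= 0:
            raise ValueError("each column must have a positive sum")
        entropy = 0.0
        for value in column:
            p = value / total  # share of this row in the column
            if p > 0:  # treat 0 * ln(0) as 0
                entropy -= k * p * math.log(p)
        # Clamp to guard against tiny negative values from rounding.
        divergences.append(max(0.0, 1.0 - entropy))

    spread = sum(divergences)
    if spread <= 1e-12:
        # Every column is uniform, so fall back to equal weights.
        return [1.0 / n] * n
    return [d / spread for d in divergences]


# Hypothetical scores: 3 items (rows) x 2 indicators (columns).
scores = [
    [0.9, 0.5],
    [0.9, 0.1],
    [0.9, 0.9],
]
weights = entropy_weights(scores)
```

In this sketch the first indicator is identical for every row, so it carries no discriminating information and receives a weight of essentially zero, while the second indicator receives almost all of the weight. The resulting weights would then feed a grading step such as the grey evaluation mentioned in the abstract.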