This paper examines the large-scale data streams that have emerged in power systems in recent years, with the aim of using stream computing to improve the real-time performance and security of the system. It studies fast clustering and anomaly detection for electricity consumption data streams in large-scale electricity information acquisition. Electricity consumption behavior exhibits clustering characteristics both longitudinally in time and horizontally in space: users of the same type have similar consumption patterns, and the historical data of a single user are similar to one another. Building on these characteristics and on the distributed stream computing platform Spark Streaming, a streaming DBSCAN clustering algorithm is designed and implemented to detect anomalies quickly in large-scale electricity consumption data streams. An experimental environment that supports large-scale data stream processing is designed and built, and it is used to demonstrate the effectiveness of the algorithm.
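The abstract does not give the details of the streaming DBSCAN algorithm, so the following is only a minimal sketch of the underlying idea, under stated assumptions: ordinary (non-streaming) DBSCAN is run on one micro-batch of per-user feature vectors, and users labeled as noise are treated as anomaly candidates. The function names `dbscan` and `detect_anomalies`, the two-dimensional features, and the values of `eps` and `min_pts` are all hypothetical and chosen for illustration.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: return a cluster id (0, 1, ...) or NOISE for each point."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


def detect_anomalies(batch, eps=0.5, min_pts=3):
    """batch: list of (user_id, feature_vector). Returns ids labeled as noise."""
    ids = [user for user, _ in batch]
    labels = dbscan([vec for _, vec in batch], eps, min_pts)
    return [user for user, label in zip(ids, labels) if label == NOISE]


# Hypothetical micro-batch: two groups of similar users and one outlier.
batch = [
    ("u1", (1.0, 1.1)), ("u2", (1.1, 1.0)), ("u3", (0.9, 1.0)),
    ("u4", (5.0, 5.1)), ("u5", (5.1, 5.0)), ("u6", (4.9, 5.0)),
    ("u7", (9.0, 0.2)),
]
print(detect_anomalies(batch))  # → ['u7']
```

In a Spark Streaming job, a function like `detect_anomalies` could in principle be applied to each micro-batch; how the paper's algorithm keeps cluster state across batches or distributes the neighbor search is not described here and is not covered by this sketch.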