A Highly Reliable Distributed Data Stream Statistics Model Based on MR

Source: Computer Technology and Development (计算机技术与发展)
Drawing on the distinctive characteristics of stream data, and taking continuous grouped statistics over a window model on data streams as the application scenario, this paper combines the strengths of the mainstream stream-processing platforms Storm and Spark Streaming to propose Mars, a distributed data stream statistics model with high throughput, low latency, and high scalability. Mars addresses the heavy throughput pressure and strict low-latency requirements that arise because stream data is volatile and highly time-sensitive. For fault tolerance, Mars provides at-least-once semantics to guard against major errors. Mars was tested in a real experimental environment and compared with the popular distributed stream-processing platforms Spark Streaming and Storm. Its real-time processing latency falls between the two, but across different cluster sizes its throughput is significantly higher than both, by one to two times. In terms of semantic accuracy, Mars provides the same level of semantic guarantee as Storm.
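To make the application scenario concrete, the following is a minimal single-process sketch of grouped counting over fixed-size (tumbling) windows. It is not the Mars implementation, which the abstract does not describe. The function name `windowed_group_counts`, the `(timestamp, key)` event shape, and the choice of tumbling windows are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def windowed_group_counts(events, window_size):
    """Count events per key within each tumbling window.

    Hypothetical helper for illustration only, not part of Mars.
    events: iterable of (timestamp, key) pairs with integer timestamps.
    window_size: window length in the same unit as the timestamps.
    Returns a dict mapping window start time to a Counter of keys.
    """
    windows = defaultdict(Counter)
    for timestamp, key in events:
        # Align the timestamp to the start of the window that contains it.
        window_start = (timestamp // window_size) * window_size
        windows[window_start][key] += 1
    return dict(windows)


# Sample stream: two windows of length 5.
events = [(0, "a"), (1, "b"), (4, "a"), (5, "a"), (9, "b")]
counts = windowed_group_counts(events, window_size=5)

# Under at-least-once delivery a record may be replayed after a failure.
# Without deduplication, the replayed record is counted again.
replayed = events + [(4, "a")]
counts_with_replay = windowed_group_counts(replayed, window_size=5)
```

The second call shows what an at-least-once guarantee implies for a counting workload: no record is lost, but a replayed record can inflate a group's count unless the application deduplicates it.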