Using Memory in the Right Way to Accelerate Big Data Processing

Source: Journal of Computer Science and Technology (English edition)
Big data processing is becoming a prominent part of data center computation. However, recent research has indicated that big data workloads cannot make full use of modern memory systems. We find that the dramatic inefficiency of big data processing stems from the enormous number of cache misses and from stalls caused by dependent memory accesses. In this paper, we introduce two optimizations to tackle these problems. The first is the slice-and-merge strategy, which reduces the cache miss rate of the sort procedure. The second is direct-memory-access, which reforms the data structure used in key/value storage. These optimizations are evaluated with both micro-benchmarks and the real-world benchmark HiBench. The micro-benchmark results clearly demonstrate the effectiveness of our optimizations in terms of hardware event counts, and the additional HiBench results show an average speedup of 1.21x at the application level. Both results illustrate that careful hardware/software co-design improves the memory efficiency of big data processing. Our work has already been integrated into the Intel Distribution for Apache Hadoop.
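The abstract names the slice-and-merge strategy without giving its details. One plausible reading is that the input is cut into slices small enough to stay cache-resident while being sorted, and the sorted slices are then merged with sequential reads. The following is a minimal sketch under that assumption, not the paper's implementation: the function name `slice_and_merge_sort` and the parameter `slice_size` are hypothetical, and Python only illustrates the structure of the algorithm, not its cache behavior.

```python
import heapq


def slice_and_merge_sort(records, slice_size):
    """Sort records by sorting fixed-size slices, then k-way merging them.

    Hypothetical illustration: slice_size stands in for "as many records
    as fit in cache", which the abstract does not quantify.
    """
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")

    # Phase 1: sort each slice on its own, so the working set of each
    # individual sort is bounded by slice_size records.
    slices = [
        sorted(records[start:start + slice_size])
        for start in range(0, len(records), slice_size)
    ]

    # Phase 2: k-way merge of the sorted slices. Each slice is consumed
    # front to back, so accesses within a slice are sequential.
    return list(heapq.merge(*slices))


if __name__ == "__main__":
    # Key/value pairs compare by key first, as in a typical sort phase.
    pairs = [(5, "e"), (3, "c"), (1, "a"), (4, "d"), (2, "b")]
    print(slice_and_merge_sort(pairs, slice_size=2))
```

In a real system the slice size would be tuned to the cache capacity of the target machine, and the effect would be checked with hardware event counters, as the abstract describes for its micro-benchmarks.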