Big data processing is becoming a central part of data center computation. However, recent research has shown that big data workloads cannot make full use of modern memory systems. We find that the dramatic inefficiency of big data processing stems from the enormous number of cache misses and the stalls caused by dependent memory accesses. In this paper, we introduce two optimizations to tackle these problems. The first is the slice-and-merge strategy, which reduces the cache miss rate of the sort procedure. The second is direct-memory-access, which reforms the data structure used in key/value storage. These optimizations are evaluated with both micro-benchmarks and the real-world benchmark suite HiBench. The results of our micro-benchmarks clearly demonstrate the effectiveness of our optimizations in terms of hardware event counts, and the additional results on HiBench show a 1.21X average speedup at the application level. Both results illustrate that careful hardware/software co-design improves the memory efficiency of big data processing. Our work has already been integrated into the Intel Distribution for Apache Hadoop.
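The abstract does not spell out how the slice-and-merge strategy is implemented; the following is a minimal, illustrative sketch of the general idea of a cache-conscious sort, assuming the approach is to sort cache-sized slices independently and then merge them. The class name, slice size, and use of a priority-queue merge are assumptions for illustration, not the paper's actual code.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

/**
 * Illustrative sketch of a slice-and-merge sort (not the paper's implementation):
 * the input is split into slices small enough to stay cache-resident, each slice
 * is sorted independently, and the sorted slices are merged with a small heap.
 */
public class SliceAndMergeSort {
    // Assumed slice size; in practice it would be tuned to the cache capacity.
    private static final int SLICE_SIZE = 1 << 16;

    public static long[] sort(long[] input) {
        // Phase 1: sort each cache-sized slice in place.
        int sliceCount = (input.length + SLICE_SIZE - 1) / SLICE_SIZE;
        for (int s = 0; s < sliceCount; s++) {
            int from = s * SLICE_SIZE;
            int to = Math.min(from + SLICE_SIZE, input.length);
            Arrays.sort(input, from, to);
        }

        // Phase 2: merge the sorted slices; the heap holds one cursor
        // {position, end} per slice, ordered by the element at the position.
        long[] output = new long[input.length];
        PriorityQueue<int[]> heap =
                new PriorityQueue<>((a, b) -> Long.compare(input[a[0]], input[b[0]]));
        for (int s = 0; s < sliceCount; s++) {
            int from = s * SLICE_SIZE;
            int to = Math.min(from + SLICE_SIZE, input.length);
            if (from < to) heap.add(new int[] {from, to});
        }
        int i = 0;
        while (!heap.isEmpty()) {
            int[] cursor = heap.poll();
            output[i++] = input[cursor[0]++];
            if (cursor[0] < cursor[1]) heap.add(cursor);
        }
        return output;
    }
}
```

Because each slice fits in cache, the per-slice sort touches a small working set, and the merge phase streams through the slices sequentially; this is the general mechanism by which such a strategy can reduce sort-phase cache misses.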