HW/SW Co-optimization for Stencil Computation: Beginning with a Customizable Core

Source: Tsinghua Science and Technology
Energy efficiency is one of the most important issues in High Performance Computing (HPC) today. Heterogeneous HPC platforms that include energy-efficient customizable cores (as application-specific accelerators) are believed to be one of the promising solutions for meeting ever-increasing computing needs and overcoming power density limitations. In this paper, we focus on using customizable processor cores to optimize typical stencil computations, the kernel of many high-performance applications. We develop a series of effective software/hardware co-optimization strategies to exploit instruction-level and memory-computation parallelism, as well as to decrease energy consumption. These optimizations include loop tiling, prefetching, cache customization, Single Instruction Multiple Data (SIMD), and Direct Memory Access (DMA), as well as the necessary ISA extensions. Detailed power-efficiency tests are given to evaluate the effect of all these optimizations comprehensively. The results are impressive: the combination of these optimizations improved application performance by 341% while decreasing energy consumption by 35%; a preliminary comparison with X86, GPU, and FPGA platforms also showed that the design could achieve an order of magnitude higher performance efficiency. We believe this work can help in understanding the sources of inefficiency in general-purpose chips and can serve as a starting point for customizing an energy-efficient CMP for further improvement.
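Of the optimizations listed in the abstract, loop tiling is the easiest to illustrate in isolation. The following is a minimal sketch in C, not the paper's implementation: it assumes a 2D 5-point averaging stencil on a square row-major grid, and the names `stencil_point`, `stencil_naive`, `stencil_tiled`, and the `tile` parameter are hypothetical. The tiled version visits the same interior points as the naive one, but in small blocks, so that the rows a block touches are more likely to stay in a small cache or local memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 5-point averaging stencil for one interior cell of an
   n x n row-major grid. The coefficient 0.2 is an assumption. */
static inline float stencil_point(const float *in, size_t n, size_t i, size_t j)
{
    return 0.2f * (in[i * n + j] +
                   in[(i - 1) * n + j] + in[(i + 1) * n + j] +
                   in[i * n + (j - 1)] + in[i * n + (j + 1)]);
}

/* Baseline: sweep all interior cells row by row. Boundary cells of
   `out` are left untouched. */
static void stencil_naive(const float *in, float *out, size_t n)
{
    for (size_t i = 1; i + 1 < n; ++i)
        for (size_t j = 1; j + 1 < n; ++j)
            out[i * n + j] = stencil_point(in, n, i, j);
}

/* Loop tiling: the interior is split into tile x tile blocks and each
   block is finished before moving to the next one. */
static void stencil_tiled(const float *in, float *out, size_t n, size_t tile)
{
    assert(tile > 0);
    for (size_t ii = 1; ii + 1 < n; ii += tile) {
        /* Clamp the block so it never reaches the last row. */
        size_t i_end = (ii + tile < n - 1) ? ii + tile : n - 1;
        for (size_t jj = 1; jj + 1 < n; jj += tile) {
            /* Clamp the block so it never reaches the last column. */
            size_t j_end = (jj + tile < n - 1) ? jj + tile : n - 1;
            for (size_t i = ii; i < i_end; ++i)
                for (size_t j = jj; j < j_end; ++j)
                    out[i * n + j] = stencil_point(in, n, i, j);
        }
    }
}
```

Both functions compute each cell with the same expression, so their outputs should match exactly; only the traversal order differs. A suitable `tile` value depends on the cache or local-memory size of the target core and would have to be found by measurement.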