Improving performance portability for GPU-specific OpenCL kernels on multi-core/many-core CPUs by a

Source: Journal of Zhejiang University (English Edition), Series C: Computers & Electronics

Authors: Mei WEN, Da-fei HUANG, Chang-qing XUN, Dong CHEN

Affiliation: School of Computer, National University of Defense Technology, Changsha 410073, China

Journal: Journal of Zhejiang University (English Edition), Series C: Computers & Electronics

Publication date: 2015, Issue 11

Keywords: OpenCL; Performance portability; Multi-core/many-core CPU; Analysis-based transform

Abstract:

OpenCL is an open heterogeneous programming framework. Although OpenCL programs are functionally portable, they do not provide performance portability, so code transformation often plays an irreplaceable role. When GPU-specific OpenCL kernels are adapted to run on multi-core/many-core CPUs, coarsening the thread granularity is necessary and has therefore been used extensively. However, the locality concerns exposed in GPU-specific OpenCL code are usually inherited without analysis, which can have side effects on CPU performance. Typically, using OpenCL's local memory on multi-core/many-core CPUs can have the opposite of the intended performance effect, because local-memory arrays no longer match the hardware well and the associated synchronizations are costly. To solve this dilemma, we actively analyze the memory access patterns using array-access descriptors derived from the GPU-specific kernels, so that the kernels can be adapted for CPUs by (1) removing all the unwanted local-memory arrays together with the obsolete barrier statements and (2) optimizing the coalesced kernel code with vectorization and locality re-exploitation. Moreover, we have developed an automated tool chain that transforms GPU-specific OpenCL kernels into a CPU-friendly form, accompanied by a scheduler that forms a new OpenCL runtime. Experiments show that the automated transformation can improve OpenCL kernel performance on a multi-core CPU by an average factor of 3.24. Satisfactory performance improvements are also achieved on Intel's Many Integrated Core (MIC) coprocessor. The resulting performance on both architectures is better than or comparable to the corresponding OpenMP performance.
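
Step (1) of the transformation can be illustrated with a small sketch. It assumes a hypothetical 3-point stencil kernel, a work-group size `WG = 16`, and zero padding at the array edges, none of which come from the paper. `stencil_local` mimics a GPU-specific kernel after naive thread coarsening: each work-group stages a tile plus halo in a local array, and the barrier becomes the boundary between two loops over work-items. `stencil_direct` mimics the same kernel after the local array and barrier have been removed. This is not the authors' tool chain, and it omits the array-access-descriptor analysis that decides when such a removal is legal. Python is used only to show that both forms compute the same result; it says nothing about performance.

```python
# Hypothetical illustration: WG, the 3-point stencil and zero padding at the
# edges are assumptions for this sketch, not details from the paper.
WG = 16  # work-group size (work-items per group)


def stencil_local(inp):
    """GPU-style kernel after naive work-item coalescing.

    Each work-group stages its tile plus a one-element halo on each side in
    a 'local' array. The barrier between the load and compute phases becomes
    the boundary between two loops over the work-items (loop fission).
    """
    n = len(inp)
    if n % WG:
        raise ValueError("global size must be a multiple of WG")
    out = [0.0] * n
    for base in range(0, n, WG):          # one iteration per work-group
        tile = [0.0] * (WG + 2)           # stands in for __local memory
        for lid in range(WG):             # phase 1: cooperative load
            gid = base + lid
            tile[lid + 1] = inp[gid]
            if lid == 0:                  # left halo element
                tile[0] = inp[gid - 1] if gid > 0 else 0.0
            if lid == WG - 1:             # right halo element
                tile[WG + 1] = inp[gid + 1] if gid + 1 < n else 0.0
        # barrier(CLK_LOCAL_MEM_FENCE): all loads finish before any compute
        for lid in range(WG):             # phase 2: compute from the tile
            out[base + lid] = tile[lid] + tile[lid + 1] + tile[lid + 2]
    return out


def stencil_direct(inp):
    """Same kernel with the local array and the barrier removed.

    Every work-item reads its neighbours straight from global memory, so the
    work-items collapse into one flat, unit-stride loop.
    """
    n = len(inp)
    out = [0.0] * n
    for gid in range(n):
        left = inp[gid - 1] if gid > 0 else 0.0
        right = inp[gid + 1] if gid + 1 < n else 0.0
        out[gid] = left + inp[gid] + right
    return out


if __name__ == "__main__":
    data = [float(i) for i in range(64)]
    assert stencil_local(data) == stencil_direct(data)
    print(stencil_direct(data)[:4])  # [1.0, 3.0, 6.0, 9.0]
```

The two functions agree element for element. In a C translation, the flat loop in `stencil_direct` is the kind of simple unit-stride loop that compilers can often auto-vectorize, which loosely corresponds to step (2). `stencil_local` spends extra copies and branches on a staging buffer that a CPU's caches arguably make redundant. This is an informal reading of the abstract's argument, not a reproduction of its measurements.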
