论文部分内容阅读
The combination of growing transistor counts and limited power budget within a silicon die leads to the utilization wall problem(a.k.a. “Dark Silicon”), that is only a small fraction of chip can run at full speed during a period of time. Designing accelerators for specific applications or algorithms is considered to be one of the most promising approaches to improving energy-efficiency. However, most current design methods for accelerators are dedicated for certain applications or algorithms, which greatly constrains their applicability. In this paper, we propose a novel general-purpose many-accelerator architecture. Our contributions are two-fold. Firstly, we propose to cluster dataflow graphs(DFGs) of hotspot basic blocks(BBs) in applications. The DFG clusters are then used for accelerators design. This is because a DFG is the largest program unit which is not specific to a certain application. We analyze 17 benchmarks in SPEC CPU 2006,acquire over 300 DFGs hotspots by using LLVM compiler tool, and divide them into 15 clusters based on graph similarity.Secondly, we introduce a function instruction set architecture(FISC) and illustrate how DFG accelerators can be integrated with a processor core and how they can be used by applications. Our results show that the proposed DFG clustering and FISC design can speed up SPEC benchmarks 6.2X on average.
The combination of growing transistor counts and limited power budget within a silicon die leads to the utilization wall problem (aka “Dark Silicon”), that is only a small fraction of chip can run at full speed during a period of time. Designing accelerators for specific applications or algorithms is considered to be one of the most promising approaches to improving energy-efficiency. However, most current design methods for accelerators-of- Firstly, we propose to cluster dataflow graphs (DFGs) of hotspot basic blocks (BBs) in applications. This DFG clusters are then used for accelerators design. This is because a DFG is the largest program unit which is not specific to a certain application. We analyze 17 benchmarks in SPEC CPU 2006, acquire over 300 DFGs hotspots by using LLVM c ompiler tool, and divide them into 15 clusters based on graph similarity. Secondary, we introduce a function instruction set architecture (FISC) and illustrate how DFG accelerators can be integrated with a processor core and how they can be used by applications. that the proposed DFG clustering and FISC design can speed up SPEC benchmarks 6.2X on average.