To efficiently exploit the performance of single instruction multiple data (SIMD) architectures for video coding, a parallel memory architecture with a power-of-two number of memory modules is proposed. It employs two novel skewing schemes to provide conflict-free access to adjacent elements (8-bit and 16-bit data types) or to elements at power-of-two intervals in both horizontal and vertical directions, which was not possible in previous parallel memory architectures. Area consumption and delay estimates are given for 4, 8, and 16 memory modules. In a 0.18-μm CMOS technology, synthesis results show that the proposed system achieves a 230 MHz clock frequency with 16 memory modules at a cost of 19k gates, with read and write latencies of 3 and 2 clock cycles, respectively. We implement the proposed parallel memory architecture on a video signal processor (VSP). The results show that the VSP enhanced with the proposed architecture achieves a 1.28x speedup for H.264 real-time decoding.
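The two skewing schemes themselves are not described in this abstract. As a rough illustration of the general idea only, the Python sketch below uses a classic linear skewing function, module = (row * skew + col) mod N. This mapping is an assumption chosen for illustration and is not the paper's scheme, and all function names are hypothetical. Under this assumption, adjacent elements in a row or a column fall in distinct modules, but a stride-2 access collides when N is a power of two, which is the kind of access pattern the proposed schemes are said to support.

```python
def module_index(row, col, num_modules, skew=1):
    """Map a 2-D element address to a memory module (classic linear skewing).

    Illustrative assumption only: this is NOT the scheme proposed in the paper.
    """
    return (row * skew + col) % num_modules


def row_access(row, col, count, stride=1):
    """Coordinates of `count` elements along a row, `stride` columns apart."""
    return [(row, col + k * stride) for k in range(count)]


def column_access(row, col, count, stride=1):
    """Coordinates of `count` elements along a column, `stride` rows apart."""
    return [(row + k * stride, col) for k in range(count)]


def is_conflict_free(coords, num_modules, skew=1):
    """True if every requested element lives in a different memory module."""
    modules = [module_index(r, c, num_modules, skew) for r, c in coords]
    return len(set(modules)) == len(modules)


if __name__ == "__main__":
    n = 16  # number of memory modules (a power of two, as in the abstract)
    print("adjacent row   :", is_conflict_free(row_access(3, 5, n), n))
    print("adjacent column:", is_conflict_free(column_access(3, 5, n), n))
    print("row, stride 2  :", is_conflict_free(row_access(3, 5, n, stride=2), n))
    print("column, no skew:", is_conflict_free(column_access(3, 5, n), n, skew=0))
```

With 16 modules, the first two checks succeed and the last two fail: a stride of 2 revisits the same module after 8 elements, and without skewing every element of a column maps to a single module.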