The Loongson 3B is the first commercial 8-core processor developed in China. It is used mainly in high-performance computers, high-performance servers, and digital signal processing, so developing an efficient FFT library that fully exploits the Loongson 3B architecture is particularly important. FFTW is a software package developed for general-purpose CPUs; it cannot readily exploit the hardware features of the Loongson 3B and therefore does not achieve satisfactory performance on this processor. To address this problem, this paper optimizes the FFTW library with several techniques: MIPS assembly, multiply-add instructions, vectorized computation, the Cooley-Tukey algorithm, and separate computation of the real and imaginary parts as real-typed data. Performance was measured with benchfft, a general-purpose benchmark tool for discrete Fourier transforms. The experimental results show that the optimized library is about 45% faster on average than the unoptimized one, with some cases improving by more than 100%, so that FFTW achieves high performance on the Loongson 3B processor.