This paper proposes a prosody conversion method for emotional speech conversion. The method consists of two parts: fundamental frequency (F0) conversion and duration conversion. For F0 conversion, the F0 contour is parameterized with the discrete cosine transform (DCT) and, following the hierarchical structure of F0, decomposed into two levels, a phrase level and a syllable level; each level is then converted separately with a Gaussian mixture model (GMM) based conversion method. For duration conversion, a classification and regression tree (CART) based method converts durations, taking initials and finals as the basic units. A corpus covering three basic emotions is used for training and testing. Objective and subjective evaluation results show that the method converts emotional prosody effectively; in the subjective experiment, the sadness emotion reached close to 100% accuracy.
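To make the two-level F0 parameterization more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how many DCT coefficients are kept per level, whether the syllable level is modeled as a residual of the phrase level, or whether F0 is taken in the log domain. The function names (`dct`, `idct`, `decompose`, `reconstruct`) and the defaults `n_phrase=3` and `n_syllable=4` are hypothetical. The GMM and CART conversion steps are not shown; in a pipeline of this kind they would plausibly operate on the coefficient vectors produced here.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a sequence of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(coefs, n):
    """Inverse of the orthonormal DCT-II, evaluated at n points.

    coefs may hold fewer than n values; the missing high-order
    coefficients are treated as zero, which yields a smoothed curve.
    """
    out = []
    for t in range(n):
        s = 0.0
        for k, c in enumerate(coefs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * c * math.cos(math.pi * (t + 0.5) * k / n)
        out.append(s)
    return out


def decompose(f0, syllable_bounds, n_phrase=3, n_syllable=4):
    """Split an F0 contour into phrase-level and syllable-level DCT coefficients.

    Assumption: the phrase level is the low-order DCT of the whole contour,
    and the syllable level is the DCT of what remains inside each syllable.
    syllable_bounds is a list of (start, end) frame indices, end exclusive.
    """
    phrase_coefs = dct(f0)[:n_phrase]
    phrase_curve = idct(phrase_coefs, len(f0))
    residual = [a - b for a, b in zip(f0, phrase_curve)]
    syllable_coefs = [dct(residual[s:e])[:n_syllable] for s, e in syllable_bounds]
    return phrase_coefs, syllable_coefs


def reconstruct(phrase_coefs, syllable_coefs, syllable_bounds, n):
    """Rebuild the contour by adding syllable-level curves to the phrase curve."""
    curve = idct(phrase_coefs, n)
    for (s, e), coefs in zip(syllable_bounds, syllable_coefs):
        for i, v in enumerate(idct(coefs, e - s)):
            curve[s + i] += v
    return curve


# Toy contour: 12 frames of synthetic log-F0 covering two 6-frame syllables.
contour = [5.0 + 0.1 * math.sin(i / 2.0) for i in range(12)]
bounds = [(0, 6), (6, 12)]
phrase, syllables = decompose(contour, bounds)
approx = reconstruct(phrase, syllables, bounds, len(contour))
```

Keeping only a few coefficients per level gives a compact, fixed-length description of each phrase and syllable, which is what makes a statistical mapping between neutral and emotional contours tractable. When every syllable keeps as many coefficients as it has frames, and the syllables cover the whole contour, the reconstruction is exact.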