This paper proposes a minimum generation error (MGE) model training method based on a perceptually weighted line spectral pair (LSP) distance, with the aim of improving the performance of hidden Markov model (HMM) based parametric speech synthesis systems. When LSP parameters are used to represent speech spectral features, the Euclidean-distance generation error used in conventional MGE training does not adequately reflect the true distance between the generated spectrum and the natural spectrum. A generation error function defined by log spectral distortion (LSD), which is independent of the spectral parameterization, can alleviate this problem, but the resulting subjective improvement is not obvious and the computational complexity is very high. The paper first proposes an MGE model training method based on weighted LSP distance, and then compares different weighting methods and LSD-based MGE training in experiments using both subjective and objective measures. Finally, a perceptual weighting method is identified that not only gives better subjective performance but also adds almost no computational complexity compared with conventional MGE training.
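To make the contrast between a Euclidean and a weighted LSP distance concrete, the following is a minimal sketch, not the paper's method. The abstract does not state which weighting the authors chose, so the inverse-gap weighting here is an assumption made purely for illustration, and the function names `euclidean_lsp_distance`, `inverse_gap_weights`, and `weighted_lsp_distance` are hypothetical. LSP vectors are assumed to be sorted frequencies in radians, strictly between 0 and pi.

```python
import math


def euclidean_lsp_distance(generated, natural):
    """Unweighted squared Euclidean distance between two LSP vectors."""
    return sum((g - n) ** 2 for g, n in zip(generated, natural))


def inverse_gap_weights(natural):
    """Illustrative weights (an assumption, not taken from the paper).

    Each LSP gets a weight that grows as its neighbours get closer.
    Closely spaced LSPs tend to mark spectral peaks, so errors there
    are penalized more. The values 0 and pi pad the outer gaps.
    """
    padded = [0.0] + list(natural) + [math.pi]
    return [
        1.0 / (padded[i] - padded[i - 1]) + 1.0 / (padded[i + 1] - padded[i])
        for i in range(1, len(padded) - 1)
    ]


def weighted_lsp_distance(generated, natural):
    """Weighted squared distance, with weights derived from the natural LSPs."""
    weights = inverse_gap_weights(natural)
    return sum(w * (g - n) ** 2 for w, g, n in zip(weights, generated, natural))


# Toy 5th-order example: the same 0.05 rad error applied at two positions.
natural = [0.3, 0.4, 1.2, 2.0, 2.8]
near_peak = [0.3, 0.45, 1.2, 2.0, 2.8]   # error on an LSP in a close pair
isolated = [0.3, 0.4, 1.2, 2.05, 2.8]    # error on a widely spaced LSP

print(euclidean_lsp_distance(near_peak, natural),
      euclidean_lsp_distance(isolated, natural))
print(weighted_lsp_distance(near_peak, natural),
      weighted_lsp_distance(isolated, natural))
```

In this toy case the Euclidean distance rates both errors the same, while the weighted distance penalizes the error near the closely spaced pair more heavily. The weighted form remains a single weighted sum over the LSP dimensions, so its cost per frame grows only linearly with the LSP order.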