To improve the conversion of vocal tract parameters in whisper-to-normal speech conversion, and because fixed-value conversion methods perform poorly in speaker-independent whispered speech conversion systems, this paper proposes using a universal background model (UBM) to build a speaker-independent vocal tract conversion model. To further address the errors that a large number of mixture components introduces into the statistical model of the acoustic probability density in the UBM, a posterior probability computation based on minimum spectral distortion and an effective Gaussian component selection method are proposed to optimize the conversion of feature vectors. A performance index based on the Itakura-Saito spectral distortion measure is defined to analyze and compare the models. Experiments show that the average spectral distortion of feature vectors converted with the UBM is better than that of the fixed-value offset method, and that its stability is markedly better. On top of the UBM, effective Gaussian component selection improves the performance index by a further 5.11%, and subjective listening tests show that the proposed method improves the clarity and accuracy of the converted speech.
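The abstract names the Itakura-Saito spectral distortion measure but does not reproduce the paper's exact performance index. As a rough illustration only, the sketch below computes the standard textbook form of the Itakura-Saito distortion between two power spectra, averaged over frequency bins. The function name `itakura_saito`, the per-bin averaging, and the toy spectra are assumptions made for this example and are not taken from the paper.

```python
import math


def itakura_saito(p_ref, p_test):
    """Mean Itakura-Saito distortion between two power spectra.

    Standard form (an assumption; the paper's own index may differ):
        D = (1/N) * sum_k [ P(k)/Q(k) - ln(P(k)/Q(k)) - 1 ]
    where P is the reference spectrum and Q the converted spectrum.
    """
    if len(p_ref) != len(p_test) or len(p_ref) == 0:
        raise ValueError("spectra must be non-empty and of equal length")
    total = 0.0
    for p, q in zip(p_ref, p_test):
        if p <= 0.0 or q <= 0.0:
            raise ValueError("power spectra must be strictly positive")
        ratio = p / q
        # Each term is non-negative and is zero only when p == q.
        total += ratio - math.log(ratio) - 1.0
    return total / len(p_ref)


if __name__ == "__main__":
    # Hypothetical toy spectra, for illustration only.
    reference = [1.0, 2.0, 4.0, 2.0]
    converted = [1.1, 1.8, 4.5, 2.2]
    print(itakura_saito(reference, converted))
```

The measure is zero for identical spectra and is not symmetric in its arguments, so the order of reference and converted spectra matters when averaging it over frames.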