In statistical parametric speech synthesis, cross-lingual model adaptation is mainly applied when the target speaker's language differs from the language of the source model: a small amount of speech data from the target speaker is used to quickly build a synthesis system in the source model's language that carries the target speaker's timbre. This paper improves on the conventional cross-lingual adaptation method based on phoneme mapping and triphone models in two ways. First, phoneme mapping is combined with data selection to make the mapping more reliable. Second, cross-lingual mapping of prosodic information is introduced to compensate for the limited ability of the triphone models in the original method to represent prosody. Experimental results on a Chinese-English cross-lingual model adaptation system show that the improved system produces synthesized speech with clearly higher naturalness and similarity than the conventional method.
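The abstract does not specify how phoneme mapping and data selection are combined, so the following is only a minimal sketch of one plausible reading. It assumes that each phoneme is summarized by a mean acoustic feature vector, that a phoneme of the target speaker's language is mapped to the nearest phoneme of the source model's language by Euclidean distance, and that utterances containing a poorly matched phoneme are dropped from the adaptation data. The function names, the toy two-dimensional features, and the `max_dist` threshold are all hypothetical and are not taken from the paper.

```python
import math


def nearest_phoneme(vec, source_means):
    """Return (source_phoneme, distance) for the closest source-language phoneme."""
    best = min(source_means, key=lambda p: math.dist(vec, source_means[p]))
    return best, math.dist(vec, source_means[best])


def build_mapping(target_means, source_means):
    """Map every target-language phoneme to its nearest source-language phoneme."""
    return {t: nearest_phoneme(v, source_means) for t, v in target_means.items()}


def select_utterances(utterances, mapping, max_dist):
    """Data selection (assumed criterion): keep an utterance only if every
    phoneme in it maps to a source phoneme within max_dist."""
    return [u for u in utterances if all(mapping[p][1] <= max_dist for p in u)]


# Toy mean feature vectors; real systems would use spectral features.
source_means = {"a": (1.0, 0.0), "i": (0.0, 1.0), "s": (3.0, 3.0)}
target_means = {"AA": (0.9, 0.1), "IY": (0.1, 1.1), "TH": (5.0, 1.0)}

mapping = build_mapping(target_means, source_means)

# Each utterance is a list of target-language phoneme labels.
utterances = [["AA", "IY"], ["TH", "AA"]]
selected = select_utterances(utterances, mapping, max_dist=0.5)

for phone, (mapped, dist) in mapping.items():
    print(f"{phone} -> {mapped} (distance {dist:.2f})")
print("selected utterances:", selected)
```

In this toy setup the second utterance is discarded because "TH" has no close counterpart among the source phonemes, which illustrates how data selection could keep unreliable mappings out of the adaptation data. The prosodic information mapping mentioned in the abstract is not covered by this sketch.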