For text-independent speaker recognition based on the Gaussian mixture model-universal background model (GMM-UBM), recognition accuracy degrades severely when the test utterance becomes very short. To make full use of textual content information, this paper proposes a modeling method based on a K-top multiple phoneme class model mixture (KPCMMM). In the phoneme recognition stage, speech recognition is used to obtain the phoneme sequence of the training speech. In the speaker recognition stage, these phoneme sequences are used to train multiple phoneme class models for each speaker. A test utterance is then scored, and the decision made, on its most similar phoneme class models, where K is the number of similar phoneme classes selected. Depending on how the phoneme classes are defined, KPCMMM takes two forms: one based on expert knowledge and one data-driven. Experimental results show that choosing an appropriate value of K gives better recognition results. In experiments comparing the different phoneme class definitions, when the test utterance is shorter than 2 s, the method reduces the equal error rate (EER) by 38.60% relative to the GMM-UBM baseline system.
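The abstract does not give the scoring equations. The following is therefore only a minimal Python sketch of the general K-top idea, and it rests on several assumptions of ours:

* Each phoneme class has a class-level background GMM and a per-speaker GMM, both with diagonal covariances.
* The "most similar" classes are the K classes whose background GMMs give the highest average log-likelihood on the test frames.
* The final score is the plain average of the per-class log-likelihood ratios.

The names `k_top_score`, `class_ubms`, and `speaker_models`, the class labels, and the toy one-dimensional models are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Diagonal-covariance GMM as a list of (weight, means, variances) components.
Component = Tuple[float, List[float], List[float]]
GMM = List[Component]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of frame x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_frame_loglik(x: Sequence[float], gmm: GMM) -> float:
    """log sum_i w_i N(x; mu_i, Sigma_i), using log-sum-exp for stability."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def avg_loglik(frames: Sequence[Sequence[float]], gmm: GMM) -> float:
    """Average per-frame log-likelihood of an utterance under a GMM."""
    return sum(gmm_frame_loglik(x, gmm) for x in frames) / len(frames)


def k_top_score(
    frames: Sequence[Sequence[float]],
    class_ubms: Dict[str, GMM],
    speaker_models: Dict[str, GMM],
    k: int,
) -> Tuple[float, List[str]]:
    """Hypothetical K-top scoring: LLR averaged over the K best-matching classes."""
    # Assumption: "most similar" = highest average log-likelihood under the
    # class-level background model.
    ranked = sorted(class_ubms, key=lambda c: avg_loglik(frames, class_ubms[c]), reverse=True)
    chosen = ranked[:k]
    # Assumption: combine the K per-class log-likelihood ratios by plain averaging.
    llrs = [
        avg_loglik(frames, speaker_models[c]) - avg_loglik(frames, class_ubms[c])
        for c in chosen
    ]
    return sum(llrs) / len(llrs), chosen


# Toy 1-D example with two hypothetical phoneme classes.
class_ubms = {
    "vowel": [(1.0, [0.0], [1.0])],
    "fricative": [(1.0, [5.0], [1.0])],
}
speaker_models = {
    "vowel": [(1.0, [0.5], [1.0])],
    "fricative": [(1.0, [5.5], [1.0])],
}
test_frames = [[0.4], [0.6]]

for k in (1, 2):
    score, chosen = k_top_score(test_frames, class_ubms, speaker_models, k)
    print(k, chosen, round(score, 3))
```

With `k=1`, only the matching "vowel" class is used and the score is 0.125. With `k=2`, the poorly matching "fricative" class is averaged in and pulls the score down to -1.125. In this toy case, that illustrates why the value of K matters. It is not a reproduction of the paper's experiments.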