Uyghur is an agglutinative language, so whole words are poorly suited as recognition units for a Uyghur large-vocabulary continuous speech recognition (LVCSR) system. To address the choice of recognition unit in Uyghur LVCSR, this paper designs subword recognition units better suited to Uyghur, proposes a method for constructing combined recognition units that mix Uyghur words and subwords, and evaluates the language-model and speech-recognition performance of the word, subword, and combined units. Experimental results show that the proposed units perform better in terms of unit inventory size and language-model complexity, among other measures, and that they reduce the recognition system's word error rate by 22% relative to the word-based system.
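The 22% figure is a relative reduction in word error rate (WER), not an absolute one. The sketch below shows how such a figure is conventionally computed: WER as word-level edit distance divided by reference length, and relative reduction as the drop in WER divided by the baseline WER. The function names and the numbers in the usage lines are hypothetical illustrations, not values or code from the paper. It also assumes that output from a subword or combined-unit system has already been rejoined into whole words before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers, chosen only to illustrate the arithmetic.
example_wer = word_error_rate("a b c d", "a x c")
example_reduction = relative_wer_reduction(0.50, 0.39)
```

Under these made-up numbers, a baseline WER of 0.50 falling to 0.39 is an absolute drop of 11 points but a relative reduction of 22%, which is the sense in which the abstract reports its result.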