Uyghur is an agglutinative language: rich affixation can derive a very large vocabulary from a single stem, which makes Uyghur speech recognition considerably harder. Taking the characteristics of the language into account, a Uyghur continuous speech corpus was built, and the HTK (HMM Toolkit) tools were used to implement a Uyghur continuous speech recognition system based on hidden Markov models (HMMs). At the acoustic level, triphones were chosen as the basic recognition unit and a triphone acoustic model of Uyghur was built; decision trees, triphone tying, fixing the silence model, and increasing the number of Gaussian mixture components were used to improve the model's recognition accuracy. At the language level, a statistical bigram language model suited to the characteristics of Uyghur speech was used. Finally, Uyghur continuous speech recognition experiments were carried out with the system.
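As a rough illustration of two components the abstract names, the sketch below expands a phone sequence into triphone labels written in the HTK-style `L-C+R` notation and estimates bigram probabilities by plain relative frequency. It is not the authors' implementation and does not call HTK: the function names, the romanized example words, and the use of an unsmoothed maximum-likelihood estimate are all assumptions made for illustration, since the abstract does not state how the language model was smoothed.

```python
from collections import Counter


def to_triphones(phones):
    """Expand a monophone sequence into word-internal triphone labels (L-C+R)."""
    labels = []
    for i, phone in enumerate(phones):
        # Left context is omitted for the first phone, right context for the last.
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        labels.append(left + phone + right)
    return labels


def train_bigram(sentences):
    """Count history tokens and adjacent word pairs over tokenized sentences."""
    histories = Counter()
    bigrams = Counter()
    for words in sentences:
        tokens = ["<s>"] + list(words) + ["</s>"]
        histories.update(tokens[:-1])               # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return histories, bigrams


def bigram_prob(histories, bigrams, prev, word):
    """Unsmoothed estimate P(word | prev) = count(prev, word) / count(prev)."""
    count = histories[prev]
    return bigrams[(prev, word)] / count if count else 0.0


# Hypothetical romanized data, used only to exercise the functions.
print(to_triphones(["k", "i", "t", "a", "b"]))
histories, bigrams = train_bigram([["men", "kitab", "oqudum"],
                                   ["men", "kitab", "aldim"]])
print(bigram_prob(histories, bigrams, "kitab", "oqudum"))
```

With this toy input the first call prints `['k+i', 'k-i+t', 'i-t+a', 't-a+b', 'a-b']` and the second prints `0.5`. An unsmoothed estimate assigns zero probability to any unseen word pair, which is a real concern for an agglutinative language with a very large vocabulary, so a practical system would normally add discounting or back-off.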