This paper proposes a Chinese continuous speech recognition method in which speech processing and language processing are integrated frame-synchronously. The method embeds language processing, based on a context-free grammar (CFG) language model and a top-down parser, into a One Pass Viterbi speech recognition algorithm controlled by a finite-state automaton, so that speech and language processing advance together, frame by frame. To couple the word predictions made by frame-synchronous parsing with the speech recognition process, the paper proposes a top-down parsing method similar to Earley's algorithm, together with a method for dynamically expanding the finite-state automaton within the One Pass Viterbi algorithm. HMMs for 60 phoneme units and 8 tone units served as the primitive recognition models in the experiments. For a recognition task with a perplexity of 27.3, the proposed method achieved an average recognition rate of 94.4% on 1,070 sentences spoken by 10 speakers, about 8% higher than the conventional hierarchical approach to speech-language integration, which first performs word spotting and then parses the resulting word lattice.
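In this design the parser's job is word prediction: at each point it tells the recognizer which words the CFG allows next. The sketch below is a plain Earley recognizer (predict, scan, complete) used as a next-word predictor; it is not the authors' parser. The grammar format, the names `earley_chart`, `predict_next_words` and `can_end`, and the toy grammar are all hypothetical, and the code assumes the grammar has no empty (epsilon) productions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical grammar format: nonterminal -> list of right-hand sides.
# Any symbol that is not a key of the dict is treated as a terminal (a word).
Grammar = Dict[str, List[Tuple[str, ...]]]
# Earley item: (lhs, rhs, dot position, index of the chart set where it started)
Item = Tuple[str, Tuple[str, ...], int, int]

ROOT = "<ROOT>"  # hypothetical augmented start symbol


def earley_chart(grammar: Grammar, start: str, words: List[str]) -> List[Set[Item]]:
    """Standard Earley recognizer; assumes the grammar has no empty productions."""
    chart: List[Set[Item]] = [set() for _ in range(len(words) + 1)]
    chart[0].add((ROOT, (start,), 0, 0))
    for i in range(len(words) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs) and rhs[dot] in grammar:
                # Predict: expand the nonterminal after the dot (top-down).
                for prod in grammar[rhs[dot]]:
                    new = (rhs[dot], prod, 0, i)
                    if new not in chart[i]:
                        chart[i].add(new)
                        agenda.append(new)
            elif dot < len(rhs):
                # Scan: the terminal after the dot must equal the next input word.
                if i < len(words) and rhs[dot] == words[i]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Complete: advance items in the origin set that were waiting for lhs.
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
    return chart


def predict_next_words(grammar: Grammar, start: str, prefix: List[str]) -> Set[str]:
    """Words the grammar allows right after `prefix` (empty set if prefix is invalid)."""
    last = earley_chart(grammar, start, prefix)[len(prefix)]
    return {rhs[dot] for _, rhs, dot, _ in last if dot < len(rhs) and rhs[dot] not in grammar}


def can_end(grammar: Grammar, start: str, prefix: List[str]) -> bool:
    """True if `prefix` is already a complete sentence of the grammar."""
    return (ROOT, (start,), 1, 0) in earley_chart(grammar, start, prefix)[len(prefix)]


# Hypothetical toy grammar, for illustration only.
TOY_GRAMMAR: Grammar = {
    "S": [("NP", "VP")],
    "NP": [("i",), ("you",), ("the", "N")],
    "VP": [("V", "NP"), ("V",)],
    "V": [("see",), ("find",)],
    "N": [("book",), ("map",)],
}

if __name__ == "__main__":
    print(sorted(predict_next_words(TOY_GRAMMAR, "S", ["i"])))  # ['find', 'see']
    print(can_end(TOY_GRAMMAR, "S", ["i", "see"]))  # True
```

With the toy grammar, only "see" and "find" can follow "i", so a frame-synchronous decoder would need to instantiate just those two word models at that point instead of the whole vocabulary.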
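One way to picture the frame-synchronous coupling is a one-pass Viterbi search in which a word's HMM is instantiated only when the parser predicts that word at a word boundary, so the search network grows dynamically instead of being built in full beforehand. This is a minimal sketch under assumptions that do not come from the paper: each word is a left-to-right chain with one state per phone label, frame log-likelihoods are given precomputed, transition probabilities are fixed, and there is no beam pruning and no tone model. `one_pass_viterbi`, `predict_next` and `can_end` are hypothetical names; `predict_next` can be any function from a word history to the set of allowed next words, for example an Earley-style predictor.

```python
import math
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Active hypothesis: (word history, current word, HMM state index within the word)
Key = Tuple[Tuple[str, ...], str, int]


def _relax(table: Dict[Key, float], key: Key, score: float) -> None:
    # Viterbi max: keep only the best-scoring path into each hypothesis.
    if score > table.get(key, -math.inf):
        table[key] = score


def one_pass_viterbi(
    frames: Sequence[Dict[str, float]],  # per-frame log-likelihood of each phone label
    lexicon: Dict[str, List[str]],  # word -> phone labels, one HMM state per label
    predict_next: Callable[[Tuple[str, ...]], Set[str]],
    can_end: Callable[[Tuple[str, ...]], bool],
    log_stay: float = math.log(0.5),
    log_move: float = math.log(0.5),
) -> Tuple[Tuple[str, ...], float]:
    """Return the best grammatical word sequence and its log score."""
    # Frame 0: instantiate only the words the parser allows at sentence start.
    active: Dict[Key, float] = {((), w, 0): frames[0][lexicon[w][0]] for w in predict_next(())}
    for t in range(1, len(frames)):
        nxt: Dict[Key, float] = {}
        for (hist, w, s), score in active.items():
            phones = lexicon[w]
            _relax(nxt, (hist, w, s), score + log_stay + frames[t][phones[s]])
            if s + 1 < len(phones):
                _relax(nxt, (hist, w, s + 1), score + log_move + frames[t][phones[s + 1]])
            else:
                # Word end: ask the parser which words may follow and expand the
                # network with only those words (dynamic expansion).
                new_hist = hist + (w,)
                for nw in predict_next(new_hist):
                    _relax(nxt, (new_hist, nw, 0), score + log_move + frames[t][lexicon[nw][0]])
        active = nxt
    # Accept only word-final states whose history forms a complete sentence.
    best: Tuple[Tuple[str, ...], float] = ((), -math.inf)
    for (hist, w, s), score in active.items():
        if s == len(lexicon[w]) - 1 and can_end(hist + (w,)) and score > best[1]:
            best = (hist + (w,), score)
    return best


# Hypothetical toy task: "ni hao" or "ni hao ma".
LEXICON = {"ni": ["n", "i"], "hao": ["h", "ao"], "ma": ["m", "a"]}
FOLLOW = {(): {"ni"}, ("ni",): {"hao"}, ("ni", "hao"): {"ma"}}
SENTENCES = {("ni", "hao"), ("ni", "hao", "ma")}
PHONES = ["n", "i", "h", "ao", "m", "a"]


def frame(label: str) -> Dict[str, float]:
    # Idealised acoustics: the spoken phone scores 0, every other phone scores -5.
    return {p: (0.0 if p == label else -5.0) for p in PHONES}


if __name__ == "__main__":
    frames = [frame(p) for p in ["n", "n", "i", "i", "h", "ao", "ao"]]
    words, score = one_pass_viterbi(
        frames, LEXICON, lambda h: FOLLOW.get(h, set()), lambda h: h in SENTENCES
    )
    print(words)  # ('ni', 'hao')
```

Keying hypotheses on the whole word history keeps the sketch short and exact, but it prevents paths with different histories from merging; a practical decoder would instead merge hypotheses that share a parser state and prune with a beam.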
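The abstract reports a task perplexity of 27.3 but does not say here how it was computed. Under one common convention for grammar-constrained tasks, every word the grammar allows at a position is treated as equally likely, so perplexity reduces to the geometric mean of the number of allowed words (the branching factor) per position: PP = (b_1 × b_2 × ... × b_N)^(1/N). The helper below assumes that convention; `branching_perplexity` is a hypothetical name, and the paper may have used a different definition.

```python
import math
from typing import Sequence


def branching_perplexity(branching: Sequence[int]) -> float:
    """Perplexity when each of the b_i allowed words at position i is equally likely.

    Equals exp(-(1/N) * sum(log(1/b_i))), i.e. the geometric mean of the b_i.
    """
    if not branching or any(b < 1 for b in branching):
        raise ValueError("need at least one position, each with b >= 1")
    return math.exp(sum(math.log(b) for b in branching) / len(branching))


if __name__ == "__main__":
    print(branching_perplexity([2, 8]))  # geometric mean of 2 and 8, about 4
```

A task that allows 27 words at every position has perplexity 27; mixing positions with 2 and 8 choices gives 4, not the arithmetic mean of 5.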