A Chinese spoken document retrieval system locates user queries quickly and accurately in Chinese speech documents. A typical implementation runs speech recognition on the documents and builds an index over the results, then segments the query string into words and searches with the segmentation result. Mismatches between the query segmentation and the recognition output during retrieval degrade system performance. To address this, multiple segmentations of the query are generated, and each is expanded with prefixes and suffixes before retrieval. Because this expansion enlarges the set of items to search and lengthens retrieval time, a finite-state automaton (FSA) is introduced to compress the expansions, and a token-based search algorithm is designed on top of it for efficient retrieval. Experiments show that multiple segmentation combined with prefix and suffix expansion improves retrieval EER by a relative 50% to 70%, and that the FSA compresses the search space, making retrieval nearly 30 times faster.
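The idea of generating several query segmentations and then compressing them into an automaton can be illustrated with a small sketch. It is not the authors' implementation: the toy vocabulary, the function names `segmentations`, `build_trie`, and `count_arcs`, and the use of a prefix-sharing trie (a simple deterministic acyclic FSA) in place of whatever automaton construction the paper uses are all assumptions made for illustration. Prefix and suffix expansion and the token-based search are left out.

```python
from typing import List, Set

END = None  # hypothetical sentinel key marking the end of one segmentation


def segmentations(query: str, vocab: Set[str], max_len: int = 4) -> List[List[str]]:
    """Enumerate every way to split `query` into words found in `vocab`."""
    if not query:
        return [[]]  # one way to segment the empty string: no words
    results: List[List[str]] = []
    for end in range(1, min(max_len, len(query)) + 1):
        head = query[:end]
        if head in vocab:
            for tail in segmentations(query[end:], vocab, max_len):
                results.append([head] + tail)
    return results


def build_trie(sequences: List[List[str]]) -> dict:
    """Merge word sequences into a trie so that shared prefixes are stored once."""
    root: dict = {}
    for seq in sequences:
        node = root
        for word in seq:
            node = node.setdefault(word, {})
        node[END] = {}  # mark an accepting state
    return root


def count_arcs(node: dict) -> int:
    """Count word-labelled arcs in the trie (the END markers are not counted)."""
    return sum(1 + count_arcs(child) for key, child in node.items() if key is not END)


if __name__ == "__main__":
    # Toy vocabulary, assumed for illustration only.
    vocab = {"语", "音", "语音", "检", "索", "检索"}
    segs = segmentations("语音检索", vocab)
    print(len(segs), "segmentations")
    print(sum(len(s) for s in segs), "words if each segmentation is searched separately")
    print(count_arcs(build_trie(segs)), "arcs after merging shared prefixes")
```

A trie merges only common prefixes. A minimized FSA would also merge common suffixes, so it could be smaller still, which is in the spirit of the search-space compression the abstract reports.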