As the volume of speech data on the network keeps growing, there is an urgent need for fast and accurate speech retrieval based on semantic content. This paper studies Chinese speech retrieval based on syllable lattices. To address the shortcomings of the traditional vector space model retrieval method, it proposes a speech retrieval method built on word spotting. To address the information redundancy present in lattice indexes, it also proposes an index redundancy-removal method based on a histogram of syllable posterior probabilities. Experimental results show that the proposed retrieval method clearly outperforms the vector space model method, and that the proposed redundancy-removal method substantially reduces index size and thereby speeds up retrieval.
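The abstract does not spell out how the posterior-probability histogram is used, so the following is only a minimal sketch of one plausible reading: bin the posterior probabilities of the indexed syllable hypotheses, pick a cutoff from the histogram so that a chosen fraction of the highest-confidence entries is kept, and drop the rest. The `Entry` record, the function names, and the `keep_fraction` and `num_bins` parameters are all hypothetical and are not taken from the paper.

```python
from collections import namedtuple

# Hypothetical index record: one syllable hypothesis taken from a lattice.
Entry = namedtuple("Entry", ["syllable", "doc_id", "start", "posterior"])


def posterior_histogram(entries, num_bins=10):
    """Count index entries per posterior-probability bin over [0, 1]."""
    counts = [0] * num_bins
    for e in entries:
        # A posterior of exactly 1.0 falls into the last bin.
        b = min(int(e.posterior * num_bins), num_bins - 1)
        counts[b] += 1
    return counts


def choose_threshold(counts, keep_fraction):
    """Walk from the highest bin downward until at least keep_fraction of
    all entries is covered; return the lower edge of the last bin used."""
    total = sum(counts)
    num_bins = len(counts)
    kept = 0
    for b in range(num_bins - 1, -1, -1):
        kept += counts[b]
        if kept >= keep_fraction * total:
            return b / num_bins
    return 0.0


def prune_index(entries, keep_fraction=0.5, num_bins=10):
    """Assumed pruning rule: keep entries whose posterior is at or above
    the histogram-derived threshold, and report that threshold."""
    counts = posterior_histogram(entries, num_bins)
    threshold = choose_threshold(counts, keep_fraction)
    kept = [e for e in entries if e.posterior >= threshold]
    return kept, threshold
```

Choosing the cutoff from the histogram rather than fixing it in advance lets the threshold adapt to how confident the recognizer is on a given collection. The trade-off between index size and retrieval accuracy would still have to be measured experimentally.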