This paper proposes a lexicon-based method for retrieving translation-equivalent sentences. A source sentence and its translation do not correspond one-to-one at the word level, and deciding whether two sentences are translations of each other does not require every word to form a translation pair. We introduce the concept of word informativeness (WI) to reflect the importance of a word within a sentence. WI is composed of word frequency, the word's distribution across documents, part of speech, and word length. When judging whether two sentences stand in a translation relationship, only the high-WI words are checked for translation pairs. A translation-equivalent sentence retrieval system is built on these high-informativeness translation pairs. Experiments show that the system outperforms a simple retrieval method based on all words and, in noise experiments, is more robust than related work.
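The idea of scoring words and matching only the most informative ones can be illustrated with a minimal Python sketch. The abstract names the four ingredients of WI but not how they are combined, so everything here is an assumption made for illustration: the multiplicative formula, the `POS_WEIGHT` table, the choice to score rarer words higher, and the helper names `word_informativeness`, `top_words`, and `match_score` are hypothetical and are not the paper's definitions.

```python
import math

# Hypothetical part-of-speech weights (assumption): content words count more.
POS_WEIGHT = {"NOUN": 1.0, "VERB": 0.8, "ADJ": 0.6, "OTHER": 0.1}


def word_informativeness(word, pos, term_freq, doc_freq, n_docs):
    """Toy WI score from frequency, document distribution, POS, and length.

    The way the four factors are combined is an assumption, not the paper's formula.
    """
    rarity = 1.0 / (1.0 + math.log(1 + term_freq))           # rarer words score higher
    spread = math.log((1 + n_docs) / (1 + doc_freq)) + 1.0    # IDF-style distribution term
    length = min(len(word), 10) / 10.0                        # longer words score higher, capped
    return rarity * spread * POS_WEIGHT.get(pos, POS_WEIGHT["OTHER"]) * length


def top_words(tagged_sentence, stats, n_docs, k=3):
    """Return the k highest-WI words from a list of (word, pos) pairs."""
    scored = []
    for word, pos in tagged_sentence:
        term_freq, doc_freq = stats.get(word, (1, 1))  # unseen words: assume frequency 1
        scored.append((word_informativeness(word, pos, term_freq, doc_freq, n_docs), word))
    return [word for _, word in sorted(scored, reverse=True)[:k]]


def match_score(source_keywords, target_words, lexicon):
    """Fraction of high-WI source words that have a translation in the target sentence."""
    if not source_keywords:
        return 0.0
    target = set(target_words)
    hits = sum(1 for w in source_keywords if lexicon.get(w, set()) & target)
    return hits / len(source_keywords)


# Usage with made-up corpus statistics: word -> (term frequency, document frequency).
tagged = [("the", "OTHER"), ("parliament", "NOUN"), ("approved", "VERB"), ("budget", "NOUN")]
stats = {"the": (1000, 100), "parliament": (5, 3), "approved": (20, 10), "budget": (8, 4)}
lexicon = {"parliament": {"议会"}, "budget": {"预算"}}  # toy bilingual dictionary

keywords = top_words(tagged, stats, n_docs=100, k=2)
print(keywords, match_score(keywords, ["议会", "批准", "了", "预算"], lexicon))
```

A candidate target sentence would then be accepted as a translation when `match_score` exceeds some threshold. That threshold, like the weights above, would have to be tuned on data and is not given in the abstract.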