Template-based machine translation is widely used for the automatic translation of patent documents. Because patent texts contain a very large number of technical terms whose distribution is extremely uneven, acquiring translation templates with statistical methods often suffers from severe data sparsity. This paper proposes a method for automatically acquiring monolingual templates from patent text. The method exploits the highly uneven word distribution of patent texts: by counting how often each word occurs, it identifies the fixed part and the generalizable part of each template. Experimental results show that the method performs very well at automatically acquiring templates from patent text.
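The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes sentences are already tokenized (Chinese patent text would first need word segmentation), uses a single hypothetical frequency threshold `min_freq`, and marks generalizable positions with a hypothetical slot symbol `<X>`. Frequent words are kept as the fixed part of a template, and rare words (typically technical terms) become slots; consecutive rare words collapse into one slot.

```python
from collections import Counter

# Hypothetical slot marker for the generalizable part of a template (assumption).
SLOT = "<X>"


def count_words(sentences):
    """Count the corpus-wide frequency of every token (sentences are pre-tokenized)."""
    counts = Counter()
    for tokens in sentences:
        counts.update(tokens)
    return counts


def extract_templates(sentences, min_freq):
    """Build one template per sentence and count identical templates.

    Tokens seen at least `min_freq` times (a hypothetical threshold) are kept
    as the fixed part; rarer tokens are replaced by SLOT, and a run of
    consecutive rare tokens is merged into a single SLOT.
    """
    counts = count_words(sentences)
    templates = Counter()
    for tokens in sentences:
        pattern = []
        for tok in tokens:
            if counts[tok] >= min_freq:
                pattern.append(tok)          # fixed part of the template
            elif not pattern or pattern[-1] != SLOT:
                pattern.append(SLOT)         # start of a generalizable span
        templates[" ".join(pattern)] += 1
    return templates


if __name__ == "__main__":
    # Toy English corpus, invented purely for illustration.
    corpus = [
        "the device comprises a rotor".split(),
        "the device comprises a valve seat".split(),
        "the system comprises a sensor".split(),
    ]
    for template, n in extract_templates(corpus, min_freq=2).most_common():
        print(n, template)
    # prints:
    # 2 the device comprises a <X>
    # 1 the <X> comprises a <X>
```

Because rare terms are abstracted into slots before templates are counted, sentences that differ only in their terminology map to the same template, which is one way a frequency split can ease the data sparsity the abstract describes.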