A Statistics-Based Chinese-to-Cyrillic Mongolian Machine Translation System

Source: Inner Mongolia Normal University
Computers have become essential tools in many areas of human life. Machine translation, the automation of translation by computer, has developed at an unprecedented pace, to the point that some expect it to make traditional human translation obsolete. Many developed countries have consequently turned their attention to this field. Companies such as Google, MSN, and Yahoo offer translation services on their websites that generate translations with statistical methods, and statistical machine translation has become the dominant approach in the field. Statistical machine translation has evolved from word-based models to higher levels of abstraction: the best-known current systems are phrase-based, and recent research has begun to explore tree-based systems that use syntactic information.

The aim of this thesis is to create a Cyrillic Mongolian corpus and to build a Chinese-Cyrillic Mongolian statistical machine translation system. The thesis first discusses the relevant theory of statistical machine translation and the methods and tools used to create the Cyrillic Mongolian corpus. It then presents the results of a Chinese-Cyrillic Mongolian statistical machine translation system built on phrase-based translation models.

We established a development set and a test set, and obtained a Chinese-Cyrillic Mongolian parallel corpus through two approaches, collection and construction. On this basis, we ran experiments with the Chinese-Cyrillic Mongolian statistical system. The translation model uses a Chinese-Cyrillic Mongolian parallel corpus of more than 60 thousand sentence pairs. In an open test on a corpus of 400 sentence pairs, the BLEU and NIST scores were 0.1489 and 4.7232 under the 3-gram setting, 0.1381 and 4.9333 under the 4-gram setting, and 0.1194 and 4.3772 under the 5-gram setting, respectively.
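
For readers unfamiliar with the BLEU metric used in the evaluation, the following is a minimal sketch of how a corpus-level BLEU score is computed. It assumes a single reference per sentence, uniform weights over n-gram orders, and no smoothing. The abstract does not say which scoring tool was used or whether its n-gram settings refer to the language model or to the metric, so the function names and the `max_n` parameter here are illustrative only and are not taken from the thesis.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidates, references, max_n=4):
    """Corpus-level BLEU: one reference per candidate, uniform weights, no smoothing."""
    matched = [0] * max_n  # clipped n-gram matches, per order
    total = [0] * max_n    # candidate n-gram counts, per order
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand, n)
            ref_counts = ngrams(ref, n)
            # Clip each candidate n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
            total[n - 1] += sum(cand_counts.values())
    if min(matched) == 0:
        return 0.0  # without smoothing, an order with no matches gives a zero score
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty: penalize candidates shorter than the references.
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_precision)


# Toy example (hypothetical sentences, not from the thesis corpus).
candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
for order in (2, 3, 4):
    print(order, round(bleu([candidate], [reference], max_n=order), 4))
```

In this toy example the score falls as the maximum n-gram order rises, because longer n-grams are harder to match exactly. A production evaluation would normally rely on an established scoring script rather than a hand-written function like this one.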