OCR System for Mongolian

Source: Inner Mongolia Normal University
Countries around the world produce documents in many scripts and languages. For typewritten material in standard fonts, methods for converting the text into machine-readable form have been studied extensively, and many programs for digitizing such text have been designed. Recognition of typed and typewritten material in standard fonts is therefore considered a fully solved problem. By contrast, there has been little research on recognizing the Traditional Mongolian script; the recognition problem has not yet been fully solved, and research is still ongoing.

A large number of printed Mongolian documents need to be digitized for digital libraries and various other applications. Traditional Mongolian script has a unique writing style and multiple font-type variations, which pose challenges for Mongolian OCR research. Because the script has some distinctive characteristics (for example, one character may form part of another), we define the character set for recognition in terms of the segmented components, and a rule-based post-processing module combines the components into characters. For character recognition, a method based on projection profile analysis, line segmentation, and word segmentation is presented. For character segmentation, a scheme locates the segmentation points by analyzing the properties of the projection and the connected components. Because Mongolian font types fall into two major groups, the segmentation parameters are adjusted for each group, and a font-type classification method for the two groups is introduced. For recognizing Mongolian text mixed with Chinese and English, language identification and the relevant character recognition kernels are integrated. Experiments show that the presented methods are effective: the text recognition rate is 90% on test samples from practical documents with multiple font types and mixed scripts.
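
The abstract names projection profile analysis as the basis for line and word segmentation but gives no implementation details. The following is a minimal sketch of the general technique, not the thesis's actual method. It assumes a binarized page stored as a NumPy array in which ink pixels are 1 and background pixels are 0, and it relies on the fact that Traditional Mongolian is written in vertical lines, so text lines appear as runs of non-empty pixel columns and words as runs of non-empty pixel rows within a line. The function names (runs, segment_lines, segment_words) and the min_gap parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def runs(profile, min_gap=1):
    """Return (start, end) spans of ink in a 1-D projection profile.

    Neighbouring spans separated by fewer than `min_gap` empty
    positions are merged. `end` is exclusive.
    """
    ink = np.flatnonzero(np.asarray(profile) > 0)
    if ink.size == 0:
        return []
    spans = []
    start = prev = int(ink[0])
    for i in ink[1:]:
        i = int(i)
        if i - prev - 1 >= min_gap:  # gap is wide enough: close the span
            spans.append((start, prev + 1))
            start = i
        prev = i
    spans.append((start, prev + 1))
    return spans


def segment_lines(binary):
    """Find vertical text lines as runs of non-empty pixel columns."""
    column_profile = binary.sum(axis=0)  # ink count per pixel column
    return runs(column_profile)


def segment_words(binary, line, min_gap=2):
    """Find words inside one vertical line as runs of non-empty pixel rows.

    `min_gap` is an assumed tuning parameter: the smallest blank gap
    (in pixels) that is treated as a word boundary.
    """
    x0, x1 = line
    row_profile = binary[:, x0:x1].sum(axis=1)  # ink count per pixel row
    return runs(row_profile, min_gap)


# Toy page: two vertical lines, the first containing two words.
page = np.zeros((12, 9), dtype=np.uint8)
page[0:4, 1:3] = 1   # line 1, word 1
page[7:10, 1:3] = 1  # line 1, word 2
page[2:8, 5:8] = 1   # line 2, single word

lines = segment_lines(page)           # [(1, 3), (5, 8)]
words = segment_words(page, lines[0])  # [(0, 4), (7, 10)]
```

A gap threshold such as min_gap is the kind of parameter that would plausibly be tuned per font-type group, as the abstract describes for its segmentation parameters; the value used here is arbitrary. Character segmentation, which the abstract says also uses connected components, is not covered by this sketch.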