Countries around the world produce documents in many scripts and languages. For most scripts written in standard fonts, methods for converting typed and typewritten material into computer text have been studied extensively, and many programs for digitizing such text have been developed. Recognition of typed and typewritten material in standard fonts is generally regarded as a solved problem. By contrast, there has been little research on recognizing the Traditional Mongolian script. The recognition problem for digitizing Traditional Mongolian text has not yet been fully solved, and research is still ongoing.
A large number of printed Mongolian documents need to be digitized for digital libraries and various other applications. Traditional Mongolian script has a unique writing style and multi-font type variations, which pose challenges for Mongolian OCR research. Because the script has particular characteristics, for example that one character may form part of another character, we define the character set for recognition in terms of segmented components, and a rule-based post-processing module combines the components into characters. For character recognition, a method based on projection profile analysis, line segmentation, and word segmentation is presented. For character segmentation, a scheme finds the segmentation points by analyzing the properties of the projection and of connected components. Mongolian font types fall into two major groups, so the segmentation parameters are adjusted for each group, and a font-type classification method for the two groups is introduced. For recognition of Mongolian text mixed with Chinese and English, language identification and the relevant character recognition kernels are integrated. Experiments show that the presented methods are effective: the text recognition rate is 90% on test samples from practical documents with multiple font types and mixed scripts.
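As an illustration of projection profile analysis for line and word segmentation, the following is a minimal sketch and not the implementation evaluated in this work. It assumes a binarized page image stored as nested lists (1 for ink, 0 for background) and relies on the fact that Traditional Mongolian is written in vertical lines, so text lines are separated along the x axis and words along the y axis. The function names, the `min_ink` threshold, and the toy page are hypothetical choices made only for this example.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def projection_profile(image: Image, axis: str) -> List[int]:
    """Count ink pixels per column (axis='x') or per row (axis='y')."""
    if not image:
        return []
    if axis == "x":
        return [sum(col) for col in zip(*image)]
    if axis == "y":
        return [sum(row) for row in image]
    raise ValueError("axis must be 'x' or 'y'")


def find_runs(profile: List[int], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where profile >= min_ink."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i  # a run of ink begins
        elif value < min_ink and start is not None:
            runs.append((start, i))  # a gap closes the current run
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines_and_words(image: Image, min_ink: int = 1):
    """Split a page into vertical text lines, then each line into words.

    Returns a list of ((x0, x1), [(y0, y1), ...]) entries, one per line.
    """
    result = []
    for x0, x1 in find_runs(projection_profile(image, "x"), min_ink):
        strip = [row[x0:x1] for row in image]  # crop one vertical line
        words = find_runs(projection_profile(strip, "y"), min_ink)
        result.append(((x0, x1), words))
    return result


# Toy page (hypothetical): two vertical lines; the first contains two words.
page = [
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
segments = segment_lines_and_words(page)
```

On real scans, `min_ink` would need to be raised to tolerate noise, and a gap between words would normally have to exceed a minimum height before it counts as a boundary. Such parameters are plausibly the kind that would be tuned separately for each font-type group.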