Around the world, and especially among peoples with small populations, the loss and even extinction of mother tongues and traditional cultures is becoming increasingly serious, and the need to rescue endangered cultures is urgent. Because written Mongolian is regional, diverse, and dialect-dependent, written communication between regions remains difficult. In addition, frequent changes to the scripts used in some regions have caused people to lose the languages and scripts they once had. Many problems still need to be solved in research on script standards, in the interchange of information between the new and old scripts, and in natural language processing and communication. With the assistance of Waseda University in Japan, NICT, GITI, the National University of Mongolia, the Kalmyk and Buryat State Universities in Russia, the Mengkeli (蒙科立) Software Development Company of the Inner Mongolia Academy of Social Sciences in China, and other institutions, we collected and compiled a multilingual information knowledge base with Mongolian at its core. Its content includes: 1) a speech database of the Mongolic languages; 2) a parallel electronic dictionary with grammatical annotation covering New Mongolian (Cyrillic), Traditional Mongolian, Todo Mongolian, Chinese, English, Japanese, and Korean; 3) a speech database of education and tourism conversations in the dialects of Mongolia, together with a corpus of Oirat Mongolian Jangar epic recitation (Xinjiang, China; Kalmykia, Russia). All of these corpora were automatically annotated with commonly used software such as ChaSen, HTK, and ATRASR, manually proofread, and made available on a shared platform. Using these language resources, we developed two applications: 1) software for converting text between the different Mongolian scripts (measured conversion rate of 94.3%); 2) speech-to-text conversion software (measured conversion rate of 88.6%). Part of the corpora and software has already been released to the public. This work provides scientific support for data mining and knowledge discovery on the Mongolian languages, for scientific research, and for the revitalization and learning of disappearing languages and scripts.