This paper studies the key technologies and methods for building a modern Uyghur corpus, with particular attention to the construction of the corpus itself. It examines techniques for modern Uyghur corpus preprocessing, corpus statistics, stem extraction, and data analysis. A candidate list of common modern Uyghur words was also developed. Words were examined from two aspects, their frequency of use and their distribution, and five measures were taken as the basis for the candidate list: number of word types, frequency count, relative frequency, number of texts in which a word occurs, and word length.
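The measures named as the basis for the candidate list (number of word types, frequency count, relative frequency, number of texts, word length) can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `candidate_table`, the row field names, the sort order, and the sample tokens are all assumptions made for illustration, and the input is assumed to be already tokenized (and, if desired, stemmed).

```python
from collections import Counter


def candidate_table(texts):
    """Build per-word statistics from a list of tokenized texts.

    `texts` is a list of token lists, one per document. Tokenization and
    stemming are assumed to have been done beforehand (hypothetical input).
    """
    freq = Counter()       # total occurrences of each word across all texts
    doc_count = Counter()  # number of texts in which each word occurs
    for tokens in texts:
        freq.update(tokens)
        doc_count.update(set(tokens))  # count each word once per text

    total = sum(freq.values())
    rows = []
    for word, count in freq.items():
        rows.append({
            "word": word,
            "count": count,              # frequency count
            "rel_freq": count / total,   # relative frequency
            "texts": doc_count[word],    # distribution: number of texts
            "length": len(word),         # word length in characters
        })

    # Assumed ranking: most frequent first, then most widely distributed.
    rows.sort(key=lambda r: (-r["count"], -r["texts"], r["word"]))
    return rows


# Made-up sample: two tiny "texts" of Latin-script tokens.
sample = [["kitab", "oqu", "kitab"], ["kitab", "yaz"]]
rows = candidate_table(sample)
word_types = len(rows)  # number of distinct word types
```

Ranking by frequency count first and by number of texts second is only one way to combine frequency of use with distribution; the abstract does not state how the measures are weighted.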