This paper reports character-frequency and syllable-frequency statistics for a Tibetan corpus of twenty million characters, and computes the entropy of Tibetan character stacks together with the relative and absolute entropy of syllables. The results show that (1) there are 5334 standard Tibetan syllables, of which 475 are written with one character, 3061 with two characters, 902 with three characters, and 896 with four characters; and (2) the frequency distribution of Tibetan character stacks and syllables is highly uneven: 703 syllables cover 90% of the corpus and 1140 syllables cover 95%.
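The three kinds of quantity mentioned in the abstract (frequency counts, entropy, and the number of syllables needed to cover a given share of the text) can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names are hypothetical, syllables are assumed to be delimited only by the tsheg mark (U+0F0B), other punctuation is ignored, and the entropy shown is plain Shannon entropy in bits, since the abstract does not define its "relative" and "absolute" variants.

```python
import math
from collections import Counter

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark; assumed here to be the only delimiter


def syllable_counts(text):
    """Count syllables by splitting on the tsheg and dropping empty pieces."""
    pieces = (s.strip() for s in text.split(TSHEG))
    return Counter(p for p in pieces if p)


def shannon_entropy(counts):
    """Shannon entropy in bits of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def coverage_size(counts, fraction):
    """Smallest number of most-frequent items whose counts reach `fraction` of all tokens."""
    total = sum(counts.values())
    running = 0
    for k, (_, c) in enumerate(counts.most_common(), start=1):
        running += c
        if running >= fraction * total:
            return k
    return len(counts)
```

Applied to a real corpus, `coverage_size(syllable_counts(corpus), 0.90)` is the kind of computation that would yield a figure comparable to the 703 syllables reported for 90% coverage, although the exact value depends on the segmentation rules, which this sketch simplifies.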