I. Basic Knowledge of Corpora
A corpus usually refers to language material collected for linguistic research and stored in electronic form. It is assembled from samples of naturally occurring written or spoken language and is used to represent a particular language or language variety. Corpora have become an indispensable basic resource for theoretical linguistic research, applied research, and language engineering, because a corpus of appropriate size that has been scientifically sampled and annotated can reflect how language is actually used. Through a corpus, people can observe and grasp linguistic facts and analyze and study the regularities of the language system. A corpus is a basic resource that carries linguistic knowledge, with the electronic computer as its medium