This paper introduces chunk analysis, a shallow scheme for describing syntactic knowledge whose descriptive power lies between a linear word sequence and a complete syntactic tree. It discusses in detail the scheme's two main parts, word boundary blocks and constituent groups, covering the basic content of each and the algorithms for recognizing them automatically. On this basis, it proposes a new, staged approach to constructing a Chinese treebank: first build a chunk bank, then build the treebank. A series of syntactic analysis and knowledge acquisition experiments was carried out, including 1) automatic recognition of the longest Chinese noun phrases and 2) automatic acquisition of Chinese syntactic knowledge. Together, these results demonstrate that the scheme is practical and effective.
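To make the idea of a representation that sits between a word sequence and a full tree more concrete, the following minimal sketch models a sentence as a flat, non-nested list of labeled chunks. It is an illustration only, not the paper's scheme or recognition algorithm: the `Chunk` class, the labels `NP`, `VP`, and `PP`, the helper names, and the English example sentence are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """A contiguous, non-nested group of words with a category label."""
    label: str        # hypothetical label such as "NP" or "VP"
    words: List[str]  # the words covered by this chunk, in order


def flatten(chunks: List[Chunk]) -> List[str]:
    """Recover the linear word sequence from a chunk sequence."""
    return [word for chunk in chunks for word in chunk.words]


def chunks_with_label(chunks: List[Chunk], label: str) -> List[Chunk]:
    """Return the chunks that carry the given label, in sentence order."""
    return [chunk for chunk in chunks if chunk.label == label]


# A chunked sentence: more structure than a word list, less than a tree,
# because chunks do not nest and no relations between them are recorded.
sentence = [
    Chunk("NP", ["the", "new", "treebank"]),
    Chunk("VP", ["was", "built"]),
    Chunk("PP", ["in"]),
    Chunk("NP", ["two", "stages"]),
]

if __name__ == "__main__":
    print(flatten(sentence))
    for chunk in chunks_with_label(sentence, "NP"):
        print(chunk.label, " ".join(chunk.words))
```

Because the chunk list preserves word order and never nests, the original word sequence can always be recovered with `flatten`, while a later stage could attach the chunks to one another to form a full tree, which is the spirit of building a chunk bank before a treebank.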