The object of this study is the pivotal construction (兼语结构, also rendered as "concurrent structure"), one of the more complex constructions in Chinese. By observing how pivotal constructions are distributed in a large-scale corpus, the thesis analyzes their internal and external linguistic features with two aims: first, to provide quantitative data in support of structural research on the construction; second, to provide linguistic support for automatic recognition by computer. Feature templates built from these features are used to construct a conditional random field (CRF) model that recognizes pivotal constructions automatically. In open training, the F-score reaches up to 85.71%. This result indicates that CRF-based recognition of pivotal constructions is approaching practical applicability and can serve as an effective method for the task.
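The abstract does not list the actual features, templates, or tag set. The following is therefore only a minimal Python sketch under stated assumptions:

* Each token carries a word and a POS tag.
* A small hypothetical template set combines these over a ±1 window, loosely in the spirit of CRF++'s `%x[row,col]` macros.
* Pivotal constructions are marked with a hypothetical B/I/O labelling.

The sketch shows how feature templates are instantiated into per-token features for a CRF. It also shows how a span-level F-score of the kind reported above can be computed. The CRF training step itself is not shown. All names (`TEMPLATES`, `expand_templates`, `bio_spans`, `prf`) and the example sentence with its labels are illustrative, not taken from the thesis.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical feature templates: a name plus (offset, column) cells,
# where column 0 = word and column 1 = POS tag (assumed input layout).
TEMPLATES: List[Tuple[str, List[Tuple[int, int]]]] = [
    ("w[0]", [(0, 0)]),
    ("w[-1]", [(-1, 0)]),
    ("w[+1]", [(1, 0)]),
    ("p[0]", [(0, 1)]),
    ("p[-1]/p[0]", [(-1, 1), (0, 1)]),
    ("p[0]/p[+1]", [(0, 1), (1, 1)]),
]

PAD = "<PAD>"  # filler for positions outside the sentence


def expand_templates(sent: Sequence[Tuple[str, str]], i: int) -> Dict[str, str]:
    """Instantiate every template at token position i of a (word, POS) sentence."""
    feats: Dict[str, str] = {}
    for name, cells in TEMPLATES:
        parts = []
        for offset, col in cells:
            j = i + offset
            parts.append(sent[j][col] if 0 <= j < len(sent) else PAD)
        feats[name] = "/".join(parts)
    return feats


def bio_spans(labels: Sequence[str]) -> Set[Tuple[int, int]]:
    """Return (start, end) token spans (end exclusive) of B/I chunks.

    A stray "I" with no open chunk is treated as "O" (a simplifying assumption).
    """
    spans: Set[Tuple[int, int]] = set()
    start = None
    for k, lab in enumerate(labels):
        if lab == "I" and start is not None:
            continue  # the current chunk continues
        if start is not None:
            spans.add((start, k))  # the current chunk ends before position k
        start = k if lab == "B" else None
    if start is not None:
        spans.add((start, len(labels)))
    return spans


def prf(gold: Sequence[Sequence[str]],
        pred: Sequence[Sequence[str]]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall and F-score over a corpus of label sequences."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        tp += len(gs & ps)
        n_gold += len(gs)
        n_pred += len(ps)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


if __name__ == "__main__":
    # Hypothetical example: 老师 让 我们 写 作业 ("the teacher had us do homework"),
    # with 让 as the pivotal verb and 让…作业 labelled as the construction.
    sent = [("老师", "n"), ("让", "v"), ("我们", "r"), ("写", "v"), ("作业", "n")]
    gold = ["O", "B", "I", "I", "I"]
    X = [expand_templates(sent, i) for i in range(len(sent))]
    print(X[1])
    print(prf([gold], [gold]))  # (1.0, 1.0, 1.0)
```

The per-token feature dicts match the input shape that `sklearn-crfsuite`'s `CRF.fit(X, y)` accepts: a list of sentences, each a list of feature dicts, paired with label sequences. Exact span matching is only one common scoring convention, and the thesis may have scored recognition differently.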