Annotation adds value to a corpus (Leech 1997). This view has gradually become a consensus in corpus linguistics, and annotation has accordingly become one of the most basic standards for large corpora. In foreign language teaching and research, we often search part-of-speech (POS) tagged corpora with powerful regular expressions to extract the information we need. However, the symbols used in regular expressions differ from the words of natural language, so they pose real difficulty for most people engaged in language teaching, learning and research. Searching is also one of the most important operations in corpus work. As a result, using regular expressions effectively to search corpora has become one of the main challenges in corpus-based teaching and research. This paper analyzes the basic characteristics of searching POS-tagged corpora and introduces a regular expression editing environment for such searches, designed by the researcher. It then discusses how to use this environment to write regular expressions for searching POS-tagged corpora.
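The abstract does not describe the author's editing environment in detail. The Python sketch below is only a rough illustration of the kind of search it targets, and it rests on several assumptions. Tokens are stored as `word_TAG` and separated by whitespace, in the CLAWS style. The tag names (`AT`, `JJ`, `NN1`, `NN2`, ...) are illustrative. The helpers `tag_query_to_regex`, `search` and `strip_tags` are hypothetical and are not the paper's tool. The sketch compiles a simplified tag query such as `JJ NN*` into a full regular expression, which hides the regex syntax from the user.

```python
import re
from typing import List

# Assumption: the corpus is plain text in "word_TAG" format, tokens separated
# by whitespace, e.g. "the_AT old_JJ house_NN1". Tags here are illustrative.


def tag_query_to_regex(query: str) -> str:
    """Compile a hypothetical simplified query like 'JJ NN*' into a regex.

    Each space-separated item is a POS tag; a trailing '*' means "any tag
    beginning with this prefix" (so NN* matches NN1, NN2, ...).
    """
    token_patterns = []
    for tag in query.split():
        if tag.endswith("*"):
            # Prefix match: the literal prefix followed by any tag characters.
            tag_pattern = re.escape(tag[:-1]) + r"[A-Z0-9]*"
        else:
            tag_pattern = re.escape(tag)
        # A token is a word without spaces or underscores, then "_" and the tag;
        # the lookahead stops "NN" from matching only the start of "NN1".
        token_patterns.append(r"[^\s_]+_" + tag_pattern + r"(?=\s|$)")
    # Consecutive tokens are separated by whitespace; (?<!\S) anchors the
    # match at the start of a token rather than in the middle of a word.
    return r"(?<!\S)" + r"\s+".join(token_patterns)


def search(corpus: str, query: str) -> List[str]:
    """Return every tagged token sequence in the corpus matching the query."""
    pattern = re.compile(tag_query_to_regex(query))
    return [m.group(0) for m in pattern.finditer(corpus)]


def strip_tags(tagged: str) -> str:
    """Drop the _TAG suffixes for display: 'big_JJ windows_NN2' -> 'big windows'."""
    return " ".join(tok.rsplit("_", 1)[0] for tok in tagged.split())


if __name__ == "__main__":
    corpus = "the_AT old_JJ house_NN1 had_VHD two_MC big_JJ windows_NN2"
    for hit in search(corpus, "JJ NN*"):
        print(hit, "->", strip_tags(hit))
```

With the sample corpus, the query `JJ NN*` finds `old_JJ house_NN1` and `big_JJ windows_NN2`, while `JJ NN1` finds only the first. The design choice is to let users write only tag sequences and a wildcard. The boundary handling (`(?<!\S)` and the `(?=\s|$)` lookahead) is generated automatically, and it is exactly the part of the regex that non-specialists most often get wrong.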