Concept-Based Information Retrieval Based on Semantic Relatedness

Source: Beijing Institute of Technology
Traditional IR systems based on syntactic search are unable to address basic issues such as synonymy and polysemy. The Semantic Web has been presented as a major tool for overcoming these issues, and it does so with excellent precision. However, a large part of the data remains unstructured, and manually annotating such large volumes of data takes great effort. Concept-based information retrieval is an approach that can handle unstructured data directly while still addressing these issues. Our concept-based method is built on semantic relatedness. The major research and contributions are as follows:

(1) In order to find a single measure for computing the distance between queries, concepts and documents, we have implemented a new semantic similarity measure. Similarity measures are among the most important tools in information retrieval and natural language processing. Sentence similarity is of capital importance in online translation. Word-to-document similarity is the key factor in computing query relevance. Text similarity plays a big role in data mining. Many similarity measures have been used in different domains, but most of them are corpus dependent or language dependent. Some measures that work well for word-to-document matching cannot compute document-to-document similarity, and vice versa. In this paper we present a method that can be used to compute any kind of semantic similarity. The method is neither corpus dependent nor language dependent, and it provides a more accurate way to compare semantic relatedness.

(2) Traditional search engines, based on word-to-document matching, are known to have extremely low precision. They do not address key problems such as synonymy and polysemy. Semantic search has been proposed to deal with these issues. It does address them, but it has very low recall because it only handles structured data. Unstructured data must be annotated first, and annotating a huge amount of unstructured data is time consuming. Concept-based information retrieval proposes to extend syntactic search using the semantic relationships between words. In this work, we present a method that uses an undirected graph of concepts extracted from the Wikipedia corpus to retrieve unstructured data. Its main feature is that it is based on the semantic relationships between concepts.

(3) Concept-based search is a method that enhances information retrieval systems using semantic relationships. The recall of concept-based search is relatively low, because it is not easy to represent a concept completely. Since concept representation is always partial, query expansion is intended to fill this gap and improve recall. In this paper we present an expansion method for concept-based information retrieval. Our method uses semantic relatedness to extend the user query through an undirected graph of concepts. Concept-to-concept relatedness is the only source of expansion used in this work.

(4) Query expansion has a very high computation cost. This cost can be reduced by moving to indexing time the work that is usually done at search time. Clustering is a way to organize data according to a given similarity. Traditional clustering methods are not able to describe the clusters they generate. Conceptual clustering is an important and active research area that aims to cluster the data efficiently and explain it. Previous conceptual clustering approaches provide descriptions that do not use human-comprehensible knowledge. In this work we present an algorithm that uses concepts to drive the clustering process. The generated clusters overlap each other and serve as the basis for an information retrieval system. The method has been implemented in order to improve the performance of the system by reducing the computation cost.
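The query expansion idea in contribution (3) can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not specify the relatedness measure, the graph construction, or any thresholds, so the `ConceptGraph` class, the `expand_query` function, the `threshold` and `max_terms` parameters, and the toy relatedness scores below are all hypothetical. The sketch only shows the general shape of the technique, namely adding to a query the concepts that are most related to it in an undirected concept graph.

```python
from collections import defaultdict


class ConceptGraph:
    """Undirected graph whose edges carry a relatedness score (hypothetical)."""

    def __init__(self):
        self._adj = defaultdict(dict)

    def add_edge(self, a, b, relatedness):
        # Undirected: store the edge in both directions.
        self._adj[a][b] = relatedness
        self._adj[b][a] = relatedness

    def neighbors(self, concept):
        # .get avoids creating empty entries for unknown concepts.
        return self._adj.get(concept, {})


def expand_query(graph, query_concepts, threshold=0.5, max_terms=5):
    """Return the query concepts followed by their most related neighbors.

    A candidate is scored by its highest relatedness to any query concept.
    Candidates below `threshold` are dropped (an assumed cutoff).
    """
    scores = {}
    for concept in query_concepts:
        for neighbor, rel in graph.neighbors(concept).items():
            if neighbor in query_concepts or rel < threshold:
                continue
            scores[neighbor] = max(rel, scores.get(neighbor, 0.0))
    # Highest relatedness first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return list(query_concepts) + [c for c, _ in ranked[:max_terms]]


# Toy graph with made-up relatedness scores.
graph = ConceptGraph()
graph.add_edge("car", "automobile", 0.9)
graph.add_edge("car", "engine", 0.6)
graph.add_edge("car", "banana", 0.1)
graph.add_edge("engine", "turbine", 0.7)

expanded = expand_query(graph, ["car"])
print(expanded)
```

In this sketch only direct neighbors are considered, so "turbine" is not added to a query for "car". Whether the thesis expands over longer paths in the graph is not stated in the abstract.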