Feature Selection Based on Information-Theoretic Criteria in Classification Analysis (in English)

Source: Acta Automatica Sinica (自动化学报)
Feature selection aims to reduce the dimensionality of patterns for classificatory analysis by selecting the most informative features instead of irrelevant and/or redundant ones. In this study, two novel information-theoretic measures for feature ranking are presented: one is an improved formula to estimate the conditional mutual information between the candidate feature f_i and the target class C given the subset of selected features S, i.e., I(C; f_i | S), under the assumption that information of features is distributed uniformly; the other is a mutual information (MI) based constructive criterion that is able to capture both irrelevant and redundant input features under arbitrary distributions of information of features. With these two measures, two new feature selection algorithms, called the quadratic MI-based feature selection (QMIFS) approach and the MI-based constructive criterion (MICC) approach, respectively, are proposed, in which no parameters like β in Battiti's MIFS and Kwak and Choi's MIFS-U methods need to be preset. Thus, the intractable problem of how to choose an appropriate value for β to trade off relevance to the target classes against redundancy with the already-selected features is avoided completely. Experimental results demonstrate the good performance of QMIFS and MICC on both synthetic and benchmark data sets.
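To make the role of β concrete, the following is a minimal sketch of the baseline that the abstract contrasts against: Battiti's greedy MIFS criterion, which scores a candidate f as I(C; f) − β Σ_{s∈S} I(f; s). It is not QMIFS or MICC, whose formulas the abstract does not give. The function names `mutual_information` and `mifs`, the plug-in MI estimate for discrete values, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # sum over observed pairs of p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mifs(features, target, k, beta):
    """Greedy MIFS: repeatedly pick the feature maximising
    I(C;f) - beta * sum of I(f;s) over already-selected s.

    features: dict mapping a (hypothetical) feature name to its values.
    Returns the names of up to k features in selection order.
    """
    relevance = {
        name: mutual_information(col, target)
        for name, col in features.items()
    }
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        def score(name):
            redundancy = sum(
                mutual_information(features[name], features[s])
                for s in selected
            )
            return relevance[name] - beta * redundancy

        best = max(remaining, key=score)  # ties go to the earliest name
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (illustrative only): the class C encodes two binary factors.
target = [0, 0, 1, 1, 2, 2, 3, 3]
toy = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],        # first factor of C
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],   # exact duplicate of "a"
    "b_noisy": [0, 0, 1, 0, 0, 0, 1, 0],  # weaker, but independent of "a"
}

no_penalty = mifs(toy, target, k=2, beta=0.0)    # redundancy ignored
with_penalty = mifs(toy, target, k=2, beta=1.0)  # redundancy penalised
```

On this toy data, β = 0 selects the duplicate `a_copy` second, while β = 1 selects `b_noisy` instead. The outcome depends entirely on the hand-picked β, which is the tuning problem the abstract says QMIFS and MICC avoid.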