In case-based reasoning, the attribute set represents the factors that influence the system, so attribute selection and reduction are key determinants of system performance. Building on an analysis of attribute reduction techniques, this paper studies two entropy-based attribute selection strategies: the information gain method and the gain ratio method. Combining stratified k-fold cross-validation with the k-nearest neighbor (k-NN) classifier, it designs five experimental schemes that examine, from different perspectives, how the two strategies affect case classification performance. The experimental results show that entropy-based attribute selection can find an attribute subset that adequately separates the case classes and improves the attribute representation space.
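The two entropy-based scores named in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook definitions for discrete attributes (information gain as the reduction in class entropy, gain ratio as information gain divided by the entropy of the attribute's own value distribution); the abstract does not state the paper's exact formulation. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def entropy(items):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(items)
    counts = Counter(items)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(values, labels):
    """Reduction in class entropy after partitioning cases by one attribute."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    total = len(labels)
    # Weighted average of the class entropy inside each attribute-value group.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(values, labels):
    """Information gain normalized by the attribute's own entropy (split information)."""
    split_info = entropy(values)
    if split_info == 0.0:  # attribute takes a single value, so it cannot split the cases
        return 0.0
    return information_gain(values, labels) / split_info


# Hypothetical toy case base: four cases, two classes, three candidate attributes.
labels = ["A", "A", "B", "B"]
attributes = {
    "informative": ["x", "x", "y", "y"],  # separates the classes perfectly
    "case_id": ["1", "2", "3", "4"],      # unique per case, many values
    "noise": ["x", "y", "x", "y"],        # independent of the class
}

for name, values in attributes.items():
    print(name, information_gain(values, labels), gain_ratio(values, labels))
```

On this toy data, information gain scores the `case_id` attribute as highly as the genuinely informative one, whereas gain ratio penalizes it for having many distinct values. That difference is the usual motivation for comparing the two strategies.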