This paper analyzes metrics for evaluating discretization schemes of numerical decision tables, including the number of breakpoints, conditional information entropy, granularity entropy, class-attribute mutual information, and class-attribute interdependence redundancy. It argues that for a consistent decision table the conditional information entropy and the class-attribute mutual information are both constants, and therefore no longer provide guidance for choosing a discretization scheme. It discusses the relationship between granularity entropy and interdependence redundancy, and proves that granularity entropy increases as breakpoints are added. Experiments are designed to measure the relationships among these metrics. They show that the number of breakpoints and granularity entropy correlate with prediction accuracy to a comparable degree, and that which of the two correlates more strongly depends on the specific data set.
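The metrics named above can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact definitions, so the sketch assumes the usual information-theoretic ones: granularity entropy is taken to be the Shannon entropy H(C) of the partition induced by the discretized condition attributes, conditional information entropy is H(D|C), class-attribute mutual information is I(C;D), and interdependence redundancy is I(C;D)/H(C,D). The function names `entropy`, `discretize`, and `scheme_metrics`, and the toy data, are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of hashable items."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def discretize(rows, cuts):
    """Map each numeric row to a tuple of interval indices, one per attribute.

    cuts[j] is the sorted list of breakpoints for attribute j.
    """
    return [tuple(bisect_right(cuts[j], v) for j, v in enumerate(row))
            for row in rows]


def scheme_metrics(rows, labels, cuts):
    """Compute the assumed metrics for one discretization scheme."""
    cells = discretize(rows, cuts)
    h_c = entropy(cells)                      # granularity entropy H(C) (assumed definition)
    h_d = entropy(labels)                     # class entropy H(D)
    h_cd = entropy(list(zip(cells, labels)))  # joint entropy H(C, D)
    mi = h_c + h_d - h_cd                     # mutual information I(C; D)
    return {
        "breakpoints": sum(len(c) for c in cuts),
        "granularity_entropy": h_c,
        "conditional_entropy": h_cd - h_c,    # H(D | C)
        "mutual_information": mi,
        "redundancy": mi / h_cd if h_cd > 0 else 0.0,  # assumed: I(C; D) / H(C, D)
    }


# Toy one-attribute decision table (hypothetical data).
rows = [(1.0,), (2.0,), (3.0,), (4.0,)]
labels = ["a", "a", "b", "b"]

coarse = scheme_metrics(rows, labels, [[2.5]])       # one breakpoint
fine = scheme_metrics(rows, labels, [[1.5, 2.5]])    # one extra breakpoint
print(coarse)
print(fine)
```

Both schemes keep the toy table consistent, so under these assumed definitions the conditional entropy is 0 and the mutual information equals H(D) in both cases. The extra breakpoint raises the granularity entropy and lowers the redundancy, which is the kind of behaviour the abstract describes.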