Error-correcting output coding (ECOC) is an effective method for multi-class classification, but it applies only to supervised data and cannot make use of large numbers of unlabeled samples. This paper proposes a novel hierarchical coding algorithm based on semi-supervised techniques, which modifies the traditional ECOC algorithm and extends the notion of coding. In the coding stage, similar classes are first grouped according to cluster characteristics and then coded hierarchically, so that the unlabeled samples are fully exploited while the code reflects the class distribution of the data, which improves the accuracy of the algorithm. Experimental results on a toxicity prediction data set for chemical products demonstrate the feasibility and effectiveness of the method.
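The abstract does not give the coding procedure in detail, so the following is only a minimal sketch of one way a hierarchical code could be derived from cluster structure; it is not the authors' algorithm. It assumes each class is summarized by a centroid, which in a semi-supervised setting might be estimated by clustering labeled and unlabeled samples together. The names `split_group`, `hierarchical_code` and `decode`, the farthest-pair splitting rule, and the ternary code values (+1, -1, 0) are all illustrative assumptions.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split_group(classes, centroids):
    """Split a group of classes in two, seeded by its two farthest centroids.

    Assumption: closeness of centroids stands in for the "cluster
    characteristics" mentioned in the abstract.
    """
    seed_a, seed_b = max(
        combinations(classes, 2),
        key=lambda p: sq_dist(centroids[p[0]], centroids[p[1]]),
    )
    left, right = [], []
    for c in classes:
        d_a = sq_dist(centroids[c], centroids[seed_a])
        d_b = sq_dist(centroids[c], centroids[seed_b])
        (left if d_a <= d_b else right).append(c)
    if not right:  # all centroids coincide: fall back to an arbitrary split
        left, right = classes[:1], classes[1:]
    return left, right


def hierarchical_code(centroids):
    """Build a ternary code: one column per binary split in the hierarchy.

    +1 and -1 mark the two sides of a split; 0 means the class is not
    involved in that split.
    """
    classes = sorted(centroids)
    columns = []
    stack = [classes]
    while stack:
        group = stack.pop()
        if len(group) < 2:
            continue
        left, right = split_group(group, centroids)
        column = {c: 1 for c in left}
        column.update({c: -1 for c in right})
        columns.append(column)
        stack.extend([left, right])
    return {c: [col.get(c, 0) for col in columns] for c in classes}


def decode(outputs, code):
    """Return the class whose codeword disagrees least with the binary
    classifier outputs, ignoring positions where the codeword is 0."""
    def loss(word):
        return sum(1 for w, o in zip(word, outputs) if w != 0 and w != o)
    return min(code, key=lambda c: loss(code[c]))


# Hypothetical centroids for four classes in a 2-D feature space.
example_centroids = {"A": (0, 0), "B": (0, 1), "C": (10, 0), "D": (10, 1)}
example_code = hierarchical_code(example_centroids)
```

In this sketch, classes with nearby centroids stay together until late in the hierarchy, so the early binary problems separate well-spread groups and only the last splits have to distinguish similar classes. Each column of the code would be learned by one binary classifier, and `decode` combines their outputs into a class prediction.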