Discovering from data which variables have a direct causal relationship with a given variable is a very valuable technique. To address the shortcomings of the stepwise regression algorithm from regression analysis, and of the SGS and PC algorithms from Bayesian network learning, when they are applied to variable selection, this paper proposes a new causal discovery algorithm, STEPCARD, and compares it experimentally with the STEPWISE and SGS algorithms. The experiments show, first, that STEPCARD, like SGS, can identify from the initial set of independent variables those that are causally adjacent to the dependent variable, whereas STEPWISE can only find variables that are significantly correlated with the dependent variable. Second, when the initial set of independent variables is large and the final output set is small, STEPCARD requires far less computation than SGS. Moreover, when the number of initial independent variables approaches or exceeds the number of cases, SGS can no longer be applied, whereas STEPCARD still yields reliable results.
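The cost and sample-size claims about SGS can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the paper. It assumes that SGS keeps a candidate variable adjacent to the target only if no subset of the remaining candidates renders the two independent, and that independence is tested with a Fisher z test on partial correlations, which requires n - |S| - 3 > 0 for n cases and a conditioning set S. The function names and the size cap `k` are hypothetical, and the capped count is not a description of STEPCARD.

```python
from math import comb


def sgs_worst_case_tests(p: int) -> int:
    """Worst-case number of conditional independence tests for one target.

    Each of the p candidates is tested against every subset of the
    other p - 1 candidates, which gives p * 2**(p - 1) tests.
    """
    return p * 2 ** (p - 1)


def capped_tests(p: int, k: int) -> int:
    """Number of tests if conditioning sets are limited to size <= k.

    Hypothetical cap, shown only to contrast with the exhaustive search.
    """
    return p * sum(comb(p - 1, size) for size in range(k + 1))


def fisher_z_applicable(n_cases: int, cond_size: int) -> bool:
    """Fisher z needs a positive scaling term: n - |S| - 3 > 0."""
    return n_cases - cond_size - 3 > 0


if __name__ == "__main__":
    p = 30  # assumed number of candidate independent variables
    print(sgs_worst_case_tests(p))  # exhaustive subset search
    print(capped_tests(p, 2))       # conditioning sets of size <= 2
    # The largest conditioning set has p - 1 = 29 variables; with only
    # 20 cases the test on that set is undefined.
    print(fisher_z_applicable(20, p - 1))
```

Under these assumptions the exhaustive search grows exponentially in the number of candidates, and the largest conditioning sets cannot be tested at all once the number of candidates is close to the number of cases, which is consistent with the two limitations of SGS stated above.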