论文部分内容阅读
In this information age, databases are piling up huge volume data. For getting useful information from this "data sea", knowledge discovery in database (KDD) emerges as the most hot research field. The association rule-mining problem is one of the most studied and the most popular KDD tasks.The chief task of association rules mining is to find the frequent itemsets. The algorithms for finding frequent itemsets can be sort as three groups: 1. Levelwise algorithms. Apriori algorithm is a most typical algorithm. Other this kind of algorithms is Mannila, Pardon, DIG, and so on. The main idea of this kind of algorithms is to prune the sub-itemsets lattice. It is started from the 1 -size itemsets, passing the database level by level, and stopped when the largest frequent itemsets were found. It is a most popular method to find frequent itemsets. However, the perform time will increase in magnitude level, and the perform efficiency and effect won’t be very good. 2. Algorithms that the frequent itemsets are found by finding the largest frequent itemsets, for example, Pincer-Search algorithm, MaxClique algorithm, MaxMiner algorithm and so on. This kind of algorithms save performing time cost and space occupation. However, for the objections on its theoretical basement, it will lose information when generating assocaiton rules using the result. 3. Algorithms that finding frequent itemsets by discovering the frequent closed itemsets, i.e., algorithms based on formal concept analysis (FCA) and Concept Lattice. The main idea of this kind of algorithms is to find closed frequent itemset firstly, then get all frequent itemsets from the result. Because it transforms the problem of finding frequent itemset into the problem of finding frequent closed itemsets, this kind of algorithms reduces both space and time cost. Especially whenthe performing objects are dense and highly correlated databases, because the number of frequent closed itemsets is great less than the number of frequent itemsets, the performing effect will outgo the Apriori kind of algorithms, and the association rules can be found without any losing at same time. For these reason, association rules mining based on concept lattice is a very efficient mining method when the object database is a dense and highly correlated database.This paper firstly analyses the time and space complexity of several closed frequent itemsets mining algorithm based on concept lattice, then deduces the main factors deciding this kind of algorithms performing efficiency. When the database’s correlation degree is relatively low, the effect of association rules mining using concept lattice won’t be better than Apriori kind of algorithms, sometimes even worse than the Apriori kind of algorithms do. This paper develops a judging and selecting algorithm based on database’s correlation degree, whose name "is RelationDesider. It can get the information of the database’s correlation degree by an database passing before beginning association rules mining, and decide which kind of algorithm should be selected. At last, this paper introduces the association rules exact based on concept lattice, and discusses the difference exacting association rules method based on concept lattice and the common method of exacting association rules.