论文部分内容阅读
With more and more knowledge provided by WWW, querying and mining the knowledge bases have attracted much research attention. Among all the queries over knowledge bases, which are usually modelled as graphs, a keyword query is the most widely used one. Although the problem of keyword query over graphs has been deeply studied for years, knowledge bases, as special error-tolerant graphs, lead to the results of the traditional defined keyword queries out of users’ satisfaction. Thus, in this paper, we define a new keyword query, called confident r-clique, specific for knowledge bases based on the r-clique definition for keyword query on general graphs, which has been proved to be the best one. However, as we prove in the paper, finding the confident r-cliques is #P-hard. We propose a filtering-and-verification framework to improve the search e?ciency. In the filtering phase, we develop the tightest upper bound of the confident r-clique, and design an index together with its search algorithm, which suits the large scale of knowledge bases well. In the verification phase, we develop an e?cient sampling method to verify the final answers from the candidates remaining in the filtering phase. Extensive experiments demonstrate that the results derived from our new definition satisfy the users’ requirement better compared with the traditional r-clique definition, and our algorithms are e?cient.