As a generative model, Latent Dirichlet Allocation (LDA) focuses on how the data are generated and does not optimize the discrimination capability of its topics. This paper aims to improve that discrimination capability through unsupervised feature selection. Theoretical analysis shows that the discrimination capability of a topic is limited by the discrimination capability of its representative words. We approximate the discrimination capability of a word by its Information Gain with respect to the topics, and use this measure to distinguish general words from special words in LDA topics. Accordingly, we add a constraint to the LDA objective function so that general words occur only in general topics rather than in special topics, and we present a heuristic algorithm to solve the constrained problem. Experiments show that this method not only improves the information gain of topics but also makes the topics easier for humans to understand.
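The word-level Information Gain idea can be illustrated with a minimal sketch. It assumes a fitted topic-word matrix `phi` (with `phi[t][w]` = P(w | t)) and a marginal topic distribution `topic_prior`, and it treats "the current token is word w" as a binary feature whose information gain about the topic label is H(T) - P(w)H(T | w) - P(not w)H(T | not w). This token-level formulation, the function names, and the `threshold` used to separate general from special words are illustrative assumptions, not the paper's exact definitions.

```python
import math
from typing import List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def word_information_gain(phi: Sequence[Sequence[float]],
                          topic_prior: Sequence[float], w: int) -> float:
    """Information gain of the binary event 'token is word w' about the topic.

    Assumptions (hypothetical setup): phi[t][w] = P(w | t) from a fitted LDA
    model, and topic_prior[t] = P(t), the marginal topic probability of a token.
    """
    k = len(topic_prior)
    p_w = sum(topic_prior[t] * phi[t][w] for t in range(k))  # P(w)
    p_not = 1.0 - p_w                                         # P(not w)
    # Posterior over topics given the token is w (Bayes' rule).
    post_w = ([topic_prior[t] * phi[t][w] / p_w for t in range(k)]
              if p_w > 0 else [0.0] * k)
    # Posterior over topics given the token is some other word.
    post_not = ([topic_prior[t] * (1.0 - phi[t][w]) / p_not for t in range(k)]
                if p_not > 0 else [0.0] * k)
    return (entropy(topic_prior)
            - p_w * entropy(post_w)
            - p_not * entropy(post_not))


def split_general_special(phi: Sequence[Sequence[float]],
                          topic_prior: Sequence[float],
                          threshold: float) -> Tuple[List[int], List[int]]:
    """Label words with IG below a (hypothetical) threshold as general, others as special."""
    general, special = [], []
    for w in range(len(phi[0])):
        ig = word_information_gain(phi, topic_prior, w)
        (general if ig < threshold else special).append(w)
    return general, special


if __name__ == "__main__":
    # Toy example: words 0 and 1 are spread evenly over both topics (general),
    # word 2 appears only in topic 0 and word 3 only in topic 1 (special).
    phi = [[0.25, 0.25, 0.5, 0.0],
           [0.25, 0.25, 0.0, 0.5]]
    prior = [0.5, 0.5]
    for w in range(4):
        print(w, round(word_information_gain(phi, prior, w), 4))
    print(split_general_special(phi, prior, threshold=0.1))
```

In this toy setting a word that is equally likely under every topic leaves the topic posterior unchanged and so has zero information gain, while a word confined to one topic has positive gain. This mirrors the general/special distinction the abstract describes, though how the paper then enforces its constraint on the LDA objective is not reproduced here.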