Inductive Model Generation for Text Classification Using a Bipartite Heterogeneous Network

Source: Journal of Computer Science and Technology (English edition)
Algorithms designed for numeric data classification have been applied to text classification. Text collections are usually represented with the vector space model. The characteristics of this representation, such as sparsity and high dimensionality, sometimes impair the quality of general-purpose classifiers. Networks can also be used to represent text collections, which avoids the high sparsity and makes it possible to model relationships among the different objects that compose a text collection. Such network-based representations can improve the quality of the classification results. One of the simplest ways to represent a text collection as a network is a bipartite heterogeneous network, in which objects that represent documents are connected to objects that represent terms. Bipartite heterogeneous networks do not require computing similarities or relations among the objects and can be used to model any type of text collection. Given these advantages, in this article we present a text classifier that builds a classification model from the structure of a bipartite heterogeneous network. The algorithm, referred to as IMBHN (Inductive Model Based on Bipartite Heterogeneous Network), induces a classification model by assigning weights to the objects that represent terms, for each class of the text collection. An empirical evaluation using a large number of text collections from different domains shows that the proposed IMBHN algorithm produces significantly better results than the k-NN, C4.5, SVM, and Naive Bayes algorithms.
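To make the idea of a document-term bipartite network with per-class term weights more concrete, the following is a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the weights are induced, so the least-mean-squares style error-correction update used here is an assumption, and the function names (`build_bipartite_network`, `induce_term_weights`, `classify`), the learning rate, the epoch count, and the toy documents are all hypothetical.

```python
from collections import defaultdict


def build_bipartite_network(docs):
    """Return one {term: frequency} dict per document.

    Each dict is the weighted edge list linking a document object
    to the term objects it contains (whitespace tokenization is an
    assumption made for brevity).
    """
    edges = []
    for text in docs:
        freq = defaultdict(int)
        for term in text.lower().split():
            freq[term] += 1
        edges.append(dict(freq))
    return edges


def induce_term_weights(edges, labels, classes, rate=0.1, epochs=50):
    """Induce one weight per (term, class) pair.

    ASSUMPTION: an LMS-style error-correction rule. The abstract only
    says that weights are assigned to term objects for each class.
    """
    weights = defaultdict(float)  # (term, class) -> weight
    for _ in range(epochs):
        for doc_terms, label in zip(edges, labels):
            for c in classes:
                # Class score of the document: frequency-weighted sum of term weights.
                output = sum(f * weights[(t, c)] for t, f in doc_terms.items())
                error = (1.0 if c == label else 0.0) - output
                # Push each connected term weight toward reducing the error.
                for t, f in doc_terms.items():
                    weights[(t, c)] += rate * f * error
    return weights


def classify(text, weights, classes):
    """Assign an unseen document to the class with the highest score."""
    terms = build_bipartite_network([text])[0]
    scores = {
        c: sum(f * weights.get((t, c), 0.0) for t, f in terms.items())
        for c in classes
    }
    return max(scores, key=scores.get)


# Toy collection (hypothetical data, for illustration only).
docs = [
    "stock market shares fall",
    "market investors buy shares",
    "team wins football match",
    "football players score goal",
]
labels = ["finance", "finance", "sports", "sports"]
classes = ["finance", "sports"]

network = build_bipartite_network(docs)
weights = induce_term_weights(network, labels, classes)
print(classify("investors sell shares", weights, classes))
print(classify("football team score", weights, classes))
```

Note that the sketch never computes document-to-document or term-to-term similarities: the only structure it uses is the set of document-term edges, which matches the property of bipartite heterogeneous networks described in the abstract. Because the learned model is just a table of term weights per class, unseen documents can be classified without keeping the training documents around.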