Traditional text classification methods struggle with short texts, largely because of data sparsity. To address this difficulty, this paper attempts improvements on two fronts: the feature input for short texts and the structure of the convolutional neural network (CNN). For feature representation, word embeddings are trained in two modes, non-static and static. The trained embeddings are then clustered, and the resulting clustered embedding library serves as the dictionary for the model input. The paper also proposes an improved dual-channel CNN structure: the two channels capture more locally sensitive information and so increase the number of features, after which successive pooling performs feature extraction. Experiments show that the proposed semantic clustering and the improved network model significantly increase accuracy on short text classification tasks compared with traditional machine learning methods.
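The dual-channel idea can be illustrated with a small, dependency-free sketch. It rests on assumptions the abstract does not confirm: that the two channels carry the static and non-static embeddings of the same sentence, that one filter spans both channels with its responses summed, and that "successive pooling" means repeated non-overlapping max pooling. The function names `conv1d_two_channel` and `max_pool`, the filter weights, and the toy vectors are all hypothetical.

```python
def conv1d_two_channel(static_seq, tuned_seq, filt_static, filt_tuned, bias=0.0):
    """Slide one filter over both embedding channels and sum the responses.

    static_seq / tuned_seq: lists of word vectors (one per token), same length.
    filt_static / filt_tuned: filter weights, one row per position in the window.
    Returns one ReLU-activated feature per window position.
    """
    width = len(filt_static)
    features = []
    for start in range(len(static_seq) - width + 1):
        total = bias
        for k in range(width):
            # Dot product of each channel's word vector with its filter row.
            total += sum(a * b for a, b in zip(static_seq[start + k], filt_static[k]))
            total += sum(a * b for a, b in zip(tuned_seq[start + k], filt_tuned[k]))
        features.append(max(0.0, total))  # ReLU
    return features


def max_pool(values, size):
    """Non-overlapping max pooling; the last window may be shorter."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]


# Toy 4-token sentence with 2-dimensional embeddings (assumed values).
static_seq = [[1, 0], [0, 1], [1, 1], [0, 0]]   # frozen embeddings
tuned_seq = [[0, 1], [1, 0], [1, 1], [2, 0]]    # fine-tuned embeddings
filt_static = [[1, 0], [0, 1]]                  # window of 2 tokens
filt_tuned = [[1, 0], [0, -1]]

feature_map = conv1d_two_channel(static_seq, tuned_seq, filt_static, filt_tuned)
pooled_once = max_pool(feature_map, 2)
pooled_twice = max_pool(pooled_once, 2)  # second pooling pass
```

In a real model the filter weights would be learned and many filters of several widths would run in parallel. The sketch only shows how two embedding views of one sentence can feed a single feature map that is then reduced by repeated pooling.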