Sentence segmentation for classical Chinese based on LSTM with radical embedding

Source: The Journal of China Universities of Posts and Telecommunications (English edition)
A feature embedding below the character level, called radical embedding, is proposed and applied to a long short-term memory (LSTM) model for sentence segmentation of pre-modern Chinese texts. The dataset includes over 150 classical Chinese books from 3 different dynasties and covers different literary styles. The LSTM-conditional random fields (LSTM-CRF) model is a state-of-the-art method for sequence labeling problems. The proposed model adds a radical embedding component, which leads to improved performance. Experimental results on the aforementioned Chinese books demonstrate better accuracy than earlier methods on sentence segmentation, especially on Tang epitaph texts (achieving an F1-score of 81.34%).
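To illustrate how sentence segmentation can be cast as a sequence labeling problem, the following is a minimal Python sketch. The abstract does not state the tag set, so the two-tag scheme here ("E" for a character that ends a sentence, "O" otherwise), the punctuation set, and the helper names `to_labeled_sequence`, `segment`, and `f1_score` are all assumptions made for illustration. The sketch covers only data preparation and scoring, not the LSTM-CRF model or the radical embedding.

```python
# Hypothetical tag scheme (not taken from the paper): "E" marks a character
# that ends a sentence, "O" marks every other character.
PUNCTUATION = set("。，！？；：、")  # assumed set of break marks


def to_labeled_sequence(punctuated):
    """Strip punctuation and return (characters, labels) for training."""
    chars, labels = [], []
    for ch in punctuated:
        if ch in PUNCTUATION:
            if labels:
                labels[-1] = "E"  # the preceding character ends a sentence
        else:
            chars.append(ch)
            labels.append("O")
    return chars, labels


def segment(chars, labels, mark="。"):
    """Re-insert a break mark after every character labeled "E"."""
    return "".join(ch + (mark if lab == "E" else "") for ch, lab in zip(chars, labels))


def f1_score(gold, pred, positive="E"):
    """F1 over the positive (sentence-end) label."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


chars, gold = to_labeled_sequence("學而時習之，不亦說乎。")
print(chars, gold)
print(segment(chars, gold))
```

In a setup like this, a sequence model would predict one label per character from its input features, and `segment` would turn the predicted labels back into segmented text.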