Research on identity recognition of English mail author based on writing style

Source: 大东方
  Abstract: Emails are often very short, but the author's language style is still evident. We therefore expect that, given suitable samples, a few stylistic features of the text can be used to identify its author. As feature values we use the proportion of short words in an email, the proportion of distinct word types, the average word length, the mean and variance of lexical density, and the maximum single-word usage ratio. Principal component analysis of these features yields two principal components, which reflect word density and vocabulary non-repetition. Taking the two principal components as the axes, we draw a scatter diagram for each author and find that the diagrams follow certain regularities and reflect the differences between authors. We therefore use a BP neural network for identification, with the extracted principal components as input features and a four-bit binary number as the author's number; a certain number of emails from each author are selected for training. We find that with a learning rate of 0.01 and the hidden layer set to 50, the test output is best, and the identification accuracy is 87.5%.
  Key words: text feature; principal component analysis; scatter diagram; BP neural network; pattern identification
  I.Problem Analysis and Model Establishment
  1.1 SPSS principal component analysis
  The extracted feature values are entered into SPSS, and principal component analysis is used to reduce the dimensionality of the feature set.
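  As a rough illustration of how such feature values might be computed before they are passed to a statistics package, the following minimal sketch extracts four of the listed features from one email body. The function name `style_features`, the tokenization rule, and the definition of a "short word" as one of at most three letters are assumptions made for this example; the paper does not state its exact definitions, and lexical density is omitted because its definition is not given.

```python
import re
from collections import Counter

def style_features(text, short_len=3):
    """Return simple lexical style features for one email body.

    Assumptions (not from the paper): words are runs of letters and
    apostrophes, and a "short word" has at most `short_len` letters.
    """
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        raise ValueError("text contains no words")
    counts = Counter(words)
    n = len(words)
    return {
        # share of words that count as short
        "short_word_ratio": sum(1 for w in words if len(w) <= short_len) / n,
        # distinct word types divided by total words
        "type_ratio": len(counts) / n,
        # mean number of characters per word
        "avg_word_len": sum(len(w) for w in words) / n,
        # share of the text taken by the single most frequent word
        "max_single_word_ratio": counts.most_common(1)[0][1] / n,
    }

feats = style_features("The cat sat on the mat and the dog sat too")
```

  Each email then contributes one row of feature values, and the rows for all emails form the table on which the dimension reduction operates.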
  The variables appear intuitively to be correlated, but this needs to be tested. In the correlation test output, Bartlett's test of sphericity gives P < 0.001; taken together, the test indexes show that the variables are correlated and suitable for factor analysis. The eigenvalues of components 1 and 2 are both greater than 1, and together they explain 79.773% of the variance, which is a good result. We therefore extract components 1 and 2 as the principal components, capturing the main structure of the data.
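  The same steps can be sketched outside SPSS. The example below is only an illustration under stated assumptions: it uses NumPy rather than SPSS, the data are synthetic placeholders (not the paper's measurements), and the function names `pca_kaiser` and `bartlett_statistic` are hypothetical. It computes Bartlett's sphericity statistic, runs PCA on the correlation matrix, and keeps the components whose eigenvalue exceeds 1.

```python
import numpy as np

def bartlett_statistic(X):
    """Bartlett's sphericity statistic and its degrees of freedom.

    chi2 = -(n - 1 - (2p + 5) / 6) * ln|R|, with p(p - 1) / 2 degrees of freedom.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)          # ln|R|, computed stably
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    return chi2, p * (p - 1) // 2

def pca_kaiser(X):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each feature
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)               # returned in ascending order
    order = np.argsort(eigvals)[::-1]                  # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    explained = eigvals[keep].sum() / eigvals.sum()    # share of variance retained
    scores = Z @ eigvecs[:, keep]                      # component scores per sample
    return eigvals, explained, scores

# Synthetic placeholder data: 40 emails, 6 features driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.3 * rng.normal(size=(40, 6))

chi2, dof = bartlett_statistic(X)
eigvals, explained, scores = pca_kaiser(X)
```

  Here `explained` plays the role of the cumulative variance figure reported above, and the columns of `scores` are the component values that would be plotted or passed on to a classifier.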
  In the eight scatter plots, the abscissa represents principal component 2, namely "the ability to identify the author from average sentence length", and the ordinate represents principal component 1, namely "the ability to identify the author from the proportion of distinct words among total words". Each plot shows the relationship between these two abilities for one author. From the SPSS output we can see that the two abilities are clearly related within each author and differ between authors. We can therefore use these two components as input parameters for training the BP neural network and then identify the authors of the texts.
  1.2 The solution of neural network
  We take the two extracted principal components as the input of the neural network and express the author's number as a four-bit binary number, so we choose the log-sigmoid (S-shaped) function as the transfer function of the output neurons. Through repeated testing, we set the learning rate to 0.01, the maximum number of iterations to 10000, and the hidden layer to 50.
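  A minimal sketch of such a network is given below, written in NumPy purely for illustration. Several points are assumptions rather than details from the paper: the value 50 is interpreted as the number of units in a single hidden layer, the hidden activation is tanh, authors are numbered 1 to 8, training uses plain batch gradient descent on squared error, and the input data are synthetic clusters. All function names are hypothetical.

```python
import numpy as np

def encode_author(idx, bits=4):
    """Author number -> list of bits, most significant first (5 -> [0, 1, 0, 1])."""
    return [(idx >> b) & 1 for b in reversed(range(bits))]

def decode_author(outputs):
    """Threshold network outputs at 0.5 and read them back as an integer."""
    idx = 0
    for o in outputs:
        idx = (idx << 1) | int(o >= 0.5)
    return idx

def logsig(z):
    """Log-sigmoid transfer function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def train_bp(X, T, hidden=50, lr=0.01, max_iter=10000, seed=0):
    """Train a one-hidden-layer network by backpropagation (batch gradient descent)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, T.shape[1])); b2 = np.zeros(T.shape[1])
    for _ in range(max_iter):
        H = np.tanh(X @ W1 + b1)            # hidden layer (tanh is an assumption)
        Y = logsig(H @ W2 + b2)             # output layer, one unit per bit
        dY = (Y - T) * Y * (1 - Y)          # squared-error gradient through logsig
        dH = (dY @ W2.T) * (1 - H ** 2)     # propagated back through tanh
        W2 -= lr * H.T @ dY; b2 -= lr * dY.sum(axis=0)
        W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return logsig(np.tanh(X @ W1 + b1) @ W2 + b2)

# Synthetic placeholder data: 8 authors, 10 emails each, 2 component scores per email.
rng = np.random.default_rng(1)
authors = np.repeat(np.arange(1, 9), 10)
centers = rng.normal(0, 2, (8, 2))
X = centers[authors - 1] + rng.normal(0, 0.3, (80, 2))
T = np.array([encode_author(a) for a in authors], dtype=float)

params = train_bp(X, T)
pred = np.array([decode_author(row) for row in predict(params, X)])
accuracy = float((pred == authors).mean())
```

  Because each output unit produces a value between 0 and 1, thresholding at 0.5 turns the four outputs directly into the four bits of the author's number, which is one reason a sigmoid-type output function suits this encoding.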
  After running the neural network algorithm many times, we found that seven of the eight selected authors were essentially identified correctly, an accuracy of 87.5%. We therefore consider the model capable of identifying the author of an email. We chose two of the scatter diagrams as follows:
  II.Conclusions
  The lexical and structural features used in the model reflect the characteristics of different authors to a certain extent, so the method proposed in this paper for identifying email authors based on vocabulary and structure is effective. Through principal component analysis and scatter plot analysis, we conclude that the lexical features we selected can be used to distinguish different authors, with a recognition rate of up to 87.5%. In training the BP neural network, we found that the number of hidden layers has the greatest impact on the final test accuracy, so determining the hidden layers accurately is the key factor in BP neural network training; the learning rate of the BP network also affects the learning result.
  III.References
  [1] RuiHua Qi. Research on the identification of text authors [M]. Beijing: Tsinghua University Press, 2017.
  [2] Shuying Zhang, Ye Zhang. Implementation of pattern recognition and intelligent computing - Matlab Technology [M]. Beijing: Electronic Industry Press, 2015: 138-191.
  [3] G. U. Yule. The statistical study of literary vocabulary. Cambridge University Press, 1944.
  [4] J. Moody and J. Utans. Architecture Selection Strategies for Neural Networks Application to Corporate Bond Rating. Neural Networks in the Capital Markets, 1995.
  (Author affiliation: Shandong University of Technology)