Deep Convolutional Neural Network with Double LSTM Layer for Robust Speech Emotion Recognition

Source: Beijing University of Posts and Telecommunications
Speech emotion recognition (SER) is attracting considerable attention owing to the impact of artificial intelligence on research. It has opened up various research directions in computer science that aim to improve the way humans and machines interact. One of the major focuses of recent research is the recognition of human emotions by machines, and this type of communication is therefore associated with the information sciences. Different techniques are being adopted for detecting emotional states in vocal expressions, for native speaker recognition, and for related tasks. SER, in particular, aims to recognize the correct emotional state of a speaker. This is difficult for several reasons: emotions have fuzzy temporal boundaries, each person expresses emotions differently, and a single utterance may contain more than one emotion. In addition, the large gap between the accuracy rates achieved on different types of speech datasets raises questions about how emotions modulate speech. To highlight the challenges of emotion in speech, this thesis presents the SER architecture and some of its key layers. Its main focus is on finding informative features for the emotion classes using deep neural networks (DNNs) and recurrent neural networks (RNNs), both of which have served as key building blocks of SER models. Various combinations of convolutional neural networks (CNNs) and RNNs have also been proposed in the SER field. Among them, the convolutional recurrent neural network (CRNN), which combines a CNN with an RNN, has proved to be a robust architecture in recent years. The thesis also discusses the advantages and disadvantages of each layer of the CRNN architecture. Accordingly, it focuses on robust SER using a deep CRNN with a double LSTM layer, also called a stacked LSTM. The main objective of this thesis is to design and implement a robust architecture for emotion recognition in speech.

To this end, deep learning techniques are combined with the CRNN to design and implement a deep CRNN (DCRNN) architecture for robust SER. The DCRNN architecture consists of a log-Mel spectrogram front end, a CNN layer, a double (stacked) LSTM, a fully connected layer, and a softmax layer. The log-Mel spectrogram turns the speech signal into a time-frequency representation. Each short-time power spectrum is passed through a bank of filters spaced on the Mel scale, and the logarithm of the filter energies is taken. The Mel scale is approximately linear below about 1 kHz and logarithmic above it, so the filter bins are spaced equally in Mels rather than in Hertz, which gives finer resolution at low frequencies. The main investigation concerns the CNN layer, where training experiments are used to determine the optimal kernel size and max-pooling configuration for the extracted speech features. The resulting features are then fed to an RNN with two LSTM layers. The LSTM's gated cell state mitigates the long-term dependency problem and the vanishing gradient that affects plain RNNs trained with backpropagation through time, which makes the model more effective. In the two-layer stack, the second LSTM operates on the hidden-state sequence of the first, so the stack acts as an extractor of higher-level features. The primary reason for adopting it is to extract speech features more accurately than a traditional single-layer LSTM can. The fully connected layer then learns from combinations of all the features produced by the previous layer, and the softmax layer converts the network's outputs into a probability distribution over the emotion classes. The experimental results show that the DCRNN architecture outperforms traditional learning architectures for SER.
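The log-Mel front end of the DCRNN can be sketched without any dependencies. In the usual formulation, the Mel filters' edge and centre frequencies are equally spaced on the Mel scale, so they grow progressively farther apart in Hertz. The sketch below makes several assumptions that are not taken from the thesis: the common HTK-style Mel formula, a Hann window, and a naive DFT. The function names and the default parameters (16 kHz, 512-point frames, hop of 256, 40 Mel bands) are hypothetical illustrations, not the thesis's settings.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """Hertz -> Mels using the HTK-style formula (an assumed, common choice)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Mels -> Hertz (inverse of hz_to_mel)."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose edge/centre frequencies are equally spaced in Mels."""
    n_bins = n_fft // 2 + 1
    top_mel = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 points evenly spaced on the Mel axis, i.e. NOT evenly spaced in Hertz.
    hz_points = [mel_to_hz(top_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    filters = []
    for m in range(1, n_mels + 1):
        left, centre, right = hz_points[m - 1], hz_points[m], hz_points[m + 1]
        row = []
        for f in bin_freqs:
            if left <= f <= centre:
                row.append((f - left) / (centre - left))    # rising edge
            elif centre < f <= right:
                row.append((right - f) / (right - centre))  # falling edge
            else:
                row.append(0.0)
        filters.append(row)
    return filters


def log_mel_spectrogram(signal, sample_rate=16000, n_fft=512, hop=256, n_mels=40):
    """Return one list of n_mels log energies per frame (hypothetical defaults)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]  # Hann
    fbank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(n_fft)]
        # Naive O(N^2) DFT power spectrum over the non-negative frequency bins.
        power = []
        for k in range(n_fft // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                      for n in range(n_fft))
            power.append(abs(acc) ** 2)
        energies = [sum(w * p for w, p in zip(row, power)) for row in fbank]
        # The logarithm compresses the dynamic range; 1e-10 avoids log(0) for empty bands.
        frames.append([math.log(e + 1e-10) for e in energies])
    return frames
```

For example, `log_mel_spectrogram(samples, sample_rate=8000, n_fft=256, hop=128, n_mels=10)` returns one 10-value vector per 256-sample frame. A practical pipeline would replace the O(N²) DFT with an FFT from a signal-processing library. The sketch keeps the DFT only so that every step stays visible.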
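The data flow through the DCRNN layers can be illustrated with a dependency-free sketch of a single forward pass. The input is a log-Mel matrix, which passes through one 3×3 convolution with ReLU, then 2×2 max pooling, two stacked LSTM layers, a fully connected layer, and finally a softmax. This is not the thesis's model. It assumes a single convolution filter, randomly initialised (untrained) weights, a hidden size of 8, and four emotion classes. All function names and sizes are hypothetical, and a real system would use a deep learning framework with many filters and trained weights.

```python
import math
import random


def conv2d_relu(x, kernel, bias=0.0):
    """'Valid' 2-D cross-correlation (as CNN layers compute it) followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    return [[max(0.0, bias + sum(kernel[a][b] * x[i + a][j + b]
                                 for a in range(kh) for b in range(kw)))
             for j in range(len(x[0]) - kw + 1)]
            for i in range(len(x) - kh + 1)]


def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [[max(x[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(x[0]) - size + 1, size)]
            for i in range(0, len(x) - size + 1, size)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(w, v):
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


class LSTMLayer:
    """One LSTM layer; the input, forget, candidate and output gates share one weight matrix."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        scale = 1.0 / math.sqrt(n_hidden)
        self.w = [[rng.uniform(-scale, scale) for _ in range(n_in + n_hidden)]
                  for _ in range(4 * n_hidden)]
        self.b = [0.0] * (4 * n_hidden)

    def __call__(self, sequence):
        n = self.n_hidden
        h, c, outputs = [0.0] * n, [0.0] * n, []
        for x_t in sequence:
            z = [zi + bi for zi, bi in zip(matvec(self.w, x_t + h), self.b)]
            i = [sigmoid(v) for v in z[0:n]]
            f = [sigmoid(v) for v in z[n:2 * n]]
            g = [math.tanh(v) for v in z[2 * n:3 * n]]
            o = [sigmoid(v) for v in z[3 * n:4 * n]]
            # Additive cell-state update: the path that eases vanishing gradients.
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs  # full hidden-state sequence, so another LSTM can be stacked on top


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def dcrnn_forward(log_mel, n_classes=4, n_hidden=8, seed=0):
    """log_mel: n_mels rows x n_frames columns. Returns class probabilities.

    Hypothetical sizes and random (untrained) weights, for illustration only.
    """
    rng = random.Random(seed)
    kernel = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(3)]
    fmap = max_pool2d(conv2d_relu(log_mel, kernel))       # CNN layer + max pooling
    # Each remaining time column becomes one step; its features are the pooled frequency bins.
    sequence = [[row[t] for row in fmap] for t in range(len(fmap[0]))]
    lstm1 = LSTMLayer(len(sequence[0]), n_hidden, rng)
    lstm2 = LSTMLayer(n_hidden, n_hidden, rng)
    h_last = lstm2(lstm1(sequence))[-1]                   # double (stacked) LSTM
    w_fc = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_classes)]
    logits = matvec(w_fc, h_last)                         # fully connected layer
    return softmax(logits)                                # softmax layer
```

The first LSTM passes its whole hidden-state sequence to the second, not just its last state, and this is what makes the two layers a stack. Only the second layer's final hidden state reaches the fully connected layer. With a 40 × 50 input, the convolution gives a 38 × 48 map and pooling reduces it to 19 × 24, so the LSTMs see 24 time steps of 19 features each.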