Computer and Modernization (计算机与现代化, JISUANJI YU XIANDAIHUA), 2023, No. 1 (Total No. 329)

Abstract: To address the shortcomings of deep learning algorithms in extracting speech emotion features, and their low recognition accuracy, this paper extracts effective emotion features from speech data and concatenates and fuses them at multiple scales to construct speech emotion features, improving the deep learning model's ability to represent those features. Because traditional recurrent neural networks cannot handle the long-term dependency problem in speech emotion recognition, a dual-layer LSTM is adopted to improve recognition, and a model combining hybrid multi-scale convolution with a dual-layer LSTM is proposed. Experimental results show that, on the Chinese emotional speech database of the Institute of Automation, Chinese Academy of Sciences (CASIA) and the public Berlin emotional speech dataset (Emo-DB), the proposed model achieves considerably higher accuracy than the other emotion recognition models compared.

Keywords: speech emotion recognition; deep learning; neural network; multi-scale convolution; long short-term memory network

CLC number: TP398.1
Document code: A
DOI: 10.3969/j.issn.1006-2475.2023.01.011
Article ID: 1006-2475(2023)01-0063-06

Speech Emotion Recognition of Hybrid Multi-scale Convolution Combined with Dual-layer LSTM

LIANG Ke-jin, ZHANG Hai-jun, LIU Ya-qing, ZHANG Yu, WANG Yue-yang
(College of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054, China)

梁科晋,张海军,刘雅情,张昱,王月...