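Research and Development

Spoofed speech detection based on context information and attention features

CHEN Jia 1, ZHANG Jianwu 1, ZHANG Zheliang 2
(1. Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China; 2. Zhejiang Uniview Technologies Co., Ltd., Hangzhou 310051, Zhejiang, China)

Abstract: Despite the rapid development of speech synthesis and voice conversion technology, spoofed speech detection methods still suffer from low detection accuracy and poor generality. An end-to-end spoofing detection method based on context information and attention features is therefore proposed. Built on a deep residual shrinkage network (DRSN), the method uses a dual-branch context information coordination fusion module (DCCM) to aggregate rich context information, and fuses features based on a coordinate time-frequency attention (CTFA) mechanism to obtain cross-dimensional interaction features that carry context information, thereby maximizing the potential for capturing artifacts. Compared with the best baseline system, the proposed method reduces the EER and t-DCF by 68% and 65%, respectively, on the ASVspoof 2019 LA dataset; on the ASVspoof 2021 LA dataset, it achieves an EER of 4.81 and a t-DCF of 0.3115, reductions of 48% and 10%, respectively. The experimental results show that the proposed method effectively improves the accuracy and generalization ability of spoofed speech detection.

Keywords: spoofed speech detection; context information; attention feature; end-to-end; artifact

CLC number: TN912.3
Document code: A
doi: 10.11959/j.issn.1000-0801.2023006

Spoof speech detection based on context information and attention feature

CHEN Jia 1, ZHANG Jianwu 1, ZHANG Zheliang 2
1. Hangzhou Dianzi University, Hangzhou 310018, China
2. Zhejiang Uniview Technologies Co., Ltd., Hangzhou 310051, China

Abstract: With the rapid development of speech synthesis and speech conversion technology, methods of spoof speech detection still have problems such as low spoof detection accuracy and poor generality. Therefore, an end-to-end spoof detection method based on context information and attention feature was proposed. Based on the deep residual shrinkage network (DRSN), the proposed method used the dual-branch context information coordination fusion module (DCCM) to aggregate rich context information, and fused features based on coordinate time-frequency attention (CTFA) to obtain cross-dimensional interaction features with context information, thus maximizing the potential of capturing artifacts. Compared with the best baseline system, on the ASVspoof 2019 LA dataset, the proposed method reduced the EER and t-DCF performance indicators by 68% and 65%, respectively; on the ASVspoof 2021 LA dataset, the EER and t-DCF of the proposed method were 4.81 and 0.3115, reductions of 48% and 10%, respectively. The experimental results show that this method ca...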