Computer and Modernization (JISUANJI YU XIANDAIHUA), 2023, No. 1, Serial No. 329

Abstract: To avoid the error accumulation and entity-overlap problems caused by extracting entities and relations independently, a joint extraction model based on BERT and non-autoregressive decoding is proposed for medical knowledge extraction. First, sentences are encoded with the BERT pre-trained language model. Then, a non-autoregressive (NAR) method is used to achieve parallel decoding: relation types are extracted, and entities are extracted according to the position indices of the head and tail entities, yielding relation triples of medical entities. Finally, the extracted entities and relations are imported into the Neo4j graph database for knowledge visualization. The dataset was obtained by manually annotating data from electronic medical records. Experimental results show that the joint learning model based on BERT and non-autoregressive decoding achieves an F1 value of 0.92, a precision of 0.93, and a recall of 0.92; all three evaluation metrics improve on existing models, indicating that the proposed method can effectively extract medical knowledge from electronic medical records.

Keywords: joint learning; non-autoregressive; BERT; entity overlap; electronic medical record

CLC number: TP391.1   Document code: A   DOI: 10.3969/j.issn.1006-2475.2023.020

Medical Knowledge Extraction Based on BERT and Non-autoregressive

YU Qing, MA Zhi-long, XU Chun
(School of Information Management, Xinjiang University of Finance and Economics, Urumqi 830012, China)

Abstract: In order to avoid the problems of error accumulation and entity overlap caused by the pipeline entity-relation extraction model, a joint extraction model based on BERT and non-autoregressive decoding is established for medical knowledge extraction. Firstly, with the help of the BERT pre-trained language model, the sentence encoding is obtained. Secondly, the non-autoregressive method is proposed to achieve parallel decoding, extract the relation types, extract entities according to the indices of the subject and object entities, and obtain the medical triples. Finally, we import the extracted triples into the Neo4j graph database and realize knowledge visualization. The dataset is derived from manual labeling of data in electronic medical records. The experimental results show that the F1 value, precision, and recall of the BERT and non-autoregressive joint learning model are 0.92, 0.93, and 0.92, respectively. Compared with existing models, all three evaluation indicators are improved, indicating that the proposed method can effectively extract medical knowledge from electronic medical records.

Keywords: joint learning; non-autoregressive; BERT; entity overlap; electronic medical record

...