Information & Computer (信息与电脑), 2023, No. 10
Computer Engineering Application Technology

Automatic Classification Method of Power Archives Based on Multiple Feature Selection

MA Ning, LI Ruihuan
(Shengzhou Power Supply Company of State Grid Zhejiang Electric Power Co., Ltd., Shengzhou Zhejiang 312400, China)

Abstract: To address the poor performance of automatic classification when applied to power archives, an automatic classification method based on multiple feature selection is proposed. First, the text content of power archives is extracted, segmented into words, and stripped of stop words, and the resulting text is represented with a vector space model. Second, multiple feature selection techniques are used to extract several features: document frequency, chi-square test, normalized difference, Gini index, and information gain. Finally, the similarity between each power archive document and each category is determined from these features, and the category of the document is decided by comparing the similarity with a classification threshold. Experimental results show that the proposed method misclassifies fewer archives than traditional methods and has broad application prospects in the automatic classification of power archives.

Keywords: multi-feature selection; power archives; automatic classification

CLC number: TP391    Document code: A    Article ID: 1003-9767(2023)10-019-03

0 Introduction

As the power industry continues to develop, the number of power archives keeps growing, the categories into which they are divided keep multiplying, and the requirements for fine-grained classification of power archives keep rising. Domestic research in this area started relatively late, and the theory of automatic archive classification is not yet mature...
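The three-stage pipeline summarized in the abstract (preprocessing and vector space representation, feature selection, similarity compared against a threshold) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the documents are already segmented with stop words removed. Of the five measures named in the abstract it uses only document frequency and the chi-square statistic, and it assumes cosine similarity against per-category term-count centroids. All function names, the toy English tokens, and the threshold value 0.5 are hypothetical.

```python
import math
from collections import Counter

def chi_square(a, b, c, d):
    """Chi-square of a term/category 2x2 table.
    a: in category, has term;  b: outside category, has term;
    c: in category, no term;   d: outside category, no term."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def select_terms(docs, labels, k, min_df=1):
    """Keep terms with document frequency >= min_df, then take the k terms
    with the highest chi-square score over all categories (assumed rule)."""
    df = Counter(t for doc in docs for t in set(doc))
    scores = {}
    for term, freq in df.items():
        if freq < min_df:
            continue
        best = 0.0
        for cat in set(labels):
            a = sum(1 for doc, lab in zip(docs, labels) if lab == cat and term in doc)
            b = freq - a
            c = labels.count(cat) - a
            d = len(docs) - a - b - c
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

def vectorize(doc, terms):
    """Vector space model: term counts over the selected terms."""
    counts = Counter(doc)
    return [counts[t] for t in terms]

def build_centroids(docs, labels, terms):
    """Sum the vectors of each category's training documents."""
    centroids = {}
    for doc, lab in zip(docs, labels):
        acc = centroids.setdefault(lab, [0] * len(terms))
        for i, x in enumerate(vectorize(doc, terms)):
            acc[i] += x
    return centroids

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def classify(doc, terms, centroids, threshold):
    """Return the most similar category, or None if below the threshold."""
    vec = vectorize(doc, terms)
    cat, sim = max(((c, cosine(vec, cen)) for c, cen in centroids.items()),
                   key=lambda pair: pair[1])
    return cat if sim >= threshold else None

# Toy, already-segmented documents (hypothetical data).
docs = [
    ["transformer", "maintenance", "report"],
    ["transformer", "inspection", "report"],
    ["contract", "payment", "invoice"],
    ["contract", "invoice", "supplier"],
]
labels = ["equipment", "equipment", "finance", "finance"]

terms = select_terms(docs, labels, k=4, min_df=2)
centroids = build_centroids(docs, labels, terms)
result = classify(["transformer", "fault", "report"], terms, centroids, 0.5)
```

With this toy data, a new document sharing terms with the "equipment" category is assigned to it, while a document containing none of the selected terms falls below the threshold and is left unclassified (`None`). A fuller version would add the normalized difference, Gini index, and information gain scores and would need a rule for combining them, which this excerpt does not specify.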