doi:10.3969/j.issn.1007-7375.2023.03.010

Identification Method of Product Quality Problems Based on Two-view Semi-supervised Learning

YAO Chi1,2, PAN Ershun1,2
(1. School of Mechanical Engineering; 2. Chinese Institute for Quality Research, Shanghai Jiao Tong University, Shanghai 200240, China)

Abstract: E-commerce websites host large volumes of unstructured, unlabeled consumer review texts. This paper applies a two-view semi-supervised learning method to classify these reviews and identify content related to product quality problems, thereby mining the latent quality defects and hazards of products. Lexical, sentiment, and domain features, among others, are jointly considered to construct a text feature view and a non-text feature view. The Co-training algorithm is then used to classify reviews according to whether they involve quality problems. Taking the electric kettle as a case, review data crawled from an e-commerce website are used for empirical analysis. Results show that the proposed method achieves an F1 score of 82.18% and an AUC of 86.24%, a significant improvement over single-view supervised learning classifiers.

Keywords: review classification; multi-view learning; semi-supervised learning; collaborative training; quality problem identification

CLC number: TP393    Document code: A    Article ID: 1007-7375(2023)03-0086-09

Traditional channels for discovering product quality problems include inspection and testing before products leave the factory, as well as user questionnaire surveys and insurance company feedback after delivery. These channels suffer from high cost, slow response, and insufficient samples [...