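Software Guide (软件导刊), Vol. 22, No. 2, Feb. 2023

Personal Credit Ensemble Classification Model Based on K-means Clustering and Rough Sets

ZHANG Yi, XIE Xiao-jin
(School of Mathematics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China)

Abstract: Most personal credit data mix several data types, and traditional K-means clustering makes it difficult to determine the initial cluster centers and the number of clusters. To address both problems, this paper proposes a personal credit ensemble classification model that combines an improved K-means clustering with rough sets. First, the degree of aggregation of the sample points is measured by sample space density to determine the initial cluster centers, and an improved adaptive scheme dynamically adjusts the number of cluster centers during K-means clustering, thereby discretizing the continuous data. Second, rough sets are used for attribute reduction to obtain a feature subset. Finally, an ensemble model is built with cost sensitivity, using L1-logistic regression, elastic net-logistic regression, Bayes, decision tree, and neural network as base models, to classify personal credit data effectively. Experimental results on UCI datasets show that, compared with existing models, the proposed ensemble classification model improves G-means by about 2.96% on average and by about 5.35% at most, and improves F-value by about 3.42% on average and by about 6.83% at most.

Key words: personal credit; K-means clustering; rough set; sample space density; adaptive; imbalanced data

DOI: 10.11907/rjdk.221099
CLC number: TP181  Document code: A  Article ID: 1672-7800(2023)002-0142-06

Personal Credit Integration Classification Model Based on K-means Clustering and Rough Set

ZHANG Yi, XIE Xiao-jin
(School of Mathematics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China)

Abstract: An improved personal credit integration classification model combining K-means clustering and rough set is proposed to solve the problems that most personal credit data have mixed data types and that the initial cluster centers and the number of clusters are difficult to determine in traditional K-means clustering. Firstly, the degree of clustering of the sample points is measured based on the density of the sample space to determine the initial cluster centers, and an improved adaptive idea is introduced to dynamically adjust the number of cluster centers for K-means clustering, so as to discretize the continuous data. Secondly, rough set is used for attribute reduction to obtain the feature subset. Finally, an integrated model based on L1-logistic regression, elastic net-logistic regression, Bayes, decision tree and neural network is constructed, combining cost sensitivity, to achieve effective classification of unbalanced personal c...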