Data Mining: Concepts and Techniques (3rd ed.), Chapter 8
Jiawei Han, Micheline Kamber, and Jian Pei
University of Illinois at Urbana-Champaign & Simon Fraser University
©2011 Han, Kamber & Pei. All rights reserved.

Chapter 8. Classification: Basic Concepts
  - Classification: Basic Concepts
  - Decision Tree Induction
  - Bayes Classification Methods
  - Rule-Based Classification
  - Model Evaluation and Selection
  - Techniques to Improve Classification Accuracy: Ensemble Methods
  - Summary

Supervised vs. Unsupervised Learning
  - Supervised learning (classification)
      - Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
      - New data is classified based on the training set
  - Unsupervised learning (clustering)
      - The class labels of the training data are unknown
      - Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data

Prediction Problems: Classification vs. Numeric Prediction
  - Classification
      - predicts categorical class labels (discrete or nominal)
      - classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses the model to classify new data
  - Numeric prediction
      - models continuous-valued functions, i.e., predicts unknown or missing values
  - Typical applications
      - Credit/loan approval
      - Medical diagnosis: whether a tumor is cancerous or benign
      - Fraud detection: whether a transaction is fraudulent
      - Web page categorization: which category a page belongs to

Classification: A Two-Step Process
  - Model construction: describing a set of predetermined classes
      - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
      - The set of tuples used for model construction is the training set
      - The model is represented as classification rules, decision trees, or mathematical formulae
  - Model usage: classifying future or unknown objects
      - Estimate the accuracy of the model
          - The known label of each test sample is compared with the classified result from the model
          - The accuracy rate is the percentage of test set samples that are correctly classified by the model
          - The test set is independent of the training set (otherwise the estimate is optimistic because of overfitting)
      - If the accuracy is acceptable, use the model to c...
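The two-step process can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not part of the slides: the toy tuples in `DATA`, the single-threshold rule "IF x > t THEN yes ELSE no" as the model, the fixed every-third-tuple holdout split, and the hypothetical `ACCEPTABLE` accuracy cutoff.

```python
# Toy labeled tuples (attribute value, class label). Invented for illustration.
DATA = [
    (1.0, "no"), (3.5, "yes"), (1.5, "no"), (4.0, "yes"),
    (2.0, "no"), (4.5, "yes"), (2.5, "no"), (5.0, "yes"),
    (3.0, "no"), (5.5, "yes"), (2.2, "no"), (4.8, "yes"),
]


def classify(threshold, x, positive="yes", negative="no"):
    """Apply the one-rule model: IF x > threshold THEN positive ELSE negative."""
    return positive if x > threshold else negative


def train_stump(train):
    """Step 1, model construction: choose the threshold that classifies
    the most training tuples correctly."""
    values = sorted({x for x, _ in train})
    if len(values) < 2:
        raise ValueError("need at least two distinct attribute values")
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_t, best_correct = None, -1
    for t in candidates:
        correct = sum(classify(t, x) == y for x, y in train)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def accuracy(threshold, labeled):
    """Step 2, accuracy estimate: fraction of tuples whose known label
    matches the label the model assigns."""
    hits = sum(classify(threshold, x) == y for x, y in labeled)
    return hits / len(labeled)


# Keep the test set independent of the training set: every third tuple is
# held out for testing, the rest are used for model construction.
test_set = DATA[::3]
train_set = [t for i, t in enumerate(DATA) if i % 3 != 0]

threshold = train_stump(train_set)
test_accuracy = accuracy(threshold, test_set)

ACCEPTABLE = 0.9  # hypothetical cutoff; the slides do not specify one
if test_accuracy >= ACCEPTABLE:
    # Model usage: classify a new tuple whose label is unknown.
    print(classify(threshold, 4.2))
```

Measuring accuracy on `train_set` instead of `test_set` would reward a model that merely memorizes its training tuples, which is why the sketch holds the test tuples out before training.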