Designation: D7915-14
An American National Standard

Standard Practice for
Application of Generalized Extreme Studentized Deviate (GESD) Technique to Simultaneously Identify Multiple Outliers in a Data Set¹

This standard is issued under the fixed designation D7915; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.

¹ This practice is under the jurisdiction of ASTM Committee D02 on Petroleum Products, Liquid Fuels, and Lubricants and is the direct responsibility of Subcommittee D02.94 on Coordinating Subcommittee on Quality Assurance and Statistics.
Current edition approved May 1, 2014. Published June 2014. DOI: 10.1520/D7915-14.

Copyright ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States

1. Scope

1.1 This practice provides a step-by-step procedure for the application of the Generalized Extreme Studentized Deviate (GESD) Many-Outlier Procedure to simultaneously identify multiple outliers in a data set. (See Bibliography.)

1.2 This practice is applicable to a data set comprising observations that are represented on a continuous numerical scale.

1.3 This practice is applicable to a data set comprising a minimum of six observations.

1.4 This practice is applicable to a data set where the normal (Gaussian) model is reasonably adequate for the distributional representation of the observations in the data set.

1.5 The probability of false identification of outliers associated with the decision criteria set by this practice is 0.01.

1.6 It is recommended that the execution of this practice be conducted under the guidance of personnel familiar with the statistical principles and assumptions associated with the GESD technique.

1.7 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory limitations prior to use.

2. Terminology

2.1 Definitions of Terms Specific to This Standard:

2.1.1 outlier, n: an observation (or a subset of observations) which appears to be inconsistent with the remainder of the data set.

3. Significance and Use

3.1 The GESD procedure can be used to simultaneously identify up to a pre-determined number of outliers (r) in a data set, without having to pre-examine the data set and make a priori decisions as to the location and number of potential outliers.

3.2 The GESD procedure is robust to masking. Masking describes the phenomenon where the existence of multiple outliers can prevent an outlier identification procedure from declaring any of the observations in a data set to be outliers.

3.3 The GESD procedure is automation-friendly, and hence can easily be programmed as automated computer algorithms.

4. Procedure

4.1 Specify the maximum number of outliers (r) in a data set to be identified.

4.1.1 The recommended maximum number of outliers (r) by this practice is two (2) for data sets with six to twelve observations.

4.1.2 For data sets with more than twelve observations, the recommended maximum number of outliers (r) is the lesser of ten or 20 %.

4.1.3 The recommended values for r in 4.1.1 and 4.1.2 are not intended to be mandatory. Users can specify other values based on their specific needs.

4.2 Compute test statistic T for each observation in the initial starting data set (DTS0) as follows:

$$T = \frac{|x - \bar{x}|}{s} \tag{1}$$

where:
$x$ = an observation in the data set,
$\bar{x}$ = average calculated using all observations in the data set, and
$s$ = sample standard deviation calculated using all observations in the data set.

4.3 Remove the observation in the data set with the largest absolute magnitude of the test statistic T and form a reduced data set (DTSi), where i = number of observations removed from the initial data set.

4.4 Re-calculate T for all observations in the reduced data set from 4.3.

4.5 Repeat steps 4.3 to 4.4 until r number of observations have been removed from the initial data set. That is, until calculation of all Ts for all observations in the reduced data set DTSr has been completed.

4.6 Compare the maximum T computed in each data set (DTS0 to DTSr) to a critical value λcritical associated with the data set DTSi, where λcritical is chosen based on a false identification probability of 0.01. See Table A1.1 in Annex A1 for values applicable to different data set sizes.

4.7 Identify the data set DTSm for which the maximum T exceeds λcritical, and m (number of observations removed from the initial data set DTS0) is the largest value (0

DTS0   T0     DTS1   T1     DTS2   T2     DTS3   T3     DTS4   T4     DTS5   T5     DTS6   T6
35.0   0.30   35.0   0.44   35.0   0.64   35.0   0.97   35.0   0.94   35.0   1.05   35.0   1.16
36.6   0.05   36.6   0.04   36.6   0.17   36.6   0.37   36.6   0.32   36.6   0.40   36.6   0.49
34.7   0.37   34.7   0.52   34.7   0.73   34.7   1.08   34.7   1.06   34.7   1.17   34.7   1.29
36.2   0.04   36.2   0.14   36.2   0.29   36.2   0.52   36.2   0.48   36.2   0.56   36.2   0.66
37.0   0.14   37.0   0.06   37.0   0.05   37.0   0.22   37.0   0.17   37.0   0.24   37.0   0.32
25.3   2.44   25.3   2.85
37.2   0.18   37.2   0.11   37.2