How to Identify and Describe the Impact of Influential Outliers
Before you analyze your data, it is very important that you examine the data for the presence of outlying values.
Check for Outliers by Running a Univariate Analysis
Use the PROC UNIVARIATE procedure to get all default descriptive statistics, such as the mean, minimum and maximum values, standard deviation, and skewness. Use the VAR statement to identify the variable of interest (CVDESVO2). Use the ID statement to list the sequence numbers associated with extreme values in the output.
Sample Code
proc univariate data = cvx normal plot ;
var CVDESVO2;
where WTMEC4BC > 0 and RIDAGEYR >= 12 and RIDAGEYR <= 49 ;
id seqn;
title 'Distribution of estimated VO2 max among study participants aged 12 to 49 years' ;
run ;
Output of Program
Download program output [PDF - 58 KB]
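To focus on the extreme values alone, the univariate output can be narrowed. The following is a minimal sketch, assuming the same cvx dataset and the subpopulation filter used in the plotting and comparison steps. The NEXTROBS= option raises the number of lowest and highest observations listed (the default is 5), and the ODS SELECT statement keeps only the extreme observations table. The value 10 and the title text are arbitrary choices for illustration.

```sas
* Sketch: list only the 10 lowest and 10 highest values of CVDESVO2 ;
* The count of 10 is an illustrative assumption, not a recommendation ;
ods select ExtremeObs ;
proc univariate data = cvx nextrobs = 10 ;
var CVDESVO2 ;
where WTMEC4BC > 0 and RIDAGEYR >= 12 and RIDAGEYR <= 49 ;
id seqn ;
title 'Extreme values of estimated VO2 max' ;
run ;
```

Because of the ID statement, each listed value appears with its sequence number, which makes it easier to look up the corresponding sample weight in the next step.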
Plot Sample Weight against the Distribution of the Variable
Use the PROC GPLOT procedure to plot estimated VO2 max (CVDESVO2) against the corresponding sample weight for each observation in the dataset. Set 20 ml/kg/min as the minimum reasonable estimated VO2 max and 90 ml/kg/min as the maximum reasonable estimated VO2 max, based on observed measures by age and sex. The HREF option draws reference lines at these two values.
Sample Code
symbol1 value = dot height = .2 ;
title ;
proc gplot data = cvx ;
plot WTMEC4BC*CVDESVO2/ href = 20 , 90 frame ;
where WTMEC4BC > 0 and RIDAGEYR >= 12 and RIDAGEYR <= 49 ;
run ;
Output of Program
Download program output [PDF - 62 KB]
The observed outliers have only moderately large sample weights. Therefore, removing these observations would not have a great effect on population estimates.
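Why the sample weight matters can be seen from the weighted mean itself. The notation here is introduced for illustration only and does not come from the tutorial: y_i is the estimated VO2 max for observation i, w_i is its sample weight, W is the sum of all weights, and the subscript (-k) marks the estimate with observation k removed.

```latex
\bar{y}_w = \frac{\sum_i w_i y_i}{\sum_i w_i},
\qquad
\bar{y}_{w,(-k)} - \bar{y}_w = \frac{w_k \,(\bar{y}_w - y_k)}{W - w_k},
\qquad
W = \sum_i w_i .
```

The shift in the estimate grows with both the weight w_k and the distance of y_k from the mean, so an extreme value is most influential when it also carries a large sample weight.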
Identify Outliers and Compare Estimates with Outliers Deleted Against the Original Estimates with Outliers Included
Use the IF, THEN, and DELETE statements in the DATA step to delete the identified outliers with CVDESVO2 < 20 or > 90. Use the PROC MEANS procedure to determine the mean and standard error for the dataset both with and without the outlier values.
Sample Code
data exclude_SP;
set cvx;
if WTMEC4BC > 0 and RIDAGEYR >= 12 and RIDAGEYR <= 49 and CVDESVO2 < 20
then delete ;
if WTMEC4BC > 0 and RIDAGEYR >= 12 and RIDAGEYR <= 49 and CVDESVO2 > 90
then delete ;
run ;
proc format ;
value GENDERF 1 = 'Male'
2 = 'Female' ;
run ;
proc means data = cvx mean stderr maxdec = 1 ;
title 'No Exclusions' ;
var CVDESVO2;
class RIAGENDR;
weight WTMEC4BC;
format RIAGENDR GENDERF. ;
run ;
proc means data = exclude_SP mean stderr maxdec = 1 ;
title 'Outlier Exclusion' ;
var CVDESVO2;
class RIAGENDR;
weight WTMEC4BC;
format RIAGENDR GENDERF. ;
run ;
Output of Program
Download program output [PDF - 38 KB]
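One detail of the deletion step may be worth checking. SAS treats a missing numeric value as smaller than any number, so a condition such as CVDESVO2 < 20 is also true when CVDESVO2 is missing. PROC MEANS leaves missing analysis values out of its statistics, so the means should not change, but the number of records kept in the dataset would. The following is a minimal sketch of a variant that deletes only the observed outliers; the dataset name exclude_SP2 is hypothetical and chosen here for illustration.

```sas
* Sketch: same exclusion in one statement, keeping records with missing CVDESVO2 ;
* exclude_SP2 is a hypothetical dataset name used only in this example ;
data exclude_SP2 ;
set cvx ;
if WTMEC4BC > 0 and RIDAGEYR >= 12 and RIDAGEYR <= 49
   and not missing(CVDESVO2)
   and (CVDESVO2 < 20 or CVDESVO2 > 90)
then delete ;
run ;
```

Substituting exclude_SP2 for exclude_SP in the second PROC MEANS step would then compare the same two sets of estimates.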