In this task, you will check for outliers and their potential impact using the following steps:
Before you analyze your data, examine the distribution and normality of the data, and identify outlying values.
proc univariate data=Phthalate normal plot;
  var URXMHP;
  id seqn;
run;
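When the plots are crowded, it may help to list the most extreme values by respondent ID. The following is a minimal sketch, assuming the same Phthalate dataset as above; the NEXTROBS= value of 10 is an arbitrary choice for illustration and is not part of the original example.

```sas
/* Restrict the output to the table of extreme observations. */
ods select ExtremeObs;

/* NEXTROBS=10 (an assumed value) lists the 10 lowest and 10 highest
   URXMHP values; the ID statement labels each one with its SEQN. */
proc univariate data=Phthalate nextrobs=10;
  var URXMHP;
  id seqn;
run;
```

The SEQN values in this table can then be compared with the points that stand out on the plots.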
Example: Plot the phthalate subsample weight (WTSPH6YR) against the values of urinary mono-(2-ethyl)-hexyl phthalate (URXMHP) to identify any outliers.
/********************************************************************************
 * Use the PROC GPLOT procedure to plot urinary mono-(2-ethyl)-hexyl phthalate
 * (URXMHP) by the corresponding weights for each observation in the dataset.
 * Symbol and height are option statements used to format the output of the plot
 ********************************************************************************/
symbol1 value=dot height=.2;

proc gplot data=Phthalate;
  plot WTSPH6YR*URXMHP / frame;
run;
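An observation that is both large in value and heavily weighted can have a disproportionate effect on a weighted estimate. One way to see this numerically, sketched here as an assumption rather than as part of the original example, is to rank observations by the product of weight and value. The dataset name Contrib and the variable WtdValue are hypothetical helpers introduced only for this illustration.

```sas
/* Hypothetical helper dataset: weight times value for each observation. */
data Contrib;
  set Phthalate;
  /* Keep only records with both the analyte and the weight present. */
  if nmiss(URXMHP, WTSPH6YR) = 0;
  WtdValue = WTSPH6YR * URXMHP;
run;

/* Largest weighted contributions first. */
proc sort data=Contrib;
  by descending WtdValue;
run;

/* List the top 10 (an arbitrary cutoff) with their respondent IDs. */
proc print data=Contrib(obs=10);
  var seqn URXMHP WTSPH6YR WtdValue;
run;
```

Observations at the top of this list are candidates for closer review, not automatic exclusions.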
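In this step you will remove the identified outliers and compare the means and standard errors estimated with and without them.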
For this example, assume that four observations may be outliers.
/*******************************************************************************
 * Use the IF, THEN, and DELETE statements to remove the identified outliers.
 * Use the PROC MEANS procedure to produce means and standard error for the
 * dataset with and without outlier values.
 *******************************************************************************/
data Exclu4SPs;
  set Phthalate;
  if seqn in (3140, 11249, 14737, 24817) then delete;
run;

proc means data=Phthalate mean stderr maxdec=1;
  title 'Without exclusion';
  var URXMHP;
  class RIAGENDR;
  weight WTSPH6YR;
run;

proc means data=Exclu4SPs mean stderr maxdec=1;
  title 'After removing 4 outlier values';
  var URXMHP;
  class RIAGENDR;
  weight WTSPH6YR;
run;