In this task, you will check for outliers and their potential impact using the following steps:
The steps below assume that you are already familiar with the SAS code used to identify outliers and evaluate their impact in NHANES datasets. If you need more detailed instructions, please review the Clean & Recode Data module in the Continuous NHANES Web Tutorial before continuing.
Before you analyze your data, examine the distribution and normality of the data, and identify outlying values.
proc univariate data=Phthalate normal plot;
    var URXMHP;
    id seqn;
run;
Example: Plot the phthalate subsample weight (WTSPH6YR) against the values of urinary mono-(2-ethyl)-hexyl phthalate to identify any outliers.
/*********************************************************************************
 * Use the PROC GPLOT procedure to plot urinary mono-(2-ethyl)-hexyl phthalate   *
 * (URXMHP) by the corresponding weights for each observation in the dataset.    *
 * Symbol and height are option statements used to format the output of the plot *
 *********************************************************************************/
symbol1 value=dot height=.2;
proc gplot data=Phthalate;
    plot WTSPH6YR*URXMHP / frame;
run;
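To see which participants the extreme points in the plot belong to, one option is to list the largest URXMHP values alongside their weights. The following is a minimal sketch and is not part of the tutorial's own program; the dataset name TopMHP and the cutoff of 10 rows are arbitrary choices made for illustration.

```sas
/* Assumption: TopMHP is a hypothetical scratch dataset name. */
/* Sort a copy of the data so the largest URXMHP values come first. */
proc sort data=Phthalate out=TopMHP;
    by descending URXMHP;
run;

/* Print the 10 largest values with their respondent IDs and sample weights. */
/* Assumption: 10 rows is an arbitrary cutoff chosen for illustration.       */
proc print data=TopMHP (obs=10);
    var seqn URXMHP WTSPH6YR;
run;
```

Printing seqn next to the value and the weight makes it easier to match a point on the plot to a specific record before deciding whether to treat it as an outlier.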
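In this step you will remove the suspected outliers and compare the weighted means and standard errors with and without them.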
For this example, assume that four observations may be outliers.
/*******************************************************************************
 * Use the IF, THEN, and DELETE statements to remove the identified outliers.  *
 * Use the PROC MEANS procedure to produce means and standard error for the    *
 * dataset with and without outlier values.                                    *
 *******************************************************************************/
data Exclu4SPs;
    set Phthalate;
    if seqn in (3140, 11249, 14737, 24817) then delete;
run;

proc means data=Phthalate mean stderr maxdec=1;
    title 'Without exclusion';
    var URXMHP;
    class RIAGENDR;
    weight WTSPH6YR;
run;

proc means data=Exclu4SPs mean stderr maxdec=1;
    title 'After removing 4 outlier values';
    var URXMHP;
    class RIAGENDR;
    weight WTSPH6YR;
run;
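When the two sets of estimates differ, it can help to look at the excluded records themselves to judge whether the change comes from large URXMHP values, large weights, or both. The sketch below is an illustrative addition rather than part of the tutorial's program; it reuses the four seqn values assumed above, and the title text is made up for this example.

```sas
/* Replace the title left over from the previous PROC MEANS step. */
/* Assumption: this title text is illustrative only.              */
title 'Records treated as outliers';

/* List the four excluded participants with gender, value, and weight. */
proc print data=Phthalate;
    where seqn in (3140, 11249, 14737, 24817);
    var seqn RIAGENDR URXMHP WTSPH6YR;
run;

/* Clear the title so it does not carry over to later output. */
title;
```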