Key Concepts About Outliers in NHANES

Outliers, or extreme values, are almost always present in NHANES environmental chemical data. They may result from transient exposure to very high concentrations of environmental chemicals, unfavorable environmental conditions, kidney disease that slows the normal urinary filtration and excretion of chemicals, or other reasons related to data collection, measurement, and recording. It is very important to examine the distribution and normality of the data, identify outlying values, and determine how such outliers might affect your analysis. If the data distribution is highly skewed, you can transform the data to bring the distribution closer to normal (many common statistical analyses assume that the data are approximately normally distributed). In SAS, common data transformations include the logit, natural log (LOG), base-10 log (LOG10), square root (SQRT), inverse (reciprocal), and arcsine (ARSIN). Use the Box-Cox procedure to find the best transformation for a variable (see Module 12).
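As a rough illustration of the steps described above (examine skewness, transform, then look for outlying values), here is a minimal Python sketch. It is not the tutorial's SAS code: the concentrations are made-up numbers rather than NHANES data, the helper names skewness and iqr_outliers are hypothetical, and the 1.5 x IQR rule is only one common screening convention, assumed here for illustration.

```python
import math
import statistics

def skewness(values):
    """Skewness as the mean cubed z-score (population form)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / n

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical urinary concentrations (made-up values, not NHANES data).
conc = [0.8, 1.1, 1.3, 1.6, 2.0, 2.4, 3.1, 4.5, 7.9, 62.0]

# Natural-log transformation, the analogue of LOG in SAS.
log_conc = [math.log(v) for v in conc]

raw_skew = skewness(conc)
log_skew = skewness(log_conc)

# Screen for outlying values on the transformed scale.
flagged_log = iqr_outliers(log_conc)

print(f"skewness before: {raw_skew:.2f}, after log: {log_skew:.2f}")
print("flagged (original scale):", [round(math.exp(v), 1) for v in flagged_log])
```

In this sketch the log transformation reduces the skewness, but a value can still stand out on the transformed scale. Flagging a value is only a prompt to investigate it, not a decision to drop it.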

Another important issue is large sample weights, particularly large weights in combination with extreme values (e.g., unusually high concentrations of environmental chemicals or metabolites in an individual's blood or urine). Either condition may unduly dominate the analysis and lead to distorted, questionable, or inappropriate conclusions. Before analyzing an environmental chemical, scan the sample weights to identify any very large, influential values. It is good practice to plot the weights against the analyte values (e.g., concentrations of a specific chemical in blood or urine) to help identify influential outliers. Use the results of this screening to decide how the observations associated with these weights should be handled in your analysis. For more details on outliers, see Task 3.
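The screening described above can also be done numerically, alongside a plot. The following Python sketch is a minimal, assumption-laden illustration: the (weight, concentration) pairs are made up, the function name screen is hypothetical, and the two thresholds (a weight more than 5 times the median weight, or more than 25% of the weighted total) are arbitrary choices for demonstration, not NHANES guidance.

```python
import statistics

# Hypothetical (sample weight, concentration) pairs; not NHANES data.
records = [
    (12000.0, 1.2),
    (15500.0, 0.9),
    (9800.0, 2.1),
    (11000.0, 1.7),
    (98000.0, 1.4),   # very large weight, ordinary concentration
    (13200.0, 45.0),  # ordinary weight, extreme concentration
    (87000.0, 38.0),  # large weight combined with extreme concentration
    (14100.0, 1.0),
]

def screen(records, weight_ratio=5.0, share_cutoff=0.25):
    """Flag records with a very large weight or a large share of the weighted total.

    weight_ratio and share_cutoff are arbitrary illustrative thresholds.
    """
    median_w = statistics.median(w for w, _ in records)
    total = sum(w * x for w, x in records)
    flagged = []
    for i, (w, x) in enumerate(records):
        share = w * x / total  # contribution to the weighted sum
        if w > weight_ratio * median_w or share > share_cutoff:
            flagged.append((i, w, x, round(share, 3)))
    return flagged

flagged = screen(records)
for index, weight, value, share in flagged:
    print(f"record {index}: weight={weight:.0f}, value={value}, share={share}")
```

With these made-up values, the record that pairs a large weight with an extreme concentration accounts for most of the weighted total, which is the kind of observation the screening is meant to surface before you decide how to handle it.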

 
