NHANES Environmental Data Tutorial - Data Normality

Task 2: How to Transform a Single Variable

The following example applies the best transformation for a single variable, URXMHP. Here that transformation is the natural log.

Step 1: Apply the best transformation

The sample code is a SAS data step that creates a new variable, LN_URXMHP, as the natural log of URXMHP.

Program to Transform a Variable in a Data Step

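libname NH 'C:\NHANES\DATA';

/* Create LN_URXMHP as the natural log of URXMHP */
data xmehp;
     set NH.Phthalate_analysis_data;
     /* Transform only when URXMHP is not missing (.) */
     if URXMHP > . then LN_URXMHP = log(URXMHP);
run;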
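The natural log is defined only for positive values. The variant below is a minimal sketch, not part of the original program, that restricts the transformation to values greater than zero. It assumes the same library path and data set as the program above. If URXMHP contains no zero or negative values, it gives the same result.

```sas
/* Sketch only: same transformation, restricted to positive values. */
libname NH 'C:\NHANES\DATA';

data xmehp;
     set NH.Phthalate_analysis_data;
     /* Missing values compare lower than any number in SAS, so both
        missing and non-positive values leave LN_URXMHP missing. */
     if URXMHP > 0 then LN_URXMHP = log(URXMHP);
run;
```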

Step 2: Check Data Normality of Transformed Variable

After the transformation, check the distribution of the transformed variable, LN_URXMHP. Comparing the descriptive estimates and plots for URXMHP and LN_URXMHP shows that the approximation to a normal distribution is greatly improved: skewness and kurtosis were 13.176 and 233.304 before the transformation and are 0.581 and 0.027 after it.

Program to Check the Transformed Variable

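libname NH "C:\NHANES\DATA";

/* Create LN_URXMHP as the natural log of URXMHP */
data xmehp;
     set NH.Phthalate_analysis_data;
     /* Transform only when URXMHP is not missing (.) */
     if URXMHP > . then LN_URXMHP = log(URXMHP);
run;

/* Descriptive statistics for the transformed variable */
proc univariate data=xmehp;
     var LN_URXMHP;
     /* FREQ counts each observation WTSPH6YR times (non-integer values are truncated) */
     freq WTSPH6YR;
     title "Check data distribution for Ln_URXMHP";
run;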
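The before-and-after comparison described in Step 2 can also be produced in a single table. The following is a minimal sketch, assuming the data set xmehp created by the program above already exists. The PROC MEANS step reports skewness and kurtosis for both variables side by side. The second step asks PROC UNIVARIATE for a histogram and a normal Q-Q plot of the transformed variable. The titles and the choice of plots are illustrative and are not part of the original program.

```sas
/* Sketch: assumes the data set xmehp from the program above exists. */

/* Skewness and kurtosis for the original and transformed variables */
proc means data=xmehp n mean skewness kurtosis maxdec=3;
     var URXMHP LN_URXMHP;
     freq WTSPH6YR;   /* same frequency variable as the program above */
     title "Skewness and kurtosis before and after the log transformation";
run;

/* Plots for the transformed variable; NOPRINT suppresses the statistics tables */
proc univariate data=xmehp noprint;
     var LN_URXMHP;
     freq WTSPH6YR;
     histogram LN_URXMHP / normal;                 /* histogram with fitted normal curve */
     qqplot LN_URXMHP / normal(mu=est sigma=est);  /* normal Q-Q plot with reference line */
     title "Distribution plots for Ln_URXMHP";
run;
```

The FREQ statement is used here rather than WEIGHT because PROC MEANS does not compute skewness and kurtosis when a WEIGHT statement is present.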


Select Output of Program

Output of Program to Check the Transformed Variable [PDF - 124 KB]
