This task reviews how to recode or derive new variables so they are appropriate for your analytic needs and how to check recoded or derived variables.
Creating new variables, by recoding or deriving, is an important step when preparing your analytic dataset. This is particularly true when you prepare an analytic dataset from different cycles of environmental chemical data. For some survey cycles, the data files do not include the indicator variable that identifies whether a value is at or above, or below, the limit of detection (LOD), so you may need to create it yourself. You may also want to create a new variable from multiple existing variables and different cut points. In addition, if a chemical is measured in urine, you may want to create a creatinine-adjusted variable.
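As a sketch of the unit conversion behind a creatinine-adjusted value, assume the urinary chemical concentration is reported in ng/mL and urinary creatinine in mg/dL (these units are an assumption here; check the documentation for your data files). Because 1 ng/mL equals 1 µg/L and 1 mg/dL equals 0.01 g/L, dividing one by the other gives a factor of 100:

```latex
\[
\text{adjusted concentration } (\mu\text{g/g creatinine})
  = \frac{\text{concentration } (\mu\text{g/L})}{\text{creatinine } (\text{g/L})}
  = \frac{\text{concentration } (\text{ng/mL})}{0.01 \times \text{creatinine } (\text{mg/dL})}
  = 100 \times \frac{\text{concentration } (\text{ng/mL})}{\text{creatinine } (\text{mg/dL})}
\]
```

Under those assumed units, this matches the form 100*URXMHP/URXUCR used for the creatinine-corrected variable in the sample code for this task.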
The sample code below shows how to recode and derive new variables in multiple scenarios, using the SAS DATA step.
/********************************************************************************
 * Create age categorical variable 1=6-11 2=12-19 3=20-39 4=40-59 5=60+ years.
 * Recode race/ethnicity categorical variable 1=NHW 2=NHB 3=MA 4=Other.
 * Create categorical variable for the LOD 1=above or at LOD 2=below LOD.
 * Variable to indicate at/above or below LODs is not available in NH 99-02.
 * Variable URDMHPLC to indicate at/above or below LOD is available in NH 03-04.
 * LOD for urinary mono-(2-ethyl)-hexyl phthalate is constant.
 * The lowest value of urinary mono-(2-ethyl)-hexyl phthalate is below LOD.
 * Create a creatinine-corrected variable to adjust for urine dilution.
 ********************************************************************************/
data Phthalate;
  set Phthalate;

  /* Each comparison is 1 if true and 0 if false, so the sum gives the age category */
  Age5cat = 1 + (ridageyr>=12) + (ridageyr>=20) + (ridageyr>=40) + (ridageyr>=60);

  /* Recode race/ethnicity into four categories */
  if ridreth1=3 then reth4cat=1;
  else if ridreth1=4 then reth4cat=2;
  else if ridreth1=1 then reth4cat=3;
  else reth4cat=4;

  /* LOD indicator: cycles 1 and 2 have no indicator variable, so URXMHP is
     compared with a fixed value; cycle 3 uses URDMHPLC directly */
  if (sddsrvyr=1 and URXMHP>0.8) or (sddsrvyr=2 and URXMHP>0.7)
     or (sddsrvyr=3 and URDMHPLC=0) then MHP_aLOD=1;
  else if (sddsrvyr=1 and URXMHP=0.8) or (sddsrvyr=2 and URXMHP=0.7)
     or (sddsrvyr=3 and URDMHPLC=1) then MHP_aLOD=2;

  /* Creatinine-corrected value, computed only when both measurements are positive */
  if URXMHP>0 and URXUCR>0 then MHP_UCR = 100*URXMHP/URXUCR;
run;
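The age recode above relies on SAS evaluating each comparison to 1 (true) or 0 (false). SAS treats a missing numeric value as smaller than any number, so a record with a missing age would fall into category 1 under that expression. The following is a minimal sketch of a more defensive variant, assuming your data might contain missing ages or ages under 6; the output data set name Phthalate_chk is hypothetical and used only for illustration.

```sas
/* Sketch only: Phthalate_chk is a hypothetical data set name. */
data Phthalate_chk;
  set Phthalate;

  /* Assign a category only when age is 6 or older.              */
  /* A missing age fails the ridageyr >= 6 test, so it goes to   */
  /* the ELSE branch and Age5cat stays missing.                  */
  if ridageyr >= 6 then
    Age5cat = 1 + (ridageyr>=12) + (ridageyr>=20)
                + (ridageyr>=40) + (ridageyr>=60);
  else Age5cat = .;
run;
```

Whether this guard is needed depends on your data; if every record has an age of 6 or older, the two versions give the same result.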
In this step, you will use the PROC FREQ, PROC MEANS, and PROC PRINT procedures in SAS to confirm that the derived and recoded variables correctly correspond to the original variables.
/**************************************************************************
 * Use the PROC MEANS procedure to check created age categorical variable
 * Use the PROC FREQ procedure to check recoded categorical variable
 * Use the PROC FREQ procedure to check created LOD categorical variable
 * Use the PROC MEANS procedure to check created LOD categorical variable
 * Use the PROC PRINT procedure to check creatinine-corrected variable
 **************************************************************************/
proc means data=Phthalate N min max maxdec=0;
  var ridageyr;
  class Age5cat;
  title 'Check created age categorical variable';

proc freq data=Phthalate;
  table reth4cat*ridreth1 / list missing;
  title 'Check recoded race/ethnicity variable';

proc freq data=Phthalate;
  table MHP_aLOD*URDMHPLC*sddsrvyr / list missing;
  where WTSPH6YR>0 and URXMHP>0;
  title 'Check created LOD categorical variable';

proc means data=Phthalate N min max maxdec=1;
  var URXMHP;
  class MHP_aLOD sddsrvyr;
  title 'Check created LOD categorical variable';

proc print data=Phthalate (obs=10);
  id seqn;
  var URXMHP URXUCR MHP_UCR;
  title 'Check creatinine-corrected variable';
run;