In this example, the distribution of total calcium intake will be estimated for women older than age 50 years.
This example uses the demoadv dataset (download at Sample Code and Datasets). The variables w0304_0 to w0304_16 are the weights (dietary weights and balanced repeated replication [BRR] weights) used in the analysis of 2003-2004 dietary data that requires the use of BRR to calculate standard errors. The model is run 17 times, including 16 runs using BRR (see Module 18, Task 4 for more information). BRR uses weights w0304_1 to w0304_16.
Note: If 4 years of NHANES data are used, 32 BRR runs are required. Additional weights are found in the demoadv dataset.
A SAS macro is a useful technique for rerunning a block of code when the analyst only wants to change a few variables; the macro BRR212 is created and called in this example. The BRR212 macro calls the MIXTRAN macro, and calculates BRR standard errors of the parameter estimates. The MIXTRAN macro obtains preliminary estimates for the values of the parameters in the model, and then fits the model using PROC NLMIXED. It also produces summary reports of the model fit. Recall that modeling the complex survey structure of NHANES requires procedures that account for both differential weighting of individuals and the correlation among sample persons within a cluster. The SAS procedure NLMIXED can account for differential weighting by using the replicate statement. The use of BRR to calculate standard errors accounts for the correlation among sample persons in a cluster. Therefore, NLMIXED (or any SAS procedure that incorporates differential weighting) may be used with BRR to produce standard errors that are suitable for NHANES data without using specialized survey procedures.
The MIXTRAN macro used in this example was downloaded from the NCI website. Version 1.1 of the macro was used. Check this website for macro updates before starting any analysis. Additional details regarding the macro and additional examples also may be found on the website and in the users’ guide.
Statements | Explanation |
---|---|
data demoadv; select ; array dum1 ( 3 ) CA_SU2 CA_SU3 CA_SU4; if DR1DAY in ( 1 , 6 , 7 ) then weekend1= 1 ; if w0304_0 ne . ; /* keeps only those with dietary data */
|
In NHANES 2003-2004, calcium from water was not included in DR1TCALC and DR2TCALC, so this is added using the water variables DR1_330, DR2_330, DR1BWATR, and DR2BWATR, creating new variables DR1CALCW and DR2CALCW. The amount of calcium in tap water (DR1_330 or DR2_330) and bottled water (DR1BWATR or DR2BWATR) are estimated to be 3 mg and 10 mg per 100 grams, respectively. (Note: In 2005-2006 this was changed, so this step is not necessary.) If there are no water variables, then the new calcium variables are set to missing. Categorical supplement variables are created using do/select syntax with the DAILYAVG variable, which is a created variable that is equal to the daily average calcium intake from dietary supplements (see Module 14, Task 3 for more details). Dummy variables are created from these categories. Dummy variables are also created for weekend. In this dataset only those people with dietary data are selected by including those without missing BRR weights. |
data adultf; |
A dataset that selects women older than age 50 years is created for this example. |
data day1;
data day2; |
The variables DR1CALCW and DR2CALCW created above represent total calcium consumed on days 1 and 2 respectively from all foods and beverages (including water). To create a dataset with 2 records per person, the demoadv dataset is set 2 times to create 2 datasets, one where day=1 and one where day=2. The same variable name, DRTCALCW, is used for calcium on both days. It is created by setting it equal to DR1TCALCW for day 1 and DR2TCALCW for day 2. |
data calcium; |
Finally, these data sets are appended and a weekend variable is created. (This variable must be named weekend for the MIXTRAN macro.) |
It is important to sort the dataset by respondent and intake day (day 1 and 2) because the NLMIXED procedure uses this information to estimate the model parameters.
proc sort ; by seqn day; run ;
The BRR221 macro calls the MIXTRAN macro and DISTRIB macro and computes standard errors of parameter estimates. After creating this macro and running it 1 time, it may be called several times, each time changing the macro variables.
Statements | Explanation |
---|---|
%include 'C:\NHANES\Macros\mixtran_macro_v1.1.sas' ; |
This code reads the MIXTRAN macro into SAS so that it may be called. |
%macro BRR221(data, response, datam, var1, var2, supp, EAR, UL, foodtype, subject, repeat,covars_amt, outlib, modeltype, lambda, seq, weekend, vargroup, numvargroups, subgroup, start_val1, start_val2, start_val3, vcontrol, loptions, titles, printlevel, final);
|
The start of the BRR221 macro is defined. All of the terms inside the parentheses are the macro variables that are used in the macro. |
%MIXTRAN (data=&data, response=&response, foodtype=&foodtype, subject=&subject, repeat=&repeat, ovars_amt=&covars_amt, outlib=&outlib, modeltype=&modeltype, lambda=&lambda, replicate_var= w0304_0, seq=&seq, weekend=&weekend, vargroup=&vargroup, numvargroups=&numvargroups, subgroup=&subgroup, start_val1= &start_val1, start_val2=&start_val2, start_val3=&start_val3, vcontrol=&vcontrol, nloptions=&nloptions, titles=&titles, printlevel=&printlevel)
|
Within the BRR221 macro, the MIXTRAN macro is called. All of the variables preceded by & will be defined by the BRR221 macro call. The only variable without an & is the replicate_var macro variable; it is set to w0304_0 for the first run. |
data _allparm; |
The dataset _param_unc_&foodtype is created in the MIXTRAN run. This data step saves those parameter estimates and labels the run as ‘0’. |
data _allpred; set & outlib.._ pred_unc_& foodtype. (rename=(w0304_0=w0304)); run= 0 ; drop numsubjects totsubjects totwts indwts grpwts weekend x2b2_1; run; |
The dataset _pred_unc_&foodtype and is created in the MIXTRAN run. This data step saves the predicted values, renames the weight to w0304, and drops unneeded variables. |
data _null_; |
This datastep pulls a_lambda from _param_unc_&foodtype and creates a macro variable called lamb. |
%do brun= 1 %to 16 ;
|
This code starts a loop to run the 16 BRR runs. |
options nonotes; |
Notes are turned off to save room in the log. |
%put ~~~~~~~~~~~~~~RUN &brun~~~~~~~~~~~~~~~; |
The run number is printed to the log. |
%MIXTRAN (data=&data, response=&response, foodtype=&foodtype, subject=&subject, repeat=&repeat, covars_amt=&covars_amt, outlib=&outlib, modeltype=&modeltype, lambda=&lamb, replicate_var=w0304_&brun, seq=&seq, weekend=&weekend, vargroup= &vargroup, numvargroups=&numvargroups, subgroup=&subgroup, start_val1=&start_val1, start_val2=&start_val2, start_val3=&start_val3, vcontrol=&vcontrol, nloptions=&nloptions, titles=&titles, printlevel=&printlevel) |
Within the BRR221 macro, the MIXTRAN macro is called. All of the variables preceded by & will be defined by the BRR221 macro call. The only variable without an & is the replicate_var macro variable; it is set to w0304_&brun where &brun equals 1 to 16. |
data brrparm; |
The dataset _param_unc_&foodtype is created in the MIXTRAN run. This data step saves those parameter estimates and labels the run as &brun. |
proc append base=_allparm data=brrparm; |
The BRR datasets are appended into the _allparm dataset that was created for run 0. |
data brrpred; set & outlib.._ pred_unc_& foodtype. (rename=(w0304_& brun. =w0304)); run=&brun; drop numsubjects totsubjects totwts indwts grpwts weekend x2b2_1; run; |
The dataset _pred_unc_&foodtype and is created in the MIXTRAN run. This data step saves the predicted values, renames the weight to w0304, and drops unneeded variables. |
proc append base=_allpred data=brrpred; |
The BRR datasets are appended into the _allpred dataset that was created for run 0. |
proc datasets nolist; delete brrparms brrpred; run; |
The BRR datasets are appended into the _allpred dataset that was created for run 0. Unneeded datasets are deleted. |
%end ; |
The BRR runs end. |
%if &weekend ne %str () %then %do ; data _null_; %let I=1; %let varamtu= %upcase (&covars_amt); %do %until ( %qscan (&varamtu,&I, %str ( ))= %str ()); %let I= %eval (&I+1); %end ; %let cnt=&I; %if &covars_amt= %str () %then %let cnt=1; %if &cnt lt 9 %then %let znum = "0"; %else %let znum= %str () ; num= %eval (&i+ 1 ); varA= strip( 'A' ||strip(&znum)||strip(num)|| '_' ); wkendparm=strip(strip(varA)|| 'WEEKEND' ); %if &covars_amt= %str () %then %let cnt=0; call symput ( 'wkendparm' ,wkendparm); run; %let weekend1=WEEKEND1; %let weekend2=WEEKEND2; %end ; %else %do ; %let wkendparm=0; %let weekend1=0; %let weekend2=0; %end ; |
Weekend macro variables wkendparm, weekend1, and weekend2 are created. These variables are set to 0 if the macro variable weekend is not set to a value in order to estimate the usual intake of calcium in a data step below. |
%if &seq ne %str () %then %do ; data _null_; %let I=1; %let varamtu= %upcase (&covars_amt); %do %until ( %qscan (&varamtu,&I, %str ( ))= %str ()); %let I= %eval (&I+1); %end ; %let cnt= %eval (&I+1); %if &covars_amt= %str () %then %let cnt=1; %if &cnt lt 9 %then %let znum = "0"; %else %let znum= %str () ; num= %eval (&i+ 2 ); varA= strip( 'A' ||strip(&znum)||strip(num)|| '_' ); seqparm=strip(strip(varA)||strip( "&seq" )); call symput ( 'seqparm' ,seqparm); run; %end ;
%else %let seqparm=0; |
Sequence macro variable seqparm is created. This variable is set to 0 if the macro variable seq is not set to a value in order to estimate the usual intake of calcium in a data step below. |
data _allpred; |
The predictions and parameter datasets are merged. |
proc sort data=_allpred; |
The data are sorted by the respondent and by the BRR run. |
proc sort data=&final; |
The final dataset is sorted. |
proc sort data=&datam; |
The dataset with 1 line per person is sorted by respondent. |
data analyze;
|
This data step merges _allpred with the dataset with 1 line per person and creates empirical shrinkage estimates. |
NDAYS= 0 ; if (&var1 ^= . ) then do; NDAYS=NDAYS+ 1 ; Z1=(((&var1 + .5 *min_amt*(&var1= 0 ))**a_lambda)- 1 )/a_lambda; toterr1=Z1 - (x2b2_0 + &wkendparm*&WEEKEND1); end; if (&var2 ^= . ) then do; NDAYS=NDAYS+ 1 ; Z2=(((&var2 + .5 *min_amt*(&var2= 0 ))**a_lambda)- 1 )/a_lambda; toterr2=Z2 - (x2b2_0 + &wkendparm*&WEEKEND2 + &seqparm); end; avgerr=mean(toterr1,toterr2); ui=avgerr*sqrt(A_VAR_U2/(A_VAR_U2 + A_VAR_E/NDAYS)); zbar_0=x2b2_0 + ui; zbar_1=x2b2_0 + &wkendparm + ui; bt_0 = a_lambda*zbar_0 + 1 ; bt_1 = a_lambda*zbar_1 + 1 ; if (bt_0 <= 0 ) then bt_0= 0 ; else bt_0 = (bt_0**( 1 /a_lambda)) + ( 1 - a_lambda)*(A_VAR_E/ 2 )*(bt_0**( 1 /a_lambda - 2 )); if (bt_1 <= 0 ) then bt_1= 0 ; else bt_1 = (bt_1**( 1 /a_lambda)) + ( 1 - a_lambda)*(A_VAR_E/ 2 )*(bt_1**( 1 /a_lambda - 2 )); &foodtype._FDW=( 4 *bt_0 + 3 *bt_1)/ 7 ; &foodtype._TOT=&foodtype._FDW + &supp; FDW_GE_EAR=(&foodtype._FDW >= &EAR); TOT_GE_EAR=(&foodtype._TOT >= &EAR); FDW_GT_UL=(&foodtype._FDW > &UL); TOT_GT_UL=(&foodtype._TOT > &UL); |
The day 1 data are transformed using the Box-Cox transformation. Within this transformation step, if the variable=0 then it is set to ½ of the minimum amount. Next, the residual is computed (toterr1). This process is repeated for day 2. The coding assumes that &seqparm is equal to 1 for day 2 and 0 for day 1. Overall usual intake from foods, beverages, and water is computed as the weighted mean of weekday and weekend intake (&foodtype_FDW). Total usual intake from food, beverages, water, and supplements (&foodtype_TOT) is the sum of &foodtype_FDW and the intake of supplements (&supp). |
drop toterr1 toterr2 avgerr zbar_0 zbar_1 bt_0 bt_1 x2b2_0 min_amt A_VAR_U2 A_VAR_E a_lambda &supp &var1 &var2 NDAYS z1 z2; |
Unneeded variables are dropped. |
proc sort data=analyze; |
The dataset is sorted. |
proc univariate plot data=analyze noprint; |
Percentiles are computed by run using the weight w0304 (which was renamed in an earlier dataset from w0304_&brun). Percentiles 1, 5, 25, 50, 75, 95, and 99 are saved. The freq statement is used to account for the complex survey sampling of NHANES. |
proc univariate plot data=analyze noprint; |
The mean values (including proportions for the EAR and UL variables) are computed using proc univariate. These variables need to be weighted by the w0304 to calculate the means properly adjusted for the complex sampling in NHANES. A new dataset, called means, is created with these values. |
data distrib; |
The means and percentiles are merged, and proportions that were computed in means are converted to percents. |
data distrib0; |
The estimates for the base run are saved in distrib0. |
proc means data=distrib(where=(run > 0 )) noprint; var &foodtype._FDW FDW_P1 FDW_P5 FDW_P25 FDW_P50 FDW_P75 FDW_P95 FDW_P99 FDW_GE_EAR FDW_GT_UL &foodtype._TOT TOT_P1 TOT_P5 TOT_P25 TOT_P50 TOT_P75 TOT_P95 TOT_P99 TOT_GE_EAR TOT_GT_UL; output out=brr mean= _&foodtype._FDW _FDW_P1 _FDW_P5 _FDW_P25 _FDW_P50 _FDW_P75 _FDW_P95 _FDW_P99 _FDW_GE_EAR _FDW_GT_UL _&foodtype._TOT _TOT_P1 _TOT_P5 _TOT_P25 _TOT_P50 _TOT_P75 _TOT_P95 _TOT_P99 _TOT_GE_EAR _TOT_GT_UL var= S_&foodtype._FDW S_FDW_P1 S_FDW_P5 S_FDW_P25 S_FDW_P50 S_FDW_P75 S_FDW_P95 S_FDW_P99 S_FDW_GE_EAR S_FDW_GT_UL S_&foodtype._TOT S_TOT_P1 S_TOT_P5 S_TOT_P25 S_TOT_P50 S_TOT_P75 S_TOT_P95 S_TOT_P99 S_TOT_GE_EAR S_TOT_GT_UL; run; |
This code computes the means and variances of the BRR runs. The means are prefixed with underscores to distinguish them from the base runs. The variances are prefixed with S_. |
data distrib0; merge distrib0 brr; array est ( 20 ) &foodtype._FDW FDW_P1 FDW_P5 FDW_P25 FDW_P50 FDW_P75 FDW_P95 FDW_P99 FDW_GE_EAR FDW_GT_UL &foodtype._TOT TOT_P1 TOT_P5 TOT_P25 TOT_P50 TOT_P75 TOT_P95 TOT_P99 TOT_GE_EAR TOT_GT_UL; array se ( 20 ) S_&foodtype._FDW S_FDW_P1 S_FDW_P5 S_FDW_P25 S_FDW_P50 S_FDW_P75 S_FDW_P95 S_FDW_P99 S_FDW_GE_EAR S_FDW_GT_UL S_&foodtype._TOT S_TOT_P1 S_TOT_P5 S_TOT_P25 S_TOT_P50 S_TOT_P75 S_TOT_P95 S_TOT_P99 S_TOT_GE_EAR S_TOT_GT_UL; array tmp ( 20 ) _&foodtype._FDW _FDW_P1 _FDW_P5 _FDW_P25 _FDW_P50 _FDW_P75 _FDW_P95 _FDW_P99 _FDW_GE_EAR _FDW_GT_UL _&foodtype._TOT _TOT_P1 _TOT_P5 _TOT_P25 _TOT_P50 _TOT_P75 _TOT_P95 _TOT_P99 _TOT_GE_EAR _TOT_GT_UL; do i= 1 to 20 ; se(i)=-1*sqrt(se(i)*( 15 / 16 ) + (est(i)-tmp(i))** 2 )/sqrt( .49 ); if (se(i)= 0 ) then se(i)=- 0.0001 ; end; |
The base datasets and BRR datasets are combined. There are 20 statistics that need standard errors that are defined in arrays. |
drop _&foodtype._FDW _FDW_P1 _FDW_P5 _FDW_P25 _FDW_P50 _FDW_P75 _FDW_P95 _FDW_P99 _FDW_GE_EAR _FDW_GT_UL _&foodtype._TOT _TOT_P1 _TOT_P5 _TOT_P25 _TOT_P50 _TOT_P75 _TOT_P95 _TOT_P99 _TOT_GE_EAR _TOT_GT_UL; drop i; run; |
Unneeded variables are dropped. |
data toprint1; keep &foodtype._FDW FDW_P1 FDW_P5 FDW_P25 FDW_P50 FDW_P75 FDW_P95 FDW_P99 FDW_GE_EAR FDW_GT_UL &foodtype._TOT TOT_P1 TOT_P5 TOT_P25 TOT_P50 TOT_P75 TOT_P95 TOT_P99 TOT_GE_EAR TOT_GT_UL line; run; |
To create the final dataset, the point estimates are saved in a file called toprint1. The variable line will identify them as estimates. |
data toprint2; set distrib0(drop=&foodtype._FDW FDW_P1 FDW_P5 FDW_P25 FDW_P50 FDW_P75 FDW_P95 FDW_P99 FDW_GE_EAR FDW_GT_UL &foodtype._TOT TOT_P1 TOT_P5 TOT_P25 TOT_P50 TOT_P75 TOT_P95 TOT_P99 TOT_GE_EAR TOT_GT_UL); line= 2 ; rename S_&foodtype._FDW=&foodtype._FDW S_FDW_P1= FDW_P1 S_FDW_P5= FDW_P5 S_FDW_P25= FDW_P25 S_FDW_P50= FDW_P50 S_FDW_P75= FDW_P75 S_FDW_P95= FDW_P95 S_FDW_P99= FDW_P99 S_FDW_GE_EAR= FDW_GE_EAR S_FDW_GT_UL= FDW_GT_UL S_&foodtype._TOT=&foodtype._TOT S_TOT_P1= TOT_P1 S_TOT_P5= TOT_P5 S_TOT_P25= TOT_P25 S_TOT_P50= TOT_P50 S_TOT_P75= TOT_P75 S_TOT_P95= TOT_P95 S_TOT_P99= TOT_P99 S_TOT_GE_EAR=TOT_GE_EAR S_TOT_GT_UL=TOT_GT_UL; drop run; run; |
The standard errors are saved in a dataset called toprint2. The variable line will identify them as standard errors. The standard errors are renamed to the original variable names. |
data &final; |
The final dataset is created by appending toprint1 and toprint2. |
proc sort data=&final; |
The final dataset is sorted. |
proc print data=&final split= ' ' noobs; var line &foodtype._FDW FDW_P1 FDW_P5 FDW_P25 FDW_P50 FDW_P75 FDW_P95 FDW_P99 FDW_GE_EAR FDW_GT_UL ; label line = ' ' &foodtype._FDW = 'Mean' FDW_P1 = '1st Pctile' FDW_P5 = '5th Pctile' FDW_P25 = '25th Pctile' FDW_P50 = '50th Pctile' FDW_P75 = '75th Pctile' FDW_P95 = '95th Pctile' FDW_P99 = '99th Pctile' FDW_GE_EAR = 'Pct>=EAR' FDW_GT_UL = 'Pct>UL' ; format line line. &foodtype._FDW FDW_P1 FDW_P5 FDW_P25 FDW_P50 FDW_P75 FDW_P95 FDW_P99 FDW_GE_EAR FDW_GT_UL negparen10.1 ; title 'Usual Intake of Calcium from Food and Water Sources Only' ; title2 'NHANES 2003-04' ; run; |
The estimate for calcium from foods, beverages, and water are printed, labeling the variables. The format negparen will make the standard errors print in parentheses. |
proc print data=&final split= ' ' noobs; var line &foodtype._TOT TOT_P1 TOT_P5 TOT_P25 TOT_P50 TOT_P75 TOT_P95 TOT_P99 TOT_GE_EAR TOT_GT_UL; label line = ' ' &foodtype._TOT = 'Mean' TOT_P1 = '1st Pctile' TOT_P5 = '5th Pctile' TOT_P25 = '25th Pctile' TOT_P50 = '50th Pctile' TOT_P75 = '75th Pctile' TOT_P95 = '95th Pctile' TOT_P99 = '99th Pctile' TOT_GE_EAR= 'Pct>=EAR' TOT_GT_UL= 'Pct>UL' ; format line line. &foodtype._TOT TOT_P1 TOT_P5 TOT_P25 TOT_P50 TOT_P75 TOT_P95 TOT_P99 TOT_GE_EAR TOT_GT_UL negparen10.1 ; title 'Usual Intake of Calcium from Food and Water Plus Dietary Supplement Sources' ; title2 'NHANES 2003-04' ; run; |
The estimate for calcium from food, beverages, water, and supplements are printed, labeling the variables. The format negparen will make the standard errors print in parentheses. |
%mend BRR221; | The end of the BRR221 macro is indicated. |
Statements | Explanation |
---|---|
%BRR221(data=calcium, response=drtcalcw, datam=adultf, var1=DR1TCALCW, var2=DR2TCALCW, supp=DAILYAVG, EAR= 1000 , UL= 2500 , foodtype=calciumw, subject=seqn, repeat=DAY2, covars_amt=CA_SU2 CA_SU3 CA_SU4, outlib=work, modeltype=amount, lambda=, seq=DAY2, weekend=WEEKEND, vargroup=, numvargroups=, subgroup=, start_val1=, start_val2=, start_val3=, vcontrol=, nloptions=, titles= 1 , printlevel= 1 , final=nh.m22task1)
|
This code calls the BRR221 macro. The dataset calcium defined in Step 1 is used; the macro variable response for which we want to model the distribution is DRTCALCW. The individual day variables, DR1TCALCW and DR2TCALCW , and the supplement variable, DAILYAVG must also be specified. By specifying the EAR and UL values of 1000 mg and 2500 mg, respectively, the macro will produce an estimate of the proportion of the population below these values. These variables are required. The macro variable foodtype is used to label the _pred and _param datasets. The variable seqn identifies the subject, and the macro variable repeat defines the variable that identifies the repeats on the subject, which is day2. The covariates CA_SU2 CA_SU3 CA_SU4 are included in the model. The macro variable outlib specifies the library where the data are to be stored. In this case, the working directory, work, was used. Because this is a ubiquitously-consumed dietary constituent, modeltype= amount is specified. This fits the amount model. The seq variable is day2, which indicates if the recall was the 1st or 2nd day. The weekend macro variable includes a weekend effect in the model, and it calculates the distribution by 4/7 for weekdays and 3/7 for weekends. It must be set equal to a variable called weekend in the dataset. The variable final specifies the name of the final dataset produced. |
Note: Your results may vary slightly, as a random seed is used to estimate the distribution of usual intake. However, they would not be expected to vary by more than 1%.