Task 3: How to Examine the Relationship Between Usual Intake of a Single Episodically-consumed Dietary Constituent and Some Outcome

In this example, the relationship between milk intake and osteoporosis treatment status (treatosteo) will be modeled.  A participant was coded as having had treatment for osteoporosis if he or she responded “yes” to OSQ.070 (“{Were you/Was SP} treated for osteoporosis?”) from the osteoporosis questionnaire, and was set to “no” if he or she responded “no” to OSQ.070 or to OSQ.060 (“Has a doctor ever told {you/SP} that {you/s/he} had osteoporosis, sometimes called thin or brittle bones?”) from the osteoporosis questionnaire. (The SAS code to create this variable is found in the “Supplement Program” sample SAS code.)

This example uses the demoadv dataset (download at Sample Code and Datasets).  The variables w0304_0 to w0304_16 are the weights (dietary weights and balanced repeated replication (BRR) weights) used in the analysis of 2003-2004 dietary data that requires the use of BRR to calculate standard errors. The model is run 17 times, including 16 runs using BRR (see Module 18, Task 4 for more information).  BRR uses weights w0304_1 to w0304_16

 

Info iconIMPORTANT NOTE

Note: If 4 years of NHANES data are used, 32 BRR runs are required. Additional weights are found in the demoadv dataset.

 

A SAS macro is a useful technique for rerunning a block of code when the analyst only wants to change a few variables; the macro BRR212 is created and called in this example. The BRR212 macro calls the MIXTRAN, DISTRIB, and INDIVINT macros, and calculates BRR standard errors of the parameter estimates.  The MIXTRAN macro obtains preliminary estimates for the values of the parameters in the model, and then fits the model using PROC NLMIXED. It also produces summary reports of the model fit.  Recall that modeling the complex survey structure of NHANES requires procedures that account for both differential weighting of individuals and the correlation among sample persons within a cluster.  The SAS procedure NLMIXED can account for differential weighting by using the replicate statement.  The use of BRR to calculate standard errors accounts for the correlation among sample persons in a cluster.  Therefore, NLMIXED (or any SAS procedure that incorporates differential weighting) may be used with BRR to produce standard errors that are suitable for NHANES data without using specialized survey procedures. The DISTRIB macro estimates the distribution of usual intake, producing estimates of percentiles and the percent of the population below a cutpoint.

 

Info iconIMPORTANT NOTE

Note that the DISTRIB macro currently requires at least 2 cutpoints be requested in order to calculate the percent of the population below a cutpoint.

 

The MIXTRAN, DISTRIB, and INDIVINT macros used in this example were downloaded from the NCI website.  Version 1.1 of the macros was used. Check this website for macro updates before starting any analysis.  Additional details regarding the macro and additional examples also may be found on the website and in the users’ guide.

 

Step 1: Create one dataset so that each row corresponds to a single person day and define indicator variables if necessary, and create a second dataset for the INDIVINT macro

 

Statements Explanation

data demoadv;
  set nh.demoadv;
if w0304_0 ne . ;   
run ;

First, select only those people with dietary data by selecting those without missing BRR weights.

data adultf;
set demoadv;
if riagendr= 2 and ridageyr>= 19 ;
run ;

Because the INDIVINT macro also requires a dataset with 1 record per person, a dataset called adultf for adult females ages 19 years and older is created.

data day1;
set demoadv;
if riagendr= 2 and ridageyr>= 19 ;
d_milk=d_milk_d1;
day= 1 ;
run ;

 

data day2;
set demoadv;
if riagendr= 2 and ridageyr>= 19 ;
d_milk=d_milk_d2;
day= 2 ;
run ;

 

 

The variables d_milk_d1 and d_milk_d2 are calculated variables representing total milk consumed on days 1 and 2 respectively from all foods and beverages (see Module 4, Task 2 and Module 9, Task 4 for more details).  To create a dataset with 2 records per person, the demoadv dataset is set 2 times to create 2 datasets, one where day=1 and one where day=2.  The same variable name, d_milk, is used for calcium on both days.  It is created by setting it equal to d_milk_d1 for day 1 and d_milk_d2  for day 2. 

data milk;
set day1 day2;
run ;

Finally, these data sets are appended.

 

Step 2: Sort the dataset by respondent and day

It is important to sort the dataset by respondent and day of intake because the NLMIXED procedure uses this information to estimate the model parameters.

proc sort ; by seqn day; run ;

 

Step 3: Create the BRR212 macro

The BRR212 macro calls the MIXTRAN macro and computes standard errors of parameter estimates.  After creating this macro and running it 1 time, it may be called several times, each time changing the macro variables.

 

Statements Explanation

%include   'C:\NHANES\Macros\mixtran_macro_v1.1.sas' ;
%include   'C:\NHANES\Macros\indivint_macro_v1.1.sas' ;
%include   'C:\NHANES\Macros\distrib_macro_v1.1.sas' ;

This code reads the MIXTRAN, DISTRIB, and INDIVINT macros into SAS so that these macros may be called.

title2 "Task 3 - Milk and Osteoporosis Treatment" ;

This title should be printed on all output.

%macro BRR212(data, response, datam, var12, regresponse, foodtype, subject, repeat, covars_prob, covars_amt, outlib, modeltype, lambda, covars_reg, seq, weekend, vargroup, numvargroups, subgroup, start_val1, start_val2, start_val3, vcontrol, nloptions, titles, printlevel, cutpts, ncutpts, nsim_mc, byvar, t0, t1, final);

 

 

The start of the BRR212 macro is defined.  All of the terms inside the parentheses are the macro variables.

%MIXTRAN (data=&data, response=&response, foodtype=&foodtype, subject=&subject, repeat=&repeat, covars_prob=&covars_prob, covars_amt=&covars_amt, outlib=&outlib, modeltype=&modeltype,                 lambda=&lambda, replicate_var=w0304_0, seq=&seq, weekend=&weekend, vargroup=&vargroup, numvargroups= &numvargroups, subgroup=&subgroup, start_val1=&start_val1, start_val2=&start_val2, start_val3= &start_val3, vcontrol= &vcontrol, nloptions=&nloptions, titles= &titles, printlevel= &printlevel)

Within the BRR212 macro, the MIXTRAN macro is called.  All of the variables preceded by & will be defined by the BRR212 macro call.  The only variable without an & is the replicate_var macro variable; it is set to w0304_0 for the first run.

data parampred(keep = &subject a_var_u2 a_var_e a_lambda x2b2);
if (_n_ = 1 ) then set & outlib.._ param_unc_&foodtype;
set & outlib.._ pred_unc_&foodtype;
run;

Input datasets for the INDIVINT macro are created.  The datasets _pred_unc_&foodtype and _param_unc_&foodtype are created in the MIXTRAN run.  The syntax “if (_n_=1) then set” is a way of doing a 1-to-many merge, where the dataset _param_unc_&foodtype is merged with each row of the _pred_unc_&foodtype dataset.

data parsubj1rec;

  merge parampred &datam(keep = &subject &var12);

  by &subject;

run;

The INDIVINT macro requires a dataset with only one line per person called parsubj1rec.  This is defined by the macro variable datam in the BRR212 macro, and the response variables are defined by the macro variable var12.

%DISTRIB (seed= 0 , nsim_mc= 100 ,modeltype=&modeltype, pred=& outlib.._ pred_&foodtype, param= & outlib.._ param_&foodtype, outlib=&outlib, subject=&subject,food=&foodtype);

The DISTRIB macro is used to find the lambda for the transformation of the estimate of T (true usual intake) that will be used in the regression model.

title&titles; 

title %eval (&titles+ 1 );  

title "Task 3 - Milk and Osteoporosis Treatment" ;

 

 

This code clears the titles created in the DISTRIB macro and resets the title for the example.

data _dist;

set & outlib..d escript_&foodtype._w0304_0;

keep  tpercentile1-tpercentile99;

run;

The dataset descript_&foodtype_w0304_0 is from the DISTRIB macro.  The percentiles are saved from this dataset.

data _lambda (keep=lambda sse);
set _dist;
  array Z  (*) Z1-Z99;
  array T  (*) T1-T99;
  array CP (*) CP1-CP99;
  array P  (*) tpercentile1-tpercentile99;

  do i= 1 to dim(z);
    Z(i) = probit((i - 0.375 ) / 99.25 );
    end;

  MZ  = mean(of Z1-Z99);
  SZZ = CSS(of Z1-Z99);
  best_SSE    = SZZ;
  best_lambda = 0 ;
  do lambda = 0 to 1 by 0.01 ;
    do i = 1 to 99 ;
     if (lambda = 0 ) then T(i) = log(P(i));
     else T(i) = (P(i)**lambda - 1 ) / lambda;
      end;

    MT  = mean(of T1-T99);
    STT = CSS(of T1-T99);
    do i = 1 to 99 ;
      CP(i) = (T(i) - MT) * (Z(i) - MZ);
      end;

    STZ = sum(of CP1-CP99);
    SSE = SZZ - ((STZ** 2 )/STT);
    if (SSE < best_SSE) then do;
      best_SSE     = SSE;
      best_lambda  = lambda;
      end;
    end;

  lambda = best_lambda;
  sse    = best_sse;
  call symput( 'lamt' ,lambda);
run;

This code finds the best lambda to transform the estimate of T. This is done by finding the lambda that minimizes the error sum of squares.  Z is the inverse of the normal cumulative distribution function (probit).  At the end of the code, the best lambda is saved as a macro variable called lamt.

proc univariate data=&data noprint;
  where &response> 0 ;
  var &response;
  output out=outmin_amt min=min_amt;
run;

The minimum amount consumed on a consumption day is calculated and saved in a dataset called outmin_amt.

data paramsubj1rec;
  if _n_=
1 then set outmin_amt;
  set parsubj1rec;
  lamt = &lamt;
  array zero(
2 ) &var12;
  do i=
1 to 2 ;
  if zero[i]=
0 then zero[i]=(min_amt/ 2 );
  end;
run;

proc sort; by &subject; run;

The minimum amount is added to the dataset.  Zero values are set to ½ of the mimimum amount.  The dataset is sorted by subject.  The variable lamt is created from the macro value lamt that was created previously.

%indivint(model12= 2 , subj1recdata=paramsubj1rec, recid=&subject, r24vars=&var12, min_amt=min_amt, var_u1=p_var_u1, var_u2=a_var_u2, cov_u1u2=cov_u1u2, var_e=a_var_e, lambda=a_lambda, xbeta1=x1b1, xbeta2=x2b2, boxcox_t_lamt=y, lamt=lamt, dencalc=y, denopt=y, u1nlmix=, u2nlmix=, titles=&titles, notesprt=y);

The INDIVINT macro is called.  Because we are fitting a two-part model, the macro variable model12 is set equal to 2. This macro uses the dataset with 1 record per person (paramsubj1rec), and specifies the two 24-hour recall variables (var12).  The minimum amount that was saved in the paramsubj1rec dataset is specified for the min_amt macro variable, as are the parameters that are saved in that dataset, including p_var_u1, a_var_u2, cov_u1u2, a_var_e, and a_lambda. The predicted values x1b1 and x2b2 from the MIXTRAN run are specified. The macro variable boxcox_t_lamt is set to y for ‘Yes’, indicating that a Box-Cox parameter is specified.  This parameter is lamt, which was defined in the paramsubj1rec dataset in the last step.  The macro variables dencalc, denopt, and notesprt are set to y for ‘Yes’, indicating that denominator integration is perfomed, denominator optimization is performed, and notes are printed in the SAS log, respectively.

data subj1recres;
  merge &datam _resdata(keep = &subject indusint);
  by &subject;
run; 

The dataset _resdata is created in the INDIVINT macro.  It is merged with the dataset adultf.

proc reg data=subj1recres outest=regout;
  freq w0304_0;
  model &regresponse = indusint &covars_reg;
title %eval (&titles+ 1 ) "Regression Model:  Adjusted for Measurement Error" ;

The regression model is run using the expected usual intake from the INDIVINT macro, called indusint.  Because boxcox_t_lamt was set to y, this variable was already transformed in the INDIVINT macro, using the value of lamt in a Box-Cox transformation. Title3 is reset.

data diethealthout0;
  set regout(rename = (indusint=rcbeta));
  lamt = &lamt;
  t0boxcox = (&t0**lamt - 1 )/lamt;
  t1boxcox = (&t1**lamt - 1 )/lamt;
  tdiffrcbeta = (t1boxcox - t0boxcox)*rcbeta;
run; 

The output dataset from the regression above (regout) is set, and the parameter for the expected true usual intake is renamed to rcbeta.  In addition, lamt is saved, and the variables t0boxcox and t1boxcox and their difference multipled by rcbeta are computed.  The macro variables t0 and t1 are set to a range that is relevant to the data, e.g., the 10th and 90th percentile.

proc datasets nolist; delete parampred parsubj1rec paramsubj1rec subj1recres _resdata  regout; run;

Unneeded datasets are deleted.

%do run= 1 %to 16 ;

This code starts a loop to run the 16 BRR runs.

%put ~~~~~~~~~~~~~~~~~~~ Run &run ~~~~~~~~~~~~~~~~~~~~;

The run number is printed to the log.

%MIXTRAN (data=&data, response=&response, foodtype= &foodtype, subject=&subject, repeat=&repeat,     covars_prob=&covars_prob, covars_amt=&covars_amt, outlib=&outlib, modeltype=&modeltype, lambda=&lambda, replicate_var=w0304_&run, seq=&seq, weekend=&weekend, vargroup=&vargroup, numvargroups= &numvargroups, subgroup=&subgroup,  start_val1=&start_val1, start_val2=&start_val2, start_val3=&start_val3, vcontrol=&vcontrol, nloptions=&nloptions, titles= &titles, printlevel=&printlevel)

Within the BRR212 macro, the MIXTRAN macro is called.  All of the variables preceded by & will be defined by the BRR212 macro call.  The only variable without an & is the replicate_var macro variable; it is set to w0304_&run where &run equals 1 to 16.

data parampred(keep = &subject a_var_u2 a_var_e a_lambda x2b2);

if (_n_ = 1 ) then set & outlib.._ param_unc_&foodtype;  set & outlib.._ pred_unc_&foodtype;

run;

Input datasets for the INDIVINT macro are created.  The datasets _pred_unc_&foodtype and _param_unc_&foodtype are created in the MIXTRAN run.  The syntax “if (_n_=1) then set” is a way of doing a 1-to-many merge, where the dataset _param_unc_&foodtype is merged with each row of the _pred_unc_&foodtype dataset.

data parsubj1rec;

  merge parampred &datam(keep = &subject &var12);

  by &subject;

run;

The INDIVINT macro requires a dataset with only one line per person called parsubj1rec.  This is defined by the macro variable datam in the BRR212 macro, and the response variables are defined by the macro variable var12.

proc univariate data=&data noprint;
  where &response> 0 ;
  var &response;
  output out=outmin_amt min=min_amt;
run;

The minimum amount consumed on a consumption day is calculated and saved in a dataset called outmin_amt.

data paramsubj1rec;
  if _n_= 1 then set outmin_amt;
  set parsubj1rec;
  array zero( 2 ) &var12;
  do i= 1 to 2 ;
  if zero[i]= 0 then zero[i]=(min_amt/ 2 );
  end;
  lamt = &lamt;
run;

The minimum amount is added to the dataset.  Zero values are set to ½ of the mimimum amount.  The dataset is sorted by subject.  The variable lamt is created from the macro value lamt that was created previously.

%indivint(model12= 2 , subj1recdata=paramsubj1rec, recid=&subject, r24vars=&var12, min_amt=min_amt, var_u1=, var_u2=a_var_u2, cov_u1u2=, var_e=a_var_e, lambda=a_lambda, xbeta1=, xbeta2=x2b2, boxcox_t_lamt=y, lamt=lamt, dencalc=y, denopt=y, u1nlmix=, u2nlmix=, titles=&titles, notesprt=y);

The INDIVINT macro is called.  Because we are fitting a one-part model, the macro variable model12 is set equal to 2. This macro uses the dataset with 1 record per person (paramsubj1rec), and specifies the two 24-hour recall variables (var12).  The minimum amount that was saved in the paramsubj1rec dataset is specified for the min_amt macro variable, as are the parameters that are saved in that dataset, including p_var_u1, a_var_u2, cov_u1u2, a_var_e, and a_lambda. The predicted values x1b1and x2b2 from the MIXTRAN run are specified. The macro variable boxcox_t_lamt is set to y for ‘Yes’, indicating that a Box-Cox parameter is specified.  This parameter is lamt, which was defined in the paramsubj1rec dataset in the last step.  The macro variables dencalc, denopt, and notesprt are set to y for ‘Yes’, indicating that denominator integration is perfomed, denominator optimization is performed, and notes are printed in the SAS log, respectively.

data subj1recres;
  merge &datam _resdata(keep = &subject indusint);
  by &subject;
run; 

The dataset _resdata is created in the INDIVINT macro.  It is merged with the dataset adultf.

proc reg data=subj1recres outest=regout;
  freq w0304_&run;
  model &regresponse = indusint &covars_reg;
  title %eval (&titles+ 1 ) "Regression Model:  Adjusted for Measurement Error" ;

The regression model is run using the expected usual intake from the INDIVINT macro, called indusint.  Because boxcox_t_lamt was set to y, this variable was already transformed in the INDIVINT macro, using the value of lamt in a Box-Cox transformation. Title3 is reset.

data diethealthout;
  set regout(rename = (indusint=rcbeta));
  lamt = &lamt;
  t0boxcox = (&t0**lamt - 1 )/lamt;
  t1boxcox = (&t1**lamt - 1 )/lamt;
  tdiffrcbeta = (t1boxcox - t0boxcox)*rcbeta;
run; 

The output dataset from the regression above (regout) is set, and the parameter for the expected true usual intake is renamed to rcbeta.  In addition, lamt is saved, and the variables t0boxcox and t1boxcox and their difference multipled by rcbeta are computed.  The macro variables t0 and t1 are set to a range that is relevant to the data, e.g., the 10th and 90th percentile.

proc append base=brr_runs data=diethealthout;
run;

The BRR datasets are appended into a dataset called brr_runs.

proc datasets nolist; delete parampred parsubj1rec paramsubj1rec subj1recres _resdata  regout diethealthout; run;

After appending the information to brr_runs, diethealthout and the other datasets created in the BRR runs can be deleted.

%end ;

The BRR runs end.

data _null_;
format list listd listsum $255. ;
%let I=1;
%let varamtu= %upcase (&covars_reg);
%do %until ( %qscan (&varamtu,&I, %str ( ))= %str ());
  %let varb&I= %qscan (&varamtu,&I, %str ( ));
      num= %eval (&i+ 1 );
      varA= strip(
"&&varb&i." );
      list =  trim(list)||
' ' || 'b' ||trim(varA);
      listd =  trim(listd)|| ' ' || 'db' ||trim(varA);
      listsum =  trim(listsum)|| ' ' || 'sum_db' ||trim(varA);
%let I= %eval (&I+1);
%end ;
%let cnt= %eval (&I-1);
%if &covars_reg= %str () %then %let cnt=0;
call symput( 'list' ,list);
call symput( 'listd' ,listd);
call symput( 'listsum' ,listsum);
run;

This code creates macro calls that are a list of the variables with the prefix ‘b’ and ‘db’.

data parambrr;
set brr_runs;
rename rcbeta=brcbeta tdiffrcbeta=btdiffrcbeta intercept=bintercept;
array covs (&cnt) &covars_reg;
array newname (&cnt) &list;
do i= 1 to &cnt;
  newname[i]=covs[i];
 end;
data parambrr;
set parambrr;
keep bintercept brcbeta btdiffrcbeta &list run;
run;

The brr_runs dataset is set, and a new dataset is created.  This dataset renames the BRR parameters to have a prefix of ‘b’ using a rename statement and the macro calls created above.

data ss;
  if _n_= 1 then set diethealthout0;
  set parambrr;
  array bvar (*) bintercept brcbeta btdiffrcbeta &list;
  array varo  (*)intercept rcbeta tdiffrcbeta &covars_reg;
  array dsqr (*) dbintercept dbrcbeta dbtdiffrcbeta &listd;
  do i= 1 to dim(bvar);
   dsqr[i]=(bvar[i]-varo[i])** 2 ;
  end;
run;

The estimates from the first run are merged with the BRR estimates, and the squared difference is computed for each BRR run.

proc means data=ss sum;
 var  dbintercept dbrcbeta dbtdiffrcbeta &listd;
 output out=sums sum=sum_dbintercept sum_dbrcbeta  sum_dbtdiffrcbeta &listsum;
run;

The sum of squares is computed.

data brr;
 set sums;
 array sumt (*) sum_dbintercept sum_dbrcbeta  sum_dbtdiffrcbeta &listsum;
 array se  (*) intercept rcbeta tdiffrcbeta &covars_reg;
 do j= 1 to dim(sumt);
 se[j]=sqrt((sumt[j])/( 16 * .49 ));
 end;
 keep intercept rcbeta tdiffrcbeta &covars_reg;
run;

The standard errors are computed.

data wparms;
 set diethealthout0 brr;
keep intercept rcbeta tdiffrcbeta &covars_reg;

The point estimates are merged with the standard errors.

proc transpose data=wparms out=wparmst;
run;

The data are transposed.

data &final;
  format pvalue 6.4 ;
  set wparmst(rename=(col1=estimate col2=brrse));
  waldchisq = (estimate/brrse)** 2 ;
  pvalue =
1 - probchi(waldchisq, 1 );
  lower95lim = estimate -
1.96 *brrse;
  upper95lim = estimate +
1.96 *brrse;
 
drop _label_;
run; 

The final dataset is created, and p-values and 95% confidence intervals are computed.

proc print;
run;

The final dataset is printed.

%mend BRR212;

The end of the BRR212 macro is indicated.

 

Step 4: Call the BRR212 macro

 

Statements Explanation

%BRR212(data=milk, response=d_milk, datam=adultf, var12=d_milk_d1 d_milk_d2, day=, regresponse=treatosteo, foodtype=Milk, lambda= 0.01, subject=seqn, repeat=day, covars_prob=ridageyr, covars_amt=ridageyr, covars_reg=ridageyr, outlib=work, modeltype=corr, titles= 2, printlevel= 2, nsim_mc= 100, t0= 0.5, t1= 1.5, nloptions=technique=trureg  maxfunc= 10000 maxiter= 200, final=nh.m21task3)  

This code calls the BRR212 macro.  The dataset milk defined in Step 1 is used; the macro variable response for which we want to model the distribution is d_milk.  We also have to define a dataset with one line per person, datam, which is set equal to the dataset adultf, calculated in Step 1.  The response variables for this dataset are defined by macro variable var12 and equal d_milk_d1  and d_milk_d2.   The dependent variable for the regression is treatosteo.

The macro variable foodtype is used to label the datasets from the MIXTRAN and DISTRIB macros.  The variable seqn identifies the subject, and the macro variable repeat defines the variable that identifies the repeats on the subject, which is day.  The covars_prob and covars_amt macro variables need to include the covariates for the usual intake model, which is defined by ridageyr. The same covariates must be put in the macro variable covars_reg; however, there may be additional variables in covars_prob and covars_amt that are not in covars_reg.

 Note that a numeric id is required for “subject” in the INDIVINT macro.

The macro variable outlib specifies the library where the data are to be stored.  In this case, the working directory, work, was used. Because this is an episodically-consumed dietary constituent, modeltype= corr  is specified.  This fits the two-part model with correlated random effects.

Nloptions are specified, as the last BRR run had a floating point error when the default options were used.

The macro variable titles saves 2 lines for titles supplied by the user.  The printlevel is 2, which prints the output from the NLMIXED runs and the summary.

The macro variable nsim_mc is used to specify the number of pseudo-individuals for which the distribution is simulated per respondent.

The macro variables t0 and t1 must be specified.  In this case, the approximate 10th and 90th percentiles of the distribution were used.

The variable final specifies the name of the final dataset produced.

Info iconIMPORTANT NOTE

Although the macro variable weekend is defined in the macro, this macro currently is not capable of adjusting for weekends as described earlier in the course (see Module 18, Task 3 for more information) as the DISTRIB macro does.

 

Info iconWARNING

Note that SAS 9.2 TS1MO will produce an error with missing data in PROC IML, which is used in the INDIVINT macro.  The problem is fixed in SAS 9.2 TS2M2.  The SAS Problem Note documents the problem.  From the HOT FIX tab a link to download the hot fix is available.  This error is not encountered in SAS 9.1.3.

 

Step 5: Interpret output

Info iconIMPORTANT NOTE

Note: Your results may vary slightly, as a random seed is used to estimate the distribution of usual intake. However, they would not be expected to vary by more than 1%.

 

 

close window icon Close Window to return to module page.