In this module, you will use simple logistic regression to analyze NHANES data and assess the association between calcium supplement use (anycalsup), the exposure or independent variable, and the likelihood of receiving treatment for osteoporosis (treatosteo), the outcome or dependent variable, among participants aged 20 years and older. You will then use multiple logistic regression to assess the relationship after controlling for selected covariates: gender (riagendr), age (ridageyr), race/ethnicity (ridreth1), and body mass index (bmxbmi).
This example uses the demoadv dataset (download at Sample Code and Datasets). This dataset already contains a variable anycalsup that has a value of 1 for those who report calcium supplement use, and a value of 2 for those who do not. A participant was considered not to have any calcium supplement use if the daily average amount of calcium supplement use was zero; otherwise, a participant was considered a supplement user (see Supplement Code under Sample Code and Module 9, Task 4 for more information).
Always check every variable in the model and use the weight of the smallest common denominator, that is, the weight for the smallest subsample from which any model variable was collected. In the univariate analysis example, the 2-year MEC weight (wtmec2yr) is used because the osteoporosis variable comes from the MEC examination. The demoadv dataset for this example includes only participants with MEC weights (wtmec2yr>0).
This example also illustrates how to create additional independent categorical variables (age, bmigrp) from the continuous age (ridageyr) and BMI (bmxbmi) variables. These new variables are used in this analysis.
Independent variable | Code to generate independent categorical variables |
---|---|
Age | if 20<=ridageyr<40 then age=1; |
BMI category | if 0<=bmxbmi<25 then bmigrp=1; |
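The table shows only the first condition for each variable. A minimal sketch of how the full recoding might look in a data step follows. The cut points for categories 2 and 3 are assumptions inferred from the reference labels '40-59' and '25<=BMI<30' used later in the class statement, and the output dataset name demoadv_cat is hypothetical.

```sas
/* Sketch only: cut points marked "assumed" are inferred from the      */
/* reference labels in the class statement, not taken from the table.  */
data demoadv_cat;                 /* hypothetical output dataset name  */
  set demoadv;

  /* Age group from continuous age in years (ridageyr) */
  if 20 <= ridageyr < 40 then age = 1;
  else if 40 <= ridageyr < 60 then age = 2;    /* assumed */
  else if ridageyr >= 60 then age = 3;         /* assumed */

  /* BMI group from continuous BMI (bmxbmi) */
  if 0 <= bmxbmi < 25 then bmigrp = 1;
  else if 25 <= bmxbmi < 30 then bmigrp = 2;   /* assumed */
  else if bmxbmi >= 30 then bmigrp = 3;        /* assumed */
run;
```

Because SAS treats a missing numeric value as smaller than any number, none of the conditions is true when ridageyr or bmxbmi is missing, so age or bmigrp stays missing for those participants.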
Do not use a where clause or by-group processing to analyze a subpopulation with the SAS Survey Procedures. Prior to SAS 9.2, to get an approximate domain (subpopulation) analysis with proc surveylogistic, you would assign a near-zero weight to observations outside your current domain. The weight cannot be exactly zero because the procedure excludes any observation with zero weight. In this example, the domain (subpopulation) is participants aged 20 years and older, so if you specify in a data step:
if ridageyr GE 20 then newweight=wtmec2yr;
else newweight=1e-6;
you could then perform the logistic regression using the newweight variable as:
weight newweight;
The code above with the newweight variable is no longer necessary as of SAS 9.2. The statement
weight newweight;
may be replaced with the statements
weight wtmec2yr;
domain sel;
where sel is defined as
if ridageyr GE 20 then sel=1;
else sel=2;
(Note that for this particular example, osteoporosis treatment is only collected for those ages 20 and over, so you will not notice a difference whether wtmec2yr or newweight is used. However, if a different age group or variable was used for the subpopulation, differences would be noted.)
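Put together, the domain approach could look like the sketch below for the simple (unadjusted) model. The dataset name demoadv_dom is hypothetical; the variable names and statements are the ones described above.

```sas
/* Sketch: define the domain indicator, then request a domain analysis. */
data demoadv_dom;                 /* hypothetical output dataset name   */
  set demoadv;
  if ridageyr ge 20 then sel = 1; /* in the subpopulation of interest   */
  else sel = 2;                   /* everyone else                      */
run;

proc surveylogistic data=demoadv_dom;
  stratum sdmvstra;               /* sample design strata               */
  cluster sdmvpsu;                /* primary sampling units             */
  weight wtmec2yr;                /* original MEC weight, unmodified    */
  domain sel;                     /* analysis reported by level of sel  */
  class anycalsup / param=ref;
  model treatosteo = anycalsup;   /* check the Response Profile to see  */
                                  /* which outcome level is modeled     */
run;
```

The results of interest are those reported for the domain sel=1. All observations stay in the input dataset so that the full sample design is available for variance estimation.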
Reference: SAS Technical Support
This step introduces you to the SAS procedure for logistic regression, proc surveylogistic. There is a summary table of the SAS program below.
These programs use variable formats listed in the sample program. You may need to format the variables in your dataset the same way to reproduce results presented in the tutorial.
Statements | Explanation |
---|---|
proc surveylogistic data=demoadv; | Use the proc surveylogistic procedure to perform multiple logistic regression to assess the association between the outcome and multiple risk factors, including: age, gender, race/ethnicity, and body mass index. |
stratum sdmvstra; | Use the stratum statement to specify strata to account for design effects of stratification. |
cluster sdmvpsu; | Use the cluster statement to specify primary sampling unit (PSU) to account for design effects of clustering. |
weight newweight; | Use the weight statement to account for the unequal probability of sampling and non-response. In this example, you use the new weight variable created in the data step. See Step 1. |
class age (ref='40-59') riagendr (ref='Male') ridreth1 anycalsup (ref='No supp use') bmigrp (ref='25<=BMI<30') / param=ref; | Use the class statement to specify all categorical variables in the model. Use the param and ref options to choose your reference group for the categorical variables. |
model treatosteo=anycalsup riagendr age ridreth1 bmigrp; | Use the model statement to specify the dependent variable and all independent variable(s) in your logistic regression model. |
format riagendr gender. age agegrp. ridreth1 race. anycalsup yesnos. bmigrp bmifmt.; | Use the format statement to read the SAS formats for all formatted variables. |
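Assembled into one program, the statements in the table might look like the sketch below. The proc format step is an assumption added so that the ref= labels resolve: only 'Male', '40-59', 'No supp use', and '25<=BMI<30' appear in the tutorial, and every other label is illustrative. The sketch also assumes that age, bmigrp, and newweight have already been created in demoadv.

```sas
/* Sketch only: labels not quoted in the class statement are assumed. */
proc format;
  value gender 1 = 'Male'        2 = 'Female';
  value agegrp 1 = '20-39'       2 = '40-59'       3 = '60+';
  value yesnos 1 = 'Supp use'    2 = 'No supp use';
  value bmifmt 1 = 'BMI<25'      2 = '25<=BMI<30'  3 = 'BMI>=30';
  value race   1 = 'Mexican American'   2 = 'Other Hispanic'
               3 = 'Non-Hispanic White' 4 = 'Non-Hispanic Black'
               5 = 'Other race';
run;

proc surveylogistic data=demoadv;
  stratum sdmvstra;                    /* design strata                */
  cluster sdmvpsu;                     /* primary sampling units       */
  weight newweight;                    /* near-zero weight outside the */
                                       /* subpopulation (see Step 1)   */
  class age (ref='40-59') riagendr (ref='Male') ridreth1
        anycalsup (ref='No supp use') bmigrp (ref='25<=BMI<30')
        / param=ref;
  model treatosteo = anycalsup riagendr age ridreth1 bmigrp;
  format riagendr gender. age agegrp. ridreth1 race.
         anycalsup yesnos. bmigrp bmifmt.;
run;
```

Each ref= value must match a formatted value exactly, which is why the formats need to be defined before the procedure runs. For the simple (unadjusted) model, keep only anycalsup in the class and model statements.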
The SAS Survey Procedure, proc surveylogistic, produces the Wald statistic and its p-value. It does not produce the Satterthwaite χ² or the Satterthwaite F and the corresponding p-values recommended for NHANES analyses. For this reason, it is recommended that you use proc rlogist in SUDAAN for logistic regression.
In this step, the SAS output is reviewed. The highlighted elements show that:
If you ran both the SAS Survey and SUDAAN programs (or reviewed the output provided on the Sample Code and Datasets page), you may have noticed slight differences in the output. These differences can be caused by missing data in any paired PSU or by how each software program handles degrees of freedom.