In this module, you will use simple logistic regression to analyze NHANES data and assess the association between gender (riagendr), the exposure or independent variable, and the likelihood of having hypertension (based on bpxsar and bpxdar), the outcome or dependent variable, among participants 20 years old and older. You will then use multiple logistic regression to assess the relationship after controlling for selected covariates. The covariates include gender (riagendr), age (ridageyr), cholesterol (lbxtc), body mass index (bmxbmi) and fasting triglycerides (lbxtr).
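As a reminder of what the two models estimate, the standard logistic regression form can be written as follows. The symbols \beta_0, \beta_1, \beta_k and x_k are notation assumed here for illustration; they do not come from the tutorial, and x_k stands for whichever coded covariates (continuous or categorical) enter the multiple model.

```latex
% Simple model: gender as the only independent variable
\ln\frac{\Pr(\text{hyper}=1)}{1-\Pr(\text{hyper}=1)}
  = \beta_0 + \beta_1\,\text{gender}

% Multiple model: gender plus K further covariates x_1, \dots, x_K
\ln\frac{\Pr(\text{hyper}=1)}{1-\Pr(\text{hyper}=1)}
  = \beta_0 + \beta_1\,\text{gender} + \sum_{k=1}^{K}\beta_{k+1}\,x_k

% Odds ratio for a one-unit difference in gender, other terms held fixed
\mathrm{OR}_{\text{gender}} = e^{\beta_1}
```

In the multiple model, the gender odds ratio is interpreted with the other covariates held constant, which is what "controlling for" means here.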
For continuous variables, you have a choice of using the variable in its original form (continuous) or changing it into a categorical variable (e.g. based on standard cutoffs, quartiles or common practice). The categorical variables should reflect the underlying distribution of the continuous variable and not create categories where there are only a few observations.
For the dependent variable, you will create a dichotomous variable, hyper, which defines people as having (or not having) hypertension. Specifically, a person is said to have hypertension if their systolic blood pressure (measured in the MEC) is 140 mm Hg or higher, or their diastolic blood pressure is 90 mm Hg or higher, or they are taking blood pressure medication. Remember that for logistic regression to work in SUDAAN, this variable needs to be coded as 0 (the outcome did not occur; here, the person does not have hypertension) or 1 (the outcome occurred; here, the person has hypertension). The code to create this variable is below:
if (bpxsar >= 140 or bpxdar >= 90 or bpq050a = 1 ) then Hyper = 1 ;
else if (bpxsar ne . and bpxdar ne . ) then Hyper = 0 ;
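To see how these two statements treat missing readings, here is a minimal, self-contained sketch. The dataset names (demo_bp, demo_hyper) and all data values are invented for illustration, and coding bpq050a = 2 as "not taking medication" is an assumption.

```sas
/* Hypothetical demo data: values are invented for illustration only. */
data demo_bp;
  input seqn bpxsar bpxdar bpq050a;
  datalines;
1 150 85 2
2 120 80 2
3 118 76 1
4 . . 2
5 135 . 2
;
run;

data demo_hyper;
  set demo_bp;
  /* In SAS a missing numeric value compares lower than any number,
     so a missing reading never satisfies >= 140 or >= 90. */
  if (bpxsar >= 140 or bpxdar >= 90 or bpq050a = 1) then Hyper = 1;
  /* Hyper = 0 is assigned only when both readings are present. */
  else if (bpxsar ne . and bpxdar ne .) then Hyper = 0;
  /* Otherwise Hyper is left missing (rows 4 and 5 here). */
run;

/* Check the coding, keeping missing values in the table. */
proc freq data=demo_hyper;
  tables Hyper / missing;
run;
```

In this sketch, rows 1 and 3 are coded 1, row 2 is coded 0, and rows 4 and 5 stay missing because at least one reading is absent and no other criterion is met.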
In addition to creating the dichotomous dependent variable, this example creates categorical independent variables (age, hichol, bmigrp) from the continuous age, cholesterol, and BMI variables to use in this analysis.
Independent variable | Code to generate independent categorical variables |
---|---|
Age | if 20 <= ridageyr < 40 then age = 1 ; |
High cholesterol | if (lbxtc >= 240 or bpq100d = 1) then HiChol = 1 ; |
BMI category | if 0 <= bmxbmi < 25 then bmigrp = 1 ; |
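The table shows only the first category for each variable. One way the full coding might look is sketched below. The remaining cutoffs (ages 40 and 60, BMI 25 and 30), the HiChol = 0 rule, the dataset names, and the data values are all assumptions made for illustration; they are not taken from the tutorial.

```sas
/* Hypothetical demo data: values are invented for illustration only. */
data demo_raw;
  input seqn ridageyr lbxtc bpq100d bmxbmi;
  datalines;
1 25 180 2 22.4
2 47 250 2 27.9
3 63 190 1 31.2
4 35 . 2 .
;
run;

data demo_cat;
  set demo_raw;

  /* Age groups: first line from the table; 40 and 60 are assumed cutoffs. */
  if 20 <= ridageyr < 40 then age = 1;
  else if 40 <= ridageyr < 60 then age = 2;
  else if ridageyr >= 60 then age = 3;

  /* High cholesterol: first line from the table; the else branch is assumed. */
  if (lbxtc >= 240 or bpq100d = 1) then HiChol = 1;
  else if lbxtc ne . then HiChol = 0;

  /* BMI groups: first line from the table; 25 and 30 are assumed cutoffs. */
  if 0 <= bmxbmi < 25 then bmigrp = 1;
  else if 25 <= bmxbmi < 30 then bmigrp = 2;
  else if bmxbmi >= 30 then bmigrp = 3;
run;

/* Confirm that no category is empty or nearly empty. */
proc freq data=demo_cat;
  tables age HiChol bmigrp / missing;
run;
```

Because missing values fail every range test, a participant with a missing source value keeps a missing category rather than falling into the lowest group (row 4 in this sketch).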
Because the triglycerides variable (lbxtr) is highly skewed, you will use a log transformation to create a new variable to use in this analysis.
logtrig=log(lbxtr);
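A small sketch of the transformation in context follows. The dataset name demo_log and the values are invented, and the `lbxtr > 0` guard is an added precaution rather than part of the tutorial's code; it simply leaves logtrig missing when the source value is missing or not positive. The summary is unweighted and is meant only as a quick look at skewness before and after the transformation.

```sas
/* Hypothetical demo data: values are invented for illustration only. */
data demo_log;
  input seqn lbxtr;
  /* LOG is the natural logarithm in SAS. */
  if lbxtr > 0 then logtrig = log(lbxtr);
  datalines;
1 60
2 95
3 140
4 410
5 .
;
run;

/* Unweighted comparison of the raw and log-transformed distributions. */
proc means data=demo_log n mean median skewness;
  var lbxtr logtrig;
run;
```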
Because not every participant in NHANES responded to every question asked, there may be a different level of item non-response to each variable. To ensure that your analyses are done on the same number of respondents, create a variable called eligible which is 1 for individuals who have a non-blank value for each of the variables used in the analyses, and 0 otherwise. Although this is a univariate analysis using only exam variables, the fasting subsample weight (wtsaf4yr) is included in determining the eligible variable. This is because you will be conducting a multivariate analysis using the triglycerides variable later and will limit the sample to persons included in both analyses. The SAS code defining eligible is:
if hyper ne . and hichol ne . and bmigrp ne . and age ne . and logtrig ne . and wtsaf4yr ne 0 then eligible=1 ;
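The eligibility rule can be tried on a few rows as sketched below. The dataset names and values are invented, and the `else eligible = 0` branch is an assumption added to match the description that eligible is 0 otherwise.

```sas
/* Hypothetical demo data: values are invented for illustration only. */
data demo_analysis;
  input seqn hyper hichol bmigrp age logtrig wtsaf4yr;
  datalines;
1 1 0 2 3 4.8 25000
2 0 0 1 1 . 0
3 0 1 3 2 5.1 0
4 . 0 2 2 4.6 31000
;
run;

data demo_elig;
  set demo_analysis;
  /* Eligible only if every analysis variable is present and the
     fasting subsample weight is not zero. Note that a missing weight
     would pass the "ne 0" test, so check how the weight is coded
     in your file. */
  if hyper ne . and hichol ne . and bmigrp ne . and age ne .
     and logtrig ne . and wtsaf4yr ne 0 then eligible = 1;
  else eligible = 0;   /* assumed, per the description in the text */
run;

/* Count eligible and ineligible records. */
proc freq data=demo_elig;
  tables eligible;
run;
```

In this sketch only the first row is eligible: row 2 lacks logtrig, row 3 has a zero fasting weight, and row 4 lacks hyper.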