NHANES Web Tutorial: Logistic Regression: Task 2c

In this module, you will use simple logistic regression to analyze NHANES data and assess the association between gender (riagendr), the exposure or independent variable, and the likelihood of having hypertension (based on bpxsar, bpxdar, and bpq050a), the outcome or dependent variable, among participants aged 20 years and older. You will then use multiple logistic regression to assess this relationship after controlling for selected covariates: age (ridageyr), total cholesterol (lbxtc), body mass index (bmxbmi), and fasting triglycerides (lbxtr).

Step 1: Use svyset to define survey design variables

Remember that you need to define the survey design with svyset before using any of the svy series of commands. The general format of this command is below:
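A minimal sketch of that general form, written in current Stata syntax. The names psuvar, weightvar, and stratavar are placeholders for your own design variables, not NHANES variable names:

```stata
* General form (placeholders): PSU variable first, sampling weight in
* brackets, stratum variable in strata(), variance method in vce()
svyset psuvar [pweight=weightvar], strata(stratavar) vce(linearized)
```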

To define the survey design variables for this analysis, use the weight variable for four years of MEC data (wtmec4yr), the PSU variable (sdmvpsu), and the strata variable (sdmvstra). The vce option specifies the method for calculating the variance; the default is "linearized", which is Taylor series linearization. Here is the svyset command for four years of MEC data:
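A sketch of that command using the NHANES design variables named above. Running svyset with no arguments afterward simply redisplays the current settings so you can confirm them:

```stata
* Declare the NHANES complex survey design for four years of MEC data
svyset sdmvpsu [pweight=wtmec4yr], strata(sdmvstra) vce(linearized)

* With no arguments, svyset reports the design that is now in effect
svyset
```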

Step 2: Create dependent dichotomous variable

For continuous variables, you can either use the variable in its original (continuous) form or convert it into a categorical variable (e.g., based on standard cutoffs, quartiles, or common practice). Any categories you create should reflect the underlying distribution of the continuous variable and should not contain only a few observations.
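As an illustration of the quartile option, a sketch using Stata's xtile command. The variable tc_q4 is hypothetical and is not used elsewhere in this analysis; passing the MEC weight makes the cutoffs reflect the weighted distribution, and the tabulation lets you check that no category is sparse:

```stata
* Hypothetical example: weighted quartiles of total cholesterol, adults 20+
xtile tc_q4 = lbxtc if ridageyr>=20 & ridageyr<. [pweight=wtmec4yr], nquantiles(4)

* Check that every category has a reasonable number of observations
tabulate tc_q4, missing
```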

For the dependent variable, you will create a dichotomous variable, hyper, which classifies each person as having or not having hypertension. Specifically, a person is considered to have hypertension if their systolic blood pressure (measured in the MEC) is 140 mmHg or higher, their diastolic blood pressure is 90 mmHg or higher, or they are taking prescribed blood pressure medication (bpq050a). Remember that for logistic regression in Stata, this variable should be coded 0 (the outcome did not occur; here, the person does not have hypertension) or 1 (the outcome occurred; here, the person has hypertension). The code to create this variable is below:

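* hyper = 1: SBP >= 140, DBP >= 90, or currently taking BP medication
gen hyper=1 if (bpxsar>=140 & bpxsar<. | bpxdar>=90 & bpxdar<.) | bpq050a==1
* hyper = 0: not coded 1 above and both BP readings are non-missing
replace hyper=0 if hyper !=1 & (bpxsar !=. & bpxdar !=.)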
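A short sketch of checks you might run right after creating hyper. Each assert stops the do-file with an error if the coding does not behave as the definition above says it should:

```stata
* Only 0, 1, or missing should appear
assert inlist(hyper, 0, 1, .)

* Anyone taking BP medication must be coded as a case
assert hyper==1 if bpq050a==1

* Anyone with a systolic reading of 140 or higher must be coded as a case
assert hyper==1 if bpxsar>=140 & bpxsar<.

* Inspect the distribution, including missing values
tabulate hyper, missing
```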

Step 3: Create independent categorical variables

In addition to the dichotomous dependent variable, this example creates three independent categorical variables (age, hichol, and bmigrp) from the continuous age (ridageyr), total cholesterol (lbxtc), and BMI (bmxbmi) variables for use in this analysis. The hichol variable also uses bpq100d.

Code to generate independent categorical variables

Age (age):
gen age=1 if ridageyr>=20 & ridageyr<40
replace age=2 if ridageyr>=40 & ridageyr<60
replace age=3 if ridageyr>=60 & ridageyr<.

High cholesterol (hichol):
gen hichol=1 if lbxtc>=240 & lbxtc<. | bpq100d==1
replace hichol=0 if hichol!=1 & lbxtc!=.

BMI category (bmigrp):
gen bmigrp=1 if bmxbmi<25
replace bmigrp=2 if bmxbmi>=25 & bmxbmi<30
replace bmigrp=3 if bmxbmi>=30 & bmxbmi<.
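A sketch of how you might confirm that each new categorical variable covers the intended range of its source variable. The min and max of the source variable within each category should match the cutoffs in the code above:

```stata
* Range of the source variable within each category
tabstat ridageyr, by(age) statistics(n min max)
tabstat bmxbmi, by(bmigrp) statistics(n min max)
tabulate hichol, missing

* Participants under 20 should have no age group
assert missing(age) if ridageyr<20

* The overweight group should contain only BMI values from 25 to under 30
assert bmxbmi>=25 & bmxbmi<30 if bmigrp==2
```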

Step 4: Transform highly skewed variables

Because the triglycerides variable (lbxtr) is highly skewed, you will use a log transformation to create a new variable for this analysis.
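A minimal sketch of the transformation. The name lg_lbxtr is hypothetical; Stata's log() returns the natural logarithm and yields missing for missing input. Comparing the skewness reported by summarize, detail before and after is a quick way to see whether the transformation helped:

```stata
* Natural log of triglycerides (lg_lbxtr is a hypothetical name)
gen lg_lbxtr = log(lbxtr)

* Compare skewness of the original and transformed variables
summarize lbxtr lg_lbxtr, detail
```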

Step 5: Choose reference groups for categorical variables

For all categorical variables, you need to decide which category to use as the reference group. If you do not specify a reference group, Stata uses the lowest-numbered category by default. You can use the following general command to tell Stata which group to use as the reference:
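A sketch of that general command, where varname and # are placeholders for the variable and the category value to omit. This characteristic is read by the xi: prefix when it expands i.varname terms:

```stata
* General form (placeholders): omit category # of varname, making it
* the reference group in models fitted with the xi: prefix
char varname[omit] #
```

In Stata 11 and later, factor-variable notation is an alternative that does not need char: for example, ib2.riagendr makes category 2 the base level within a single command.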

For your analyses, use the commands in the table below to specify these reference groups:


Code to specify reference groups
Variable	Code to specify reference group	Reference group
Gender	char riagendr[omit] 2	Women
Age	char age[omit] 2	40 to 59 years old
BMI	char bmigrp[omit] 2	Overweight (BMI 25 to <30)
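Once the reference groups are set, the simple and multiple models described at the start of this module could be fitted roughly as follows. This is a sketch, not the tutorial's own specification: it assumes the hypothetical lg_lbxtr variable from Step 4, uses hichol for cholesterol, and restricts estimation to adults 20 and older with subpop() so that variance estimates still reflect the full survey design:

```stata
* The char [omit] settings above take effect through the xi: prefix

* Simple (unadjusted) model: gender only, reported as odds ratios
xi: svy, subpop(if ridageyr>=20 & ridageyr<.): logit hyper i.riagendr, or

* Multiple model adjusted for age group, BMI group, high cholesterol,
* and log triglycerides (lg_lbxtr is a hypothetical name)
xi: svy, subpop(if ridageyr>=20 & ridageyr<.): logit hyper i.riagendr i.age i.bmigrp hichol lg_lbxtr, or
```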

Task 2c: How to Use Stata Code to Perform Logistic Regression
