Continuous NHANES Web Tutorial: Linear Regression: Task 2c

In this example, you will assess the association between high density lipoprotein (HDL) cholesterol — the outcome variable — and body mass index (bmxbmi) — the exposure variable — after controlling for selected covariates in NHANES 1999-2002. These covariates include gender (riagendr), race/ethnicity (ridreth1), age (ridageyr), smoking (smoker, derived from SMQ020 and SMQ040; smoker =1 if non-smoker, 2 if past smoker and 3 if current smoker) and education (dmdeduc).

Step 1: Use svyset to define survey design variables

Remember that you need to define the SVYSET before using the SVY series of commands. The general format of this command is below:
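As a rough template, and assuming the modern documented svyset syntax (PSU variable first, weight in brackets, strata as an option), the general pattern looks like the sketch below. The names weightvar, psuvar and stratavar are placeholders, not real variables; substitute your own before running it.

```stata
* General pattern only: weightvar, psuvar and stratavar are placeholders.
* vce(linearized) requests Taylor linearization, which is also the default.
svyset psuvar [pweight=weightvar], strata(stratavar) vce(linearized)
```

Older Stata releases also accepted a form with the PSU given as an option, so tutorials written for those releases may show the arguments in a different order.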

To define the survey design variables for your high density lipoprotein cholesterol analysis, use the weight variable for four years of MEC data (wtmec4yr), the PSU variable (sdmvpsu), and the strata variable (sdmvstra). The vce option specifies the method for calculating the variance; the default is "linearized", which is Taylor linearization. Here is the svyset command for four years of MEC data:
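A minimal sketch of that declaration, using the three design variables named above and assuming the NHANES 1999-2002 analysis file is already loaded in memory:

```stata
* Declare the survey design: PSU, 4-year MEC weight, and strata.
svyset sdmvpsu [pweight=wtmec4yr], strata(sdmvstra) vce(linearized)

* Optional check: list the number of strata and PSUs Stata now recognizes.
svydescribe
```

Once the design is set, any command prefixed with svy: uses these settings, so the declaration only needs to be issued once per session or saved with the dataset.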

Step 2: Determine how to specify variables in the model

For continuous variables, you have a choice of using the variable in its original form (continuous) or changing it into a categorical variable (e.g. based on standard cutoffs, quartiles or common practice). The categorical variables should reflect the underlying distribution of the continuous variable and not create categories where there are only a few observations.
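As one illustration of the quartile option, the sketch below builds weighted BMI quartiles and then checks how many observations fall in each group. The variable name bmiq is hypothetical, and using the 4-year MEC weight to define the cut points is an assumption rather than something the tutorial prescribes.

```stata
* Assumes the NHANES 1999-2002 analysis file is in memory.
* bmiq is a hypothetical name for the new quartile variable.
xtile bmiq = bmxbmi [pweight=wtmec4yr], nq(4)

* Check cell sizes, including how many records have no BMI value.
tabulate bmiq, missing

* Show the BMI range and count inside each quartile.
tabstat bmxbmi, by(bmiq) statistics(min max n)
```

If any group turns out to hold only a handful of observations, that is a sign the chosen categories do not reflect the underlying distribution well.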

It is important to examine the data both ways, since the assumption that an independent variable has a continuous relationship with the outcome may not be true. Looking at the categorical version of the variable will help you judge whether this assumption holds.
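One quick, informal way to look at the shape of the relationship before choosing a specification is a smoothed scatterplot. The sketch below is unweighted and purely exploratory, and it assumes the HDL cholesterol variable is named lbdhdl, which the text above does not state.

```stata
* Exploratory only: lowess ignores the survey weights and design.
* lbdhdl is an assumed name for the HDL cholesterol variable.
lowess lbdhdl bmxbmi
```

A smooth that bends or flattens noticeably across the BMI range suggests that the categorical version of BMI deserves a closer look.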

In this example, you could look at BMI as a continuous variable or convert it into a categorical variable based on standard BMI definitions of underweight, normal weight, overweight and obese. Here is how categorical BMI variables are created:

Table of code to generate categorical BMI variable
Code to generate categorical BMI variables	BMI Category
gen bmicat=1 if bmxbmi>=0 & bmxbmi<18.5	underweight
replace bmicat=2 if bmxbmi>=18.5 & bmxbmi<25	normal weight
replace bmicat=3 if bmxbmi>=25 & bmxbmi<30	overweight
replace bmicat=4 if bmxbmi>=30 & bmxbmi<.	obese
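After creating bmicat, it is worth confirming that the categories line up with the intended cutoffs. The sketch below is one way to do that; it matters because Stata treats a missing numeric value as larger than any number, so a condition without an upper bound can sweep missing BMI values into a category. The label name bmicat_lbl is hypothetical.

```stata
* Frequency of each category, with missing values shown explicitly.
tabulate bmicat, missing

* Minimum, maximum and count of BMI within each category.
tabstat bmxbmi, by(bmicat) statistics(min max n)

* A nonzero count here means records with no BMI were assigned a category.
count if missing(bmxbmi) & !missing(bmicat)

* Attach readable labels (bmicat_lbl is a hypothetical label name).
label define bmicat_lbl 1 "underweight" 2 "normal weight" 3 "overweight" 4 "obese"
label values bmicat bmicat_lbl
```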

Step 3: Determine the reference group for categorical variables

For all categorical variables, you need to decide which category to use as the reference group. If you do not specify the reference group options, Stata will choose the lowest numbered group by default.

For these analyses, use the following commands to specify the following reference groups.
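A minimal sketch of how reference groups can be set, assuming Stata 11 or later, where the ib#. factor-variable prefix names the base level directly in the model. The specific reference levels chosen here and the outcome variable name lbdhdl are illustrative assumptions, not the tutorial's stated choices; the sketch also assumes svyset has been run and bmicat has been created.

```stata
* ib#. makes level # the reference group; c. marks a continuous variable.
* All reference levels below are illustrative assumptions.
* lbdhdl is an assumed name for the HDL cholesterol variable.
svy: regress lbdhdl ib2.bmicat ib1.riagendr ib3.ridreth1 ///
    c.ridageyr ib1.smoker ib1.dmdeduc
```

In older Stata releases the same effect was typically achieved by setting the omit characteristic, for example char ridreth1[omit] 3, and running the model with the xi: prefix. With a plain i. prefix and no base specified, Stata falls back to the lowest-numbered level as the reference group.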
