In this example, you will assess the association between high-density lipoprotein (HDL) cholesterol (the outcome variable) and body mass index (bmxbmi, the exposure variable) after controlling for selected covariates in NHANES 1999-2002. These covariates include gender (riagendr), race/ethnicity (ridreth1), age (ridageyr), smoking (smoker, derived from SMQ020 and SMQ040; smoker = 1 if non-smoker, 2 if past smoker, and 3 if current smoker), and education (dmdeduc).
There are several things you should be aware of while analyzing NHANES data with Stata. Please see the Stata Tips page to review them before continuing.
Remember that you need to define the survey design with the svyset command before using the svy prefix commands. The general format of this command is:
svyset [w=weightvar], psu(psuvar) strata(stratavar) vce(linearized)
To define the survey design variables for your HDL cholesterol analysis, use the four-year MEC weight variable (wtmec4yr), the PSU variable (sdmvpsu), and the strata variable (sdmvstra). The vce option specifies the variance estimation method; the default, "linearized", is Taylor series linearization. Here is the svyset command for four years of MEC data:
svyset [w= wtmec4yr], psu(sdmvpsu) strata(sdmvstra) vce(linearized)
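The command above passes the PSU through the psu() option. Recent Stata manuals also document a form in which the PSU variable comes first and the weight is declared as a pweight. The sketch below shows that form; the four observations are made-up values that only borrow the NHANES variable names so the command can run, and they are not NHANES data.

```stata
* Made-up design values (NOT NHANES data), only so svyset has variables to use
clear
input sdmvpsu sdmvstra wtmec4yr
1 1 1200.5
2 1  980.2
1 2 1500.0
2 2 1100.7
end

* PSU variable first, sampling weight declared as pweight,
* strata and variance method given as options
svyset sdmvpsu [pweight=wtmec4yr], strata(sdmvstra) vce(linearized)
```

Typing svyset with no arguments redisplays the current settings, which is a quick way to confirm the design before running any svy command.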
For continuous variables, you have a choice of using the variable in its original form (continuous) or changing it into a categorical variable (e.g. based on standard cutoffs, quartiles or common practice). The categorical variables should reflect the underlying distribution of the continuous variable and not create categories where there are only a few observations.
It is important to examine the data both ways, since the assumption that an independent variable has a linear (continuous) relationship with the outcome may not be true. Looking at the categorical version of the variable will help you judge whether this assumption holds.
In this example, you could look at BMI as a continuous variable or convert it into a categorical variable based on standard BMI definitions of underweight, normal weight, overweight and obese. Here is how categorical BMI variables are created:
Code to generate the categorical BMI variable | BMI category |
---|---|
gen bmicat=1 if bmxbmi>=0 & bmxbmi<18.5 | underweight |
replace bmicat=2 if bmxbmi>=18.5 & bmxbmi<25 | normal weight |
replace bmicat=3 if bmxbmi>=25 & bmxbmi<30 | overweight |
replace bmicat=4 if bmxbmi>=30 & bmxbmi<. | obese |
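Stata treats a missing value (.) as larger than any number, so a condition with no upper bound, such as bmxbmi>=25 on its own, is also true for observations with missing BMI. The sketch below runs the BMI categorization on five made-up BMI values, the last one missing, so you can check that every category has an upper bound and that missing BMI stays missing. The value label name bmicat_lbl is a hypothetical choice.

```stata
* Five made-up BMI values for illustration only; the last one is missing
clear
input bmxbmi
17.2
22.0
27.5
31.4
.
end

* Every condition has an upper bound, so missing BMI is never assigned a category
gen bmicat=1 if bmxbmi>=0 & bmxbmi<18.5
replace bmicat=2 if bmxbmi>=18.5 & bmxbmi<25
replace bmicat=3 if bmxbmi>=25 & bmxbmi<30
replace bmicat=4 if bmxbmi>=30 & bmxbmi<.

* Attach readable labels (bmicat_lbl is a hypothetical label name)
label define bmicat_lbl 1 "underweight" 2 "normal weight" 3 "overweight" 4 "obese"
label values bmicat bmicat_lbl

* The missing option shows whether any observations were left uncategorized
tabulate bmicat, missing
```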
For all categorical variables, you need to decide which category to use as the reference group. If you do not specify the reference group options, Stata will choose the lowest numbered group by default.
Use the following general command to specify the reference group:
char varname[omit] value
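The omit characteristic is read by the xi prefix when it builds indicator (dummy) variables. A small sketch with made-up race/ethnicity codes shows the effect: after char ridreth1[omit] 3, xi creates indicators for every level except 3, so level 3 becomes the reference group.

```stata
* Made-up race/ethnicity codes for illustration only
clear
input ridreth1
1
2
3
4
5
3
end

* Make level 3 the omitted (reference) group for xi
char ridreth1[omit] 3

* Create indicator variables; no indicator is created for level 3
xi i.ridreth1

* List the indicators xi created
describe _I*
```

In Stata 11 and later you can instead set the base level directly in the model with factor-variable notation, for example ib3.ridreth1 in a svy: regress command, without using char or xi.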
For these analyses, use the following commands to set the reference groups.
Stata command | Reference group |
---|---|
char ridreth1[omit]3 |