The first task is to identify missing data and recode it. Here are the steps:
In this step, you will use the tabstat command to check the minimum and maximum values of continuous variables, the nmissing command to count their missing values, and the tabulate command to look at the frequency distribution of categorical variables in your master analytic dataset. Together, the output from these commands gives the number and frequency of missing values for each variable listed in the command.
Typically, tabstat or summarize is used for continuous variables, and tabulate is used for categorical variables. In the following example, tabstat and tabulate are run on the same set of variables without distinguishing between continuous and categorical variables. If you use tabulate on a continuous variable with many distinct values, the output can be very long.
Use the tabstat and nmissing commands to determine the minimum (min) and maximum (max) values and the number of missing observations for the selected variables among participants who were interviewed and examined in the MEC and who were aged 20 years and older.
The nmissing command is user-written (it is not part of official Stata) and can be installed from http://www.stata-journal.com/software/sj5-4/dm67_3/.
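* Adults aged 20+ who were interviewed and examined in the MEC (ridstatr==2)
* /// continues a command onto the next line in a do-file
tabstat bpq* mcq* if (ridageyr >=20 & ///
    ridageyr <.) & ridstatr==2, stat(n min max)
nmissing bpq* mcq* if (ridageyr >=20 & ///
    ridageyr <.) & ridstatr==2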
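If nmissing is not installed, the same missing-value counts can be obtained with Stata's built-in count command. The sketch below runs on a small hypothetical dataset (the values are invented for illustration and are not NHANES data). It repeats the tabstat check and then counts missing values per variable for the same subgroup of adults examined in the MEC.

```stata
* Hypothetical toy data (not NHANES values), for illustration only
clear
input ridageyr ridstatr bpq010 mcq160b
45 2 1 2
67 2 9 1
23 2 . 2
34 1 2 .
18 2 7 9
52 2 2 .
end

* Nonmissing count, minimum and maximum for adults examined in the MEC
tabstat bpq010 mcq160b if (ridageyr >= 20 & ridageyr < .) & ridstatr == 2, ///
    stat(n min max)

* Number of missing values per variable for the same subgroup,
* using the built-in -count- command instead of nmissing
foreach v of varlist bpq010 mcq160b {
    quietly count if missing(`v') & (ridageyr >= 20 & ridageyr < .) & ridstatr == 2
    display "`v': " r(N) " missing"
}
```

Stata's built-in misstable summarize command is another option for a quick overview of missing values across many variables.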
Use the tabulate command to determine the frequency of each value of the variables listed for participants who were interviewed and examined in the MEC and who were age 20 years and older. Use the missing option to display the missing values.
tabulate bpq010 if (ridageyr >=20 & ridageyr <.) & ridstatr==2, missing
Highlighted items from the commands tabstat, nmissing and tabulate output:
Two options can be used to recode the missing data:
Use the if qualifier to recode "7" and "9" values of a variable as missing.
replace bpq010=. if bpq010==7 | bpq010==9
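The same recode can be written more compactly with the inlist() function, or with Stata's mvdecode command, which converts listed numeric codes to missing. The sketch below, on hypothetical values, applies both approaches to copies of one variable and checks that they agree. The variable names bpq010_a and bpq010_b are hypothetical helpers created only for this comparison.

```stata
* Hypothetical codes; 7 and 9 stand for "refused" and "don't know"
clear
input bpq010
1
3
7
9
.
end

* Two hypothetical copies so the approaches can be compared
generate bpq010_a = bpq010
generate bpq010_b = bpq010

* inlist() version of the replace command
replace bpq010_a = . if inlist(bpq010_a, 7, 9)

* mvdecode version: codes 7 and 9 become system missing (.)
mvdecode bpq010_b, mv(7 9)

* Both approaches give identical results (in Stata, . == . is true)
assert bpq010_a == bpq010_b
```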
Use the foreach loop command to recode "7" and "9" values of a variable as missing.
Use this option when you want to recode multiple variables that use the same numeric codes for "refused" and "don't know". Use the save command to create a new dataset with the recoded values.
* Recode 7 (refused) and 9 (don't know) to missing in each listed variable
foreach i in bpq020 bpq050a bpq100d bpq070 ///
    bpq080 mcq160b mcq160c mcq160d mcq160e mcq160f {
    replace `i' = . if inlist(`i', 7, 9)
}
save C:\Nhanes\Data\demo_bp1, replace
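Recoding to system missing (.) discards the reason a value is missing. If you want to keep "refused" and "don't know" distinct, Stata's extended missing values (.a through .z) are one option; they are still treated as missing by missing() and by estimation commands. The sketch below uses hypothetical data and assigns .r to code 7 and .d to code 9; these letter choices are a convention chosen for illustration, not part of the NHANES documentation.

```stata
* Hypothetical toy data; 7 = refused, 9 = don't know
clear
input bpq020 mcq160b
1 2
7 1
9 9
2 7
end

* Map 7 to .r and 9 to .d in every listed variable
mvdecode bpq020 mcq160b, mv(7=.r \ 9=.d)

* The missing option lists .d and .r as separate rows
tabulate bpq020, missing
```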
In this step, you will use the tabulate command to confirm that the recoding in the previous step was done correctly. As a general rule, if 10% or less of the data for a variable are missing from your analytic dataset, it is usually acceptable to continue your analysis without further evaluation or adjustment. However, if more than 10% of the data for a variable are missing, you may need to determine whether the missing values are distributed evenly across socio-demographic characteristics, and decide whether imputation of missing values or the use of adjusted weights is necessary. (Please see the Analytic Guidelines for more information.)
Use the tabulate command to determine the frequency of each value of the variables listed for participants who were interviewed and examined in the MEC and who were aged 20 years and older. Use the missing option to display the missing values. Use a foreach loop to get the frequencies of multiple variables.
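The 10% rule of thumb can be checked directly by computing the percentage of missing values for each variable in the analytic subgroup. The sketch below uses hypothetical data in which every row is an eligible adult; it computes an unweighted percentage, which is an assumption made for simplicity, and flags any variable above 10%.

```stata
* Hypothetical toy data: 10 adults examined in the MEC
clear
input ridageyr ridstatr bpq020 mcq160b
40 2 1 .
55 2 . 2
61 2 . 1
29 2 2 2
70 2 1 2
33 2 2 2
48 2 1 2
50 2 2 2
64 2 1 2
38 2 2 2
end

* Analytic subgroup: age 20+, interviewed and examined in the MEC
local cond "(ridageyr >= 20 & ridageyr < .) & ridstatr == 2"
quietly count if `cond'
local n = r(N)

* Unweighted percentage missing per variable, flagged above 10%
foreach v of varlist bpq020 mcq160b {
    quietly count if missing(`v') & `cond'
    local pct = 100 * r(N) / `n'
    display "`v': " %5.1f `pct' "% missing"
    if `pct' > 10 display "  more than 10% missing; review before analysis"
}
```

Here bpq020 (20% missing) is flagged, while mcq160b (exactly 10%) is not, which matches the "10% or less" wording of the rule.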
tabulate bpq010 if (ridageyr >=20 & ///
    ridageyr <.) & ridstatr==2, missing
* Frequencies, including missing values, for each listed variable
foreach i in bpq020 bpq070 bpq080 mcq160b ///
    mcq160c mcq160d mcq160e mcq160f {
    tabulate `i' if (ridageyr >=20 & ///
        ridageyr <.) & ridstatr==2, missing
}
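Reading the tabulate output by eye works, but the same check can also be automated with Stata's assert command, which stops a do-file with an error if any observation violates the condition. The sketch below, on hypothetical data that have already been recoded, asserts that no 7 or 9 codes remain in the analytic subgroup.

```stata
* Hypothetical toy data, already recoded (no 7 or 9 codes remain)
clear
input ridageyr ridstatr bpq020 mcq160b
45 2 1 .
67 2 . 2
30 2 2 1
end

local cond "(ridageyr >= 20 & ridageyr < .) & ridstatr == 2"

* Stops with an error if any 7 or 9 is left in the subgroup
* (inlist() returns 0 for missing values, so . passes the check)
foreach i in bpq020 mcq160b {
    assert !inlist(`i', 7, 9) if `cond'
}
```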
Highlighted items from the tabulate output for recoding missing values: