Missing values may distort your analysis results. You must evaluate the extent of missing data in your dataset to determine whether the data are usable without additional re-weighting for item non-response. As a general rule, if 10% or less of the data for a variable are missing from your analytic dataset, it is usually acceptable to continue your analysis without further evaluation or adjustment. However, if more than 10% of the data for a variable are missing, you may need to determine whether the missing values are distributed equally across socio-demographic characteristics, and decide whether imputation of missing values or use of adjusted weights is necessary. (Please see Analytic Guidelines for more information.)
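
The 10% rule of thumb above can be checked with a few lines of code. The following is a minimal sketch in plain Python, not an official NHANES procedure. It assumes missing values have already been read in as `None`; the function names and the example data are hypothetical.

```python
from typing import Optional, Sequence


def percent_missing(values: Sequence[Optional[float]]) -> float:
    """Return the percentage of observations that are missing (None)."""
    if not values:
        raise ValueError("no observations to evaluate")
    n_missing = sum(1 for v in values if v is None)
    return 100.0 * n_missing / len(values)


def needs_further_evaluation(
    values: Sequence[Optional[float]], threshold: float = 10.0
) -> bool:
    """True when more than `threshold` percent of the values are missing."""
    return percent_missing(values) > threshold


# Hypothetical variable: 2 of 8 observations are missing.
example = [120.0, None, 118.0, 131.0, None, 125.0, 110.0, 122.0]
pct = percent_missing(example)
flag = needs_further_evaluation(example)
```

Because the guideline treats exactly 10% as acceptable, the comparison is strictly "greater than". A variable that is flagged is a candidate for the further evaluation described above, not automatically unusable.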
When you review the codebooks of Continuous NHANES data, you should note that NHANES assigns missing values in the following way: a period (.) for a missing numeric value and a blank for a missing character value.
However, other types of responses should also be treated as unavailable for analysis. When a sample person refuses to answer a question, the "refused" response is assigned a value of 7, 77, or 777, depending on the number of digits in the variable value range. A "don't know" response is assigned a value of 9, 99, or 999, likewise depending on the number of digits in the variable value range.
If you fail to identify these other types of missing data and treat the assigned values for "refused" or "don't know" as real values, your statistical analyses will be distorted. Therefore, it is important to recode "refused" and "don't know" responses as missing values (a period (.) for numeric variables or a blank for character variables).
NHANES codes | Description | Action |
---|---|---|
. (period) | missing numeric value | None. |
(blank) | missing character value | None. |
7 or 77 or 777 | "refused" response | Code as missing (period or blank). |
9 or 99 or 999 | "don't know" response | Code as missing (period or blank). |
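
The recoding in the table can be sketched as follows. This is an illustration only, written in plain Python, with `None` standing in for the missing-value period. The variable names `VAR_1DIGIT` and `VAR_2DIGIT` are hypothetical placeholders; in practice you would take the "refused" and "don't know" codes for each variable from its codebook entry. The codes are supplied per variable because they depend on the variable's value range: 7 means "refused" for a one-digit variable but can be a real value for a two-digit one.

```python
from typing import Dict, List, Optional, Set

Row = Dict[str, Optional[float]]


def recode_missing(rows: List[Row], missing_codes: Dict[str, Set[float]]) -> List[Row]:
    """Return copies of the rows with "refused"/"don't know" codes set to None.

    `missing_codes` maps each variable name to the codes that should be
    treated as missing for that variable.
    """
    recoded = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        for var, codes in missing_codes.items():
            if new_row.get(var) in codes:
                new_row[var] = None
        recoded.append(new_row)
    return recoded


# Hypothetical codes: a one-digit variable and a two-digit variable.
codes = {"VAR_1DIGIT": {7, 9}, "VAR_2DIGIT": {77, 99}}

data = [
    {"VAR_1DIGIT": 1, "VAR_2DIGIT": 7},   # 7 is a real value for VAR_2DIGIT
    {"VAR_1DIGIT": 7, "VAR_2DIGIT": 99},  # refused, don't know
    {"VAR_1DIGIT": 9, "VAR_2DIGIT": 45},  # don't know, real value
]

cleaned = recode_missing(data, codes)
```

After recoding, the second row has both variables set to missing and the third row has only `VAR_1DIGIT` set to missing. The 7 in the first row is kept because it is a real value within the two-digit range.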