Key Concepts about Identifying, Recoding, and Evaluating Missing Data
Missing values may distort the results of your analysis. You must evaluate the extent of missing data in your dataset to determine whether the data are useable without additional re-weighting for item non-response. As a general rule, if 10% or less of your data for a variable are missing from your analytic dataset, it is usually acceptable to continue your analysis without further evaluation or adjustment.
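The 10% guideline above can be checked with a few lines of code. The following is a minimal sketch, not part of the NHANES materials: the function names, the `sample` data, and the use of `None` to stand for a missing value are all illustrative assumptions.

```python
def percent_missing(values):
    """Return the percentage of entries that are missing (represented as None)."""
    if not values:
        raise ValueError("no observations supplied")
    missing = sum(1 for v in values if v is None)
    return 100.0 * missing / len(values)


def needs_further_evaluation(values, threshold=10.0):
    """True when more than `threshold` percent of the values are missing.

    The default of 10.0 mirrors the general rule stated above; it is a
    guideline, not a hard statistical cutoff.
    """
    return percent_missing(values) > threshold


# Hypothetical responses for one variable; None marks a missing value.
sample = [3, None, 5, 2, 4, 1, 6, 2, 3, 5]
print(percent_missing(sample))           # share of missing responses, in percent
print(needs_further_evaluation(sample))  # False means 10% or less is missing
```

The check should run on the analytic dataset after any "refused" or "don't know" codes have been recoded as missing; otherwise the percentage understates the true extent of missing data.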
When you review the PAQ codebook, you should note that NHANES assigns missing values in the following way:
- a period (.) for numeric variables, or
- a blank for character variables
However, other response types must also be treated as unavailable for analysis. When a sample person refuses to answer a question, the “refused” response is assigned a value of “7,” “77,” or “777,” depending on the number of digits in the variable value range. A “don't know” response is assigned a value of “9,” “99,” or “999,” depending on the number of digits in the variable value range.
If you fail to identify “refused” or “don't know” responses as missing data and treat their assigned values as real values, your statistical analyses will be distorted. Therefore, it is important to recode “refused” and “don't know” responses as missing values (a period (.) for numeric variables or a blank for character variables).
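The recoding step described above can be sketched as follows. This is an illustrative example under stated assumptions, not NHANES-supplied code: `special_codes`, `recode_missing`, the `width` parameter, and the `days` data are hypothetical, and `None` stands in for the missing value. The caller must take `width` (the number of digits in the variable value range) from the codebook, because a value such as 7 is a "refused" code only for a one-digit variable.

```python
def special_codes(width):
    """Return the (refused, dont_know) codes for a variable with `width` digits.

    width=1 -> (7, 9), width=2 -> (77, 99), width=3 -> (777, 999)
    """
    if width not in (1, 2, 3):
        raise ValueError("width must be 1, 2, or 3")
    return int("7" * width), int("9" * width)


def recode_missing(values, width):
    """Replace 'refused' and 'don't know' codes with None (missing).

    Values that are already None are left unchanged.
    """
    refused, dont_know = special_codes(width)
    return [None if v in (refused, dont_know) else v for v in values]


# Hypothetical two-digit variable: 77 = refused, 99 = don't know.
days = [3, 7, 77, 99, None, 5]
recoded = recode_missing(days, width=2)
print(recoded)  # [3, 7, None, None, None, 5]
```

Note that the value 7 is kept here: for a two-digit variable it is a legitimate response, which is why the recode is keyed to the variable's digit width rather than applied to every 7 or 9 in the dataset.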
NHANES codes | Description | Action |
---|---|---|
. (period) | missing numeric value | None |
(blank) | missing character value | None |
7 or 77 or 777 |