Task 3: How to Merge and Append NHANES Data for PAQ Analyses
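To merge and append NHANES data for the PAQ analysis, you will need to sort the data files by a unique identifier, merge the files by that identifier, append the data from multiple NHANES cycles, and construct new sample weights.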
Sort Data Files by a Unique Identifier
The first step in merging data is to sort each data file by a unique identifier. Each study participant is assigned a unique identifier, stored in the variable SEQN. Use PROC SORT to sort the DEMO and PAQ files by SEQN. If you have just downloaded the data from NCHS, the files are already sorted by SEQN and you can comment out the PROC SORT steps; the code is included in case any changes were made to the files before merging. In this segment of code, the OUT= option stores each sorted dataset in the temporary SAS library named WORK. You can explore the WORK library by opening the SAS Explorer and navigating to “Libraries”.
Sample Code
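/* 2003-2004 cycle: sort DEMO_C and PAQ_C by SEQN into WORK */
proc sort data = demo_c.demo_c out = demo_c;
by seqn;
run;
proc sort data = paq_c.paq_c out = paq_c;
by seqn;
run;
/* 2005-2006 cycle: sort DEMO_D and PAQ_D by SEQN into WORK */
proc sort data = demo_d.demo_d out = demo_d;
by seqn;
run;
proc sort data = paq_d.paq_d out = paq_d;
by seqn;
run;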
Merge Data Files by the Unique Identifier
Merging, like sorting, is done using a unique identifier. Use the SEQN variable to merge the demographic (DEMO) and physical activity questionnaire (PAQ) data. We have created two intermediate datasets for this task: paq3 and paq5. The paq3 intermediate dataset merges the DEMO_C and PAQ_C files from the 2003-2004 NHANES cycle. Similarly, the paq5 intermediate dataset merges the DEMO_D and PAQ_D files from the 2005-2006 NHANES cycle.
After you have merged the data files, check the contents of your intermediate datasets to make sure the files merged correctly. Use the PROC CONTENTS procedure to list all variable names and labels. You can also use the PROC MEANS procedure to check the number of observations for each variable as well as missing, minimum, and maximum values.
Sample Code
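/* Merge the 2003-2004 DEMO and PAQ files by SEQN, then list the variables */
data paq3;
merge demo_c.demo_c paq_c.paq_c;
by seqn;
run;
proc contents data = paq3;
run;
/* Merge the 2005-2006 DEMO and PAQ files by SEQN, then list the variables */
data paq5;
merge demo_d.demo_d paq_d.paq_d;
by seqn;
run;
proc contents data = paq5;
run;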
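The PROC MEANS check described above is not part of the sample code. A minimal sketch, assuming the intermediate datasets paq3 and paq5 have already been created in the WORK library, could look like this:

```sas
/* Count non-missing and missing values and show the range
   of every numeric variable in each intermediate dataset */
proc means data = paq3 n nmiss min max;
run;

proc means data = paq5 n nmiss min max;
run;
```

If the merge worked, SEQN should have no missing values in either dataset, and its minimum and maximum should fall within the ID range of the corresponding cycle.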
Append Data for Multiple NHANES Cycles
Before appending data from two or more cycles, examine the contents of each data file to identify variables whose names may have changed between cycles.
- If the names or labels of the variables of interest are identical in the selected cycles, you can append the data files directly.
- If the variables of interest have changed, you will need to evaluate the differences in the wording of the question, definitions, and response choices that were used during data collection. You may need to recode the variables before the files can be appended.
Notably, the variable names and labels in the NHANES PAQ data did not change significantly across the 2-year cycles of the continuous NHANES from 1999-2000 to 2005-2006. Thus, we will perform a direct append of the data from the 2003-2004 and 2005-2006 NHANES cycles. Code for the resulting intermediate dataset titled “paq” is included below.
Sample Code
data paq;
set paq3 paq5;
run;
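The comparison of variable names and labels recommended before appending can be done by eye from the PROC CONTENTS output, but it can also be scripted. The following is only a sketch of one possible approach. The dataset names vars3, vars5, and var_diff are hypothetical and do not appear in the tutorial's programs.

```sas
/* Save the variable names and labels of each intermediate dataset */
proc contents data = paq3 out = vars3 (keep = name label) noprint;
run;
proc contents data = paq5 out = vars5 (keep = name label) noprint;
run;

proc sort data = vars3; by name; run;
proc sort data = vars5; by name; run;

/* Keep variables found in only one cycle or whose labels differ */
data var_diff;
merge vars3 (in = in3 rename = (label = label3))
      vars5 (in = in5 rename = (label = label5));
by name;
in_paq3 = in3;
in_paq5 = in5;
if not (in3 and in5) or label3 ne label5;
run;

proc print data = var_diff;
run;
```

This comparison is case-sensitive on the variable name, so a name stored in a different case in the two cycles would be listed as a difference. Any variable of interest that shows up in var_diff would need the closer review described above before the files are appended.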
Construct New Sample Weights
When you combine two or more 2-year cycles of the continuous NHANES from 2001-2002 onward, you must construct new sample weights before beginning any analyses. When survey cycles are combined, the estimates will be representative of the population at the midpoint of the combined survey period.
For the 4 years of PAQ data from 2003-2006, each 2-year cycle contributes half of the combined period, so the new weight is one half of the 2-year interview weight:
WTINT4CD = 1/2 * WTINT2YR
For the PAQMSTR.sas dataset, we name the newly constructed weight for the 2003-2006 PAQ data “WTINT4CD”.
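The weight construction above is not shown in the sample code. A minimal sketch, assuming the appended dataset paq already exists in the WORK library and contains WTINT2YR, might be:

```sas
/* Add the 4-year weight to the appended 2003-2006 dataset.
   Assumption: paq was created by appending paq3 and paq5. */
data paq;
set paq;
wtint4cd = 1/2 * wtint2yr;
run;
```

This sketch rewrites paq in place. The assignment could equally be placed in the same DATA step that appends paq3 and paq5, which is a design choice rather than something the tutorial specifies.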