NHANES II Web Tutorial: Keep & Merge Datasets: Task 2

Task 2: How to Merge NHANES II Data

Here are the steps to merge NHANES II data:

Step 1 Sort data files by SEQN

The first step in merging data is to sort each of the data files by a unique identifier. In NHANES II data, this unique identifier is known as the sequence number (SEQN). NHANES II uses SEQN to identify each sample person, so SEQN is the variable you must use to merge data files. To ensure that all observations are ordered in the same way in each data file, you need to sort each data file by the SEQN variable. Use the proc sort procedure in SAS to sort the data.
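As a minimal sketch of this step, the program below sorts a small made-up dataset by SEQN. The dataset names LAB_SORTED and LAB_DUPS, the variable HGB, and all data values are hypothetical and used only for illustration. The nodupkey and dupout= options are an optional addition, not part of the tutorial program: they send any repeated SEQN values to a separate dataset, which is one way to confirm that SEQN is unique within a file.

```sas
/* Toy stand-in for one data file; the values are made up for illustration. */
data LAB;
    input SEQN HGB;
    datalines;
30 14.1
10 13.2
20 15.0
;
run;

/* Sort by SEQN into a new dataset (out=) so the original stays unchanged.
   nodupkey keeps one row per SEQN and dupout= collects any extra rows,
   so an empty LAB_DUPS dataset indicates that SEQN is unique in LAB. */
proc sort data=LAB out=LAB_SORTED nodupkey dupout=LAB_DUPS;
    by SEQN;
run;
```

The tutorial program sorts each file in place. Writing to a separate output dataset, as here, is a design choice that keeps the unsorted file available for comparison.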

Step 2 Merge data by SEQN

After sorting the data files, merge them in a data step using the merge statement followed by a by statement. Always merge on a unique identifier, in this case the SEQN variable. The table below shows the full program.
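The sketch below illustrates how a merge by SEQN behaves when the files do not contain exactly the same sample persons. The datasets LAB_TOY, EXAM_TOY, and MERGED_TOY, the variables HGB, SBP, FROM_LAB, and FROM_EXAM, and all values are hypothetical. The in= dataset option is an optional addition that is not used in the tutorial program; it flags which input file contributed to each merged record.

```sas
/* Toy stand-ins for two files that are already sorted by SEQN. */
data LAB_TOY;
    input SEQN HGB;
    datalines;
10 13.2
20 15.0
30 14.1
;
run;

data EXAM_TOY;
    input SEQN SBP;
    datalines;
10 118
30 126
40 131
;
run;

/* Match-merge on SEQN. Both inputs must be in SEQN order. */
data MERGED_TOY;
    merge LAB_TOY(in=inLab) EXAM_TOY(in=inExam);
    by SEQN;
    /* in= variables are temporary, so copy them to keep them in the output. */
    FROM_LAB  = inLab;
    FROM_EXAM = inExam;
run;

proc print data=MERGED_TOY noobs;
run;
```

With these toy inputs the merged dataset has four records, one per distinct SEQN. SEQN 20 has a missing SBP because it appears only in LAB_TOY, and SEQN 40 has a missing HGB because it appears only in EXAM_TOY.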

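Step 3 Check results

After merging, use the proc contents procedure to confirm that the variables from the original data files are in the merged dataset, and use the proc means procedure to review the number of valid and missing values and the minimum and maximum of each variable.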

Program to merge data and check contents of data
Statements	Explanation
proc sort data=LAB; by SEQN;	Use the proc sort procedure and by statement to sort the laboratory data by SEQN.
proc sort data=MDEXAM; by SEQN;	Use the proc sort procedure and by statement to sort the examination data by SEQN.
proc sort data=ADULT; by SEQN;	Use the proc sort procedure and by statement to sort the adult questionnaire data by SEQN.
proc sort data=YOUTH; by SEQN;	Use the proc sort procedure and by statement to sort the youth questionnaire data by SEQN.
proc sort data=ANTHRO; by SEQN;	Use the proc sort procedure and by statement to sort the anthropometric questionnaire data by SEQN.
proc sort data=SUPPL; by SEQN;	Use the proc sort procedure and by statement to sort the supplemental health questionnaire data by SEQN.
data DEMO1_NH2;	Use the data step to name the new dataset that will contain the merged files (DEMO1_NH2).
merge LAB MDEXAM ADULT YOUTH ANTHRO SUPPL; by SEQN;	Use the merge statement to merge data from the six data files by their linking variable, SEQN.
proc contents data=DEMO1_NH2 varnum;	Use the proc contents procedure to list the contents of DEMO1_NH2, the new merged dataset. Use the varnum option to order the listed variables according to their positions in the dataset.
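proc means data=DEMO1_NH2 N Nmiss min max maxdec=2; run;	Use the proc means procedure to show the number of nonmissing values (N), the number of missing values (Nmiss), and the minimum and maximum values for the variables in the merged dataset (DEMO1_NH2). The maxdec=2 option limits the displayed statistics to two decimal places, and the run statement ends the step so that it executes.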

Highlighted results of this program:

This dataset contains 25,286 observations. No new observations have been added to the dataset.
Using the variable list provided by the proc contents procedure, check that the variables in your original datasets are in the merged dataset — in this case, they all are.
The N column has the number of observations with valid data for each variable in the dataset.
The N Miss column has the number of observations with missing data for each variable in the dataset.
The minimum column lists the minimum value for each variable in the dataset.
The maximum column lists the maximum value for each variable in the dataset.
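
The checks above can be tried on a small scale with the sketch below. The dataset MERGED_TOY, the variables HGB and SBP, the macro variables n_obs and n_seqn, and all values are hypothetical. The proc sql step is an optional addition, not part of the tutorial program; it compares the number of observations with the number of distinct SEQN values, which should match if the merge produced one record per sample person.

```sas
/* Toy merged dataset; a period marks a missing value. */
data MERGED_TOY;
    input SEQN HGB SBP;
    datalines;
10 13.2 118
20 15.0 .
30 14.1 126
40 . 131
;
run;

/* Count observations and distinct SEQN values into macro variables. */
proc sql noprint;
    select count(*), count(distinct SEQN)
        into :n_obs trimmed, :n_seqn trimmed
        from MERGED_TOY;
quit;

%put NOTE: observations=&n_obs distinct SEQN=&n_seqn;

/* Same statistics as the tutorial program, limited to the two toy variables. */
proc means data=MERGED_TOY N Nmiss min max maxdec=2;
    var HGB SBP;
run;
```

For this toy data, both counts are 4, and proc means reports N = 3 and N Miss = 1 for each of HGB and SBP.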
