NHANES Environmental Chemical Data Tutorial - Descriptive Statistics

Task 2: How to Generate Percentiles in SUDAAN

In this example, you will use SAS-callable SUDAAN to generate percentiles and standard errors for total Mono-(2-ethyl)-hexyl phthalate by age, gender, race-ethnicity and survey cycle.

IMPORTANT NOTE

There are other methods for estimating percentiles using SAS. For an example, see Appendix A (page 503) of the National Report on Human Exposure to Environmental Chemicals.
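As one illustration of a non-SUDAAN approach, the sketch below uses SAS PROC SURVEYMEANS, which can produce design-based percentile estimates and standard errors. This is not necessarily the method described in Appendix A of the National Report. It assumes a SAS release whose PROC SURVEYMEANS supports the PERCENTILE= option (SAS 9.2 or later) and that the libref nh has already been assigned. Results may differ slightly from SUDAAN because the two programs may use different quantile definitions and variance approaches.

```sas
/* Sketch only: design-based percentiles without SUDAAN.               */
/* Assumes SAS 9.2+ (PERCENTILE= option) and an assigned libref nh.    */
proc surveymeans data=nh.Phthalate_analysis_data percentile=(50 75 90 95);
  strata sdmvstra;   /* NHANES variance strata                          */
  cluster sdmvpsu;   /* NHANES variance PSUs                            */
  weight WTSPH6YR;   /* 6-year phthalate subsample weight               */
  domain sddsrvyr;   /* separate estimates for each survey cycle        */
  var URXMHP;        /* Mono-(2-ethyl)-hexyl phthalate                  */
run;
```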

Step 1: Use proc descript to generate percentiles in SUDAAN

To calculate the percentiles and standard errors, you will use SAS-callable SUDAAN because this software takes into account the complex survey design of NHANES data when determining variance estimates. The SUDAAN procedure proc descript is used to generate percentiles and standard errors. These estimates are requested on the print statement along with the sample size (nsum). The general program for obtaining weighted percentiles and standard errors is below.

WARNING

The design variables, sdmvstra and sdmvpsu, are provided in the demographic data files and are used to calculate variance estimates.
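SUDAAN expects the input data to be sorted by the design variables named on the nest statement unless the notsorted option is given. A minimal sketch of sorting first is shown below; the output dataset name phth_sorted is hypothetical, and the libref nh is assumed to have been assigned in an earlier module.

```sas
/* Sort by strata and PSU so that proc descript can run without NOTSORTED. */
/* phth_sorted is a hypothetical name for the sorted working dataset.      */
proc sort data=nh.Phthalate_analysis_data out=phth_sorted;
  by sdmvstra sdmvpsu;
run;
```

With the sorted copy, the procedure statement could then read `proc descript data=phth_sorted design=WR;` with the notsorted option omitted.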

Generate Percentiles in SUDAAN
Statements	Explanation
proc descript data=nh.Phthalate_analysis_data design=WR notsorted;	Use the proc descript procedure to generate percentiles, and specify the sample design with the design option WR (with-replacement sampling of PSUs). The data= option refers to the permanent dataset Phthalate_analysis_data, created in module 10. The notsorted option is used because the dataset was not sorted by strata (sdmvstra) and PSU (sdmvpsu) with the SAS procedure proc sort.
NEST sdmvstra sdmvpsu;	Use the nest statement with strata (sdmvstra) and PSU (sdmvpsu) to account for the design effects.
weight WTSPH6YR;	Use the weight statement to account for the unequal probability of sampling and for non-response. In this example, the 6-year phthalate subsample weight (WTSPH6YR) is used.
subgroup age5cat riagendr reth4cat sddsrvyr ;	Use the subgroup statement to list the categorical variables for which statistics are requested. This example uses 5 age categories (age5cat), gender (riagendr), race-ethnicity (reth4cat) and survey cycle (sddsrvyr). These variables also appear in the table statement.
levels 5 2 3 3 ;	Use the levels statement to define the number of categories in each of the subgroup variables. Each level must be a positive integer. This example uses five age categories, two genders, three race-ethnicity groups and three survey cycles.
var URXMHP;	Use the var statement to name the variable(s) to be analyzed. In this example, the Mono-(2-ethyl)-hexyl phthalate variable (URXMHP) is used.
percentile 50 75 90 95 ;	Use the percentile statement to request the percentiles to be estimated, here the 50th, 75th, 90th and 95th.
table age5cat riagendr reth4cat sddsrvyr;	Use the table statement to specify tabulations for which estimates are requested. If a table statement is not present, a one-dimensional distribution is generated for each variable in the subgroup statement. In this example the estimates are for age categories (age5cat), gender (riagendr), race-ethnicity (reth4cat) and survey cycle (sddsrvyr).
PRINT nsum= "N" qtile= "Qtile" seqtile= "SE" / style=nchs qtilefmt= F6.1 seqtilefmt= F6.1 ;	Use the print statement to assign labels and formats to the requested statistics and to view the output. If print is used alone, all of the default statistics are printed with default labels and formats. In this example, the sample size (nsum), quantile (qtile) and standard error of the quantile (seqtile) are requested; the options after the slash control the output layout and formats. Note: for a complete list of statistics that can be requested on the print statement, see the SUDAAN User's Manual. Set the style option to NCHS (style=nchs) to produce output that parallels a table style used at NCHS.
rtitle "Selected percentiles of Mono-(2-ethyl)-hexyl phthalate" ;	Use the rtitle statement to assign a heading for each page of output.
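The table above presents the program one statement at a time. Assembled into a single SAS-callable SUDAAN step, it might look like the sketch below. It assumes that SUDAAN is installed and callable from SAS and that the libref nh points to the library holding the dataset saved in module 10; the slash before the print options and the closing run; follow the usual SAS-callable SUDAAN form.

```sas
/* Assumes libref nh refers to the library holding the module 10 dataset. */
proc descript data=nh.Phthalate_analysis_data design=WR notsorted;
  nest sdmvstra sdmvpsu;              /* strata and PSU design variables  */
  weight WTSPH6YR;                    /* 6-year phthalate subsample weight */
  subgroup age5cat riagendr reth4cat sddsrvyr;
  levels 5 2 3 3;                     /* categories per subgroup variable */
  var URXMHP;                         /* Mono-(2-ethyl)-hexyl phthalate   */
  percentile 50 75 90 95;             /* percentiles to estimate          */
  table age5cat riagendr reth4cat sddsrvyr;
  print nsum="N" qtile="Qtile" seqtile="SE"
        / style=nchs qtilefmt=F6.1 seqtilefmt=F6.1;
  rtitle "Selected percentiles of Mono-(2-ethyl)-hexyl phthalate";
run;
```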

Step 2: Review output

The output will list the sample sizes, percentiles and their standard errors.

Reviewing the program output for 2003–2004, note that the estimated 50th percentile (median) of Mono-(2-ethyl)-hexyl phthalate is 1.9 ng/mL: an estimated 50% of the population represented by the sample has a value below 1.9 ng/mL, and 50% has a value above it.

