In this example, you will be calculating means of dietary calcium intake. The mean and its standard error are obtained directly from the PROC DESCRIPT procedure in SUDAAN and then output into a SAS dataset where the confidence intervals can be constructed.
Before running any SUDAAN procedure, use the PROC SORT procedure to sort the data by strata and PSU.
Use the PROC DESCRIPT procedure to generate means. Use the ATLEVEL1=1 and ATLEVEL2=2 options on the PROC DESCRIPT statement to specify the sampling stages for which you want counts per table cell (in NHANES, strata are level 1 and PSUs are level 2). ATLEV1 is the number of strata with at least one valid observation, and ATLEV2 is the number of PSUs with at least one valid observation. These counts are used to calculate the degrees of freedom: the number of PSUs minus the number of strata.
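To make the ATLEV1 and ATLEV2 definitions concrete, here is a minimal Python sketch (not SUDAAN code) that counts the strata and PSUs with at least one valid observation and takes their difference. The record layout, the `design_df` helper, and the data values are all hypothetical; a missing value is represented by `None`.

```python
def design_df(records):
    """Return (atlev1, atlev2, df) for a list of (stratum, psu, value) records.

    atlev1: strata with at least one non-missing value
    atlev2: PSUs (stratum-PSU pairs) with at least one non-missing value
    df:     atlev2 - atlev1
    """
    strata = set()
    psus = set()
    for stratum, psu, value in records:
        if value is None:
            continue  # a missing value does not make its stratum or PSU count
        strata.add(stratum)
        # PSU codes such as SDMVPSU are numbered within each stratum,
        # so a PSU is identified by the (stratum, psu) pair.
        psus.add((stratum, psu))
    atlev1, atlev2 = len(strata), len(psus)
    return atlev1, atlev2, atlev2 - atlev1


# Made-up data: 3 strata x 2 PSUs, but PSU 2 of stratum 3 has no valid value.
records = [
    (1, 1, 820.0), (1, 2, 905.0),
    (2, 1, 760.0), (2, 2, None), (2, 2, 1010.0),
    (3, 1, 880.0), (3, 2, None),
]
print(design_df(records))  # (3, 5, 2)
```

Because PSU 2 of stratum 3 has no valid value, it is not counted, so the sketch reports 3 strata, 5 PSUs, and 2 degrees of freedom rather than the 3 that a complete 3 x 2 design would give.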
Use the NEST statement to identify the strata and PSU variables (SDMVSTRA and SDMVPSU) so that the standard errors reflect the complex survey design, and use the WEIGHT statement to account for unequal probabilities of selection and for nonresponse. Use the SUBPOPN statement to select the subpopulation of interest. Use a CLASS statement to list the discrete variables that define the subgroups and a VAR statement to list the analysis variables. Use the TABLE statement to obtain results for each gender.
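These statements map onto a familiar estimator: WEIGHT supplies the weights for the point estimate, NEST tells the variance calculation which PSUs sit in which strata, and SUBPOPN restricts the estimate without deleting records from the design. The Python sketch below is a textbook with-replacement (WR) Taylor-series estimator for a weighted mean, written under stated assumptions; it is not SUDAAN's implementation. The `subpop_mean_wr` helper, the (stratum, psu, weight, age, y) record layout, and the data are hypothetical; missing y values are not handled, and strata with a single PSU are simply skipped, which may differ from what SUDAAN does.

```python
import math
from collections import defaultdict


def subpop_mean_wr(records, in_subpop):
    """Weighted subpopulation mean and its Taylor-series (WR) standard error.

    records:   list of (stratum, psu, weight, age, y) tuples (hypothetical layout)
    in_subpop: predicate on a record, e.g. lambda r: r[3] >= 20
    """
    # Drop records with nonpositive weights.
    recs = [r for r in records if r[2] > 0]
    sub = [r for r in recs if in_subpop(r)]
    w_total = sum(r[2] for r in sub)
    mean = sum(r[2] * r[4] for r in sub) / w_total

    # Linearized values summed within each PSU. Records outside the
    # subpopulation contribute 0, but their PSUs remain in the design.
    psu_tot = defaultdict(float)
    for r in recs:
        stratum, psu, w, _, y = r
        z = w * (y - mean) / w_total if in_subpop(r) else 0.0
        psu_tot[(stratum, psu)] += z

    # Between-PSU variability within each stratum (with-replacement formula).
    by_stratum = defaultdict(list)
    for (stratum, _), z in psu_tot.items():
        by_stratum[stratum].append(z)
    var = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)
        if n_h < 2:
            continue  # assumption: single-PSU strata are simply skipped here
        zbar = sum(zs) / n_h
        var += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in zs)
    return mean, math.sqrt(var)


# Made-up data: 2 strata x 2 PSUs. The zero-weight record is dropped and the
# 15-year-old falls outside the subpopulation of adults aged 20 and older.
data = [
    (1, 1, 1.0, 34, 2.0), (1, 1, 0.0, 41, 100.0), (1, 2, 1.0, 56, 4.0),
    (2, 1, 1.0, 27, 6.0), (2, 2, 1.0, 63, 8.0), (2, 2, 1.0, 15, 50.0),
]
mean, se = subpop_mean_wr(data, lambda r: r[3] >= 20)
print(mean, round(se, 4))  # 5.0 0.7071
```

The zero-weight record is dropped, in the spirit of the "WEIGHT variable nonpositive" line in the SUDAAN log, and the 15-year-old contributes zero to the linearized PSU total instead of being deleted. Keeping such records in the design is why SUBPOPN is generally preferred to subsetting the file beforehand: deleting records could remove whole PSUs and change both the variance and the degrees of freedom.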
The PRINT statement allows you to print the number of observations (NSUM), means (MEAN), and standard error of the mean (SEMEAN). The OUTPUT statement outputs the number of observations (NSUM), means (MEAN), standard error of the mean (SEMEAN), number of strata (ATLEV1), and number of PSUs (ATLEV2) to a SAS file named CALC0304.
*-------------------------------------------------------------------------;
* Use the PROC SORT procedure to sort the data files by strata and PSU.   ;
* Data must always be sorted before running a SUDAAN procedure.           ;
*                                                                         ;
* Use the PROC DESCRIPT procedure to estimate the mean dietary calcium    ;
* intake (DR1TCALC) by gender (RIAGENDR) in males and females ages 20     ;
* and older. These statistics will be output into a new dataset called    ;
* CALC0304, where the confidence intervals can be constructed directly.   ;
*-------------------------------------------------------------------------;

proc sort data=CALCMILK;
  by SDMVSTRA SDMVPSU;
run;

proc descript data=CALCMILK design=wr atlevel1=1 atlevel2=2;
  nest SDMVSTRA SDMVPSU;                /* strata, then PSUs within strata */
  weight WTDRD1;                        /* day 1 dietary weight            */
  subpopn RIDAGEYR >= 20 / name="Adults 20 years of age and older";
  class RIAGENDR / nofreq;
  var DR1TCALC;
  table RIAGENDR;
  rformat RIAGENDR GENDER.;
  print nsum mean semean / style=nchs meanfmt=f6.0 semeanfmt=f6.1;
  output nsum mean semean atlev1 atlev2 / filename=CALC0304 replace;
run;
Use a DATA step to create a new dataset called NEWCALC0304 from CALC0304. Calculate the degrees of freedom (DF) as the number of PSUs (ATLEV2) minus the number of strata (ATLEV1). Use a DROP statement to drop variables that are no longer needed. Then calculate the lower limit (LL) and upper limit (UL) of the 95% confidence interval as MEAN + TINV(.025, DF) * SEMEAN and MEAN + TINV(.975, DF) * SEMEAN; because TINV(.025, DF) is negative, LL falls below the mean. Round the mean (MEAN) and its standard error (SEMEAN) for display, and calculate the width of the confidence interval (CIWIDTH) as UL minus LL. Use the PROC PRINT procedure to display the results.
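The TINV calls are the only nontrivial arithmetic in this step. The minimal Python sketch below reproduces it outside SAS under stated assumptions: `t_cdf`, `tinv`, `sas_round`, and `confidence_interval` are hypothetical helpers; the t quantile is found by Simpson's-rule integration plus bisection, not by whatever method SAS uses internally; and the counts of 15 strata and 30 PSUs are assumed values chosen only to give the 15 degrees of freedom shown in the output. `sas_round` rounds halves away from zero, as SAS ROUND does, whereas Python's built-in `round` rounds halves to even.

```python
import math


def t_cdf(x, df, n=2000):
    """P(T <= x) for Student's t via Simpson's rule on [0, |x|] (n must be even)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    b = abs(x)
    h = b / n
    s = pdf(0.0) + pdf(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * pdf(i * h)
    area = s * h / 3
    # The t density is symmetric about 0.
    return 0.5 + area if x >= 0 else 0.5 - area


def tinv(p, df):
    """t quantile (analogue of SAS TINV(p, df)); assumes it lies in (-50, 50)."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def sas_round(x):
    """Round to the nearest integer, halves away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def confidence_interval(mean, semean, atlev1, atlev2):
    """Mirror the NEWCALC0304 DATA step: return (df, ll, ul, ciwidth)."""
    df = atlev2 - atlev1                              # PSUs minus strata
    ll = sas_round(mean + tinv(0.025, df) * semean)   # tinv(.025, df) < 0
    ul = sas_round(mean + tinv(0.975, df) * semean)
    return df, ll, ul, ul - ll


print(confidence_interval(880, 16.7, 15, 30))  # (15, 844, 916, 72)
```

With the Total row (mean 880, SE 16.7) the sketch returns a lower limit of 844, an upper limit of 916, and a width of 72, matching the printed table. For the Male row, the rounded inputs (998, 21.8) give an upper limit of 1044 rather than the printed 1045, because the DATA step computes LL and UL from the unrounded MEAN and SEMEAN and rounds them only afterwards.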
*-------------------------------------------------------------------------;
* Create a new dataset called NEWCALC0304, based on the dataset created   ;
* in the last SUDAAN procedure. Confidence intervals around the means     ;
* will be calculated using this new dataset.                              ;
*-------------------------------------------------------------------------;

data NEWCALC0304;
  set CALC0304;
  df=atlev2-atlev1;                     /* degrees of freedom = PSUs - strata */
  /* DROP takes effect on output, so ATLEV1 and ATLEV2 are still usable above */
  drop PROCNUM TABLENO VARIABLE _C1 ATLEV1 ATLEV2;
  ll=round(mean+tinv(.025,df)*semean);  /* lower 95% limit; TINV(.025,df) < 0 */
  ul=round(mean+tinv(.975,df)*semean);  /* upper 95% limit                    */
  mean=round(mean);                     /* rounded for display after LL, UL   */
  semean=round(semean,.1);
  ciwidth=ul-ll;
run;

*-------------------------------------------------------------------------;
* Use the PROC PRINT procedure to output the confidence intervals.        ;
*-------------------------------------------------------------------------;

proc print data=NEWCALC0304 split='/' noobs;
  format riagendr sex. nsum 7.0 mean 6.0 semean 6.1 df 2.0;
  label ll='Lower/Limit'
        ul='Upper/limit'
        df='Degrees/of/freedom'
        ciwidth='Confidence/interval/width';
  title1 'Mean of dietary calcium intake and 95% Confidence interval';
  title2 'of males and females ages 20 years and older';
run;
Number of observations read    :    9034    Weighted count : 286222757
Number of observations skipped :    1088
(WEIGHT variable nonpositive)
Observations in subpopulation  :    4448    Weighted count : 205284669
Denominator degrees of freedom :      15

Variance Estimation Method: Taylor Series (WR)
For Subpopulation: Adults 20 years of age and older
by: Variable, Gender - Adjudicated.

------------------------------------------------
Variable
  Gender -                Sample              SE
  Adjudicated               Size    Mean    Mean
------------------------------------------------
Calcium (mg)
  Total                     4448     880    16.7
  Male                      2135     998    21.8
  Female                    2313     771    15.3
------------------------------------------------
                                   Degrees                  Confidence
Gender -     Sample            SE       of   Lower   Upper    interval
Adjudicated    Size   Mean   Mean  freedom   Limit   limit       width
0              4448    880   16.7       15     844     916          72
Male           2135    998   21.8       15     952    1045          93
Female         2313    771   15.3       15     738     803          65
Highlights from the output include: