Task 3b: How to Perform a Chi-Square Test Using SAS

In this task, you will use the chi-square test to determine whether age group and osteoporosis treatment status are independent of each other.

 

Step 1: Examine Relationship Between Two Categorical Variables

The SURVEYFREQ procedure (PROC SURVEYFREQ) is used in SAS to examine the relationship between two categorical variables and to obtain chi-square statistics.  Use the STRATA statement to specify the stratum variable (SDMVSTRA), which accounts for the design effects of stratification.  Use the CLUSTER statement to specify the primary sampling unit, or PSU (SDMVPSU), which accounts for the design effects of clustering.  Use the WEIGHT statement to specify the sample weight (WTINT2YR), which accounts for unequal probability of sampling and non-response.  Use the WHERE statement to specify the subpopulation of interest (here, respondents aged 20 and over).

Use the TABLE statement to create a cross-tabulation of the categorical variables age group (AGEGRP) and osteoporosis treatment status (TREATOSTEO).  The options listed after the slash (/) instruct SAS to output the column percent (COL), row percent (ROW), Wald chi-square (WCHISQ), and Wald log-linear chi-square (WLLCHISQ), and to suppress the standard deviations (NOSTD) and weighted sums (NOWT).  The CHISQ option requests the Rao-Scott chi-square, and the CHISQ1 option requests the Rao-Scott modified chi-square.  Use the FORMAT statement to apply the SAS formats to the table variables.
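The FORMAT statement in the sample code refers to two user-defined formats, AGEGRP. and YESNO., which must exist before the procedure runs. The following is a minimal sketch of how they might be defined. The numeric codes (1, 2, 3 for age group and 1, 2 for treatment status) are assumptions made for illustration and are not taken from the tutorial data set, so check them against the actual coding of AGEGRP and TREATOSTEO before use.

```sas
*-------------------------------------------------------------------------;
* Hypothetical format definitions for the two table variables.            ;
* The codes on the left of each equals sign are assumed, not documented.  ;
*-------------------------------------------------------------------------;

proc format;
      value AGEGRP
            1 = '20-39'
            2 = '40-59'
            3 = '>= 60';
      value YESNO
            1 = 'Yes'
            2 = 'No';
run;
```

The labels match the row and column headings that appear in the program output, which is the only part of this sketch that comes from the source.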

 

Calculate Chi-Square Statistic to Determine Whether Age Group and Osteoporosis Treatment Status are Independent Using SAS Survey Procedures

Sample Code

*-------------------------------------------------------------------------;
* Use the PROC SURVEYFREQ procedure to perform a chi-square test in SAS.  ;
* This test will be used to determine whether age group and treatment for ;
* osteoporosis are independent of each other in respondents aged 20 and   ;
* over.       ;
*-------------------------------------------------------------------------;

proc surveyfreq data=DEMOOSTS;
      strata SDMVSTRA;
      cluster SDMVPSU;
      weight WTINT2YR;
      where RIDAGEYR >= 20 ;
      table AGEGRP*TREATOSTEO/col row nostd nowt wchisq wllchisq
      chisq chisq1;
      format AGEGRP AGEGRP. TREATOSTEO YESNO. ;        
run ;

 

IMPORTANT NOTE

For complex survey data such as NHANES, using the Rao-Scott F adjusted chi-square statistic is recommended since it yields a more conservative interpretation than the Wald chi-square.
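If only the recommended Rao-Scott test is needed, the TABLE statement can be shortened. The sketch below reuses the data set, design variables, and formats from the sample code and keeps only the CHISQ option, so it is a reduced variant of the program rather than a different method. The output shown in this task comes from the full program, not from this reduced call.

```sas
*-------------------------------------------------------------------------;
* Reduced call that requests only the Rao-Scott chi-square test.          ;
* All names are the same as in the sample code for this task.             ;
*-------------------------------------------------------------------------;

proc surveyfreq data=DEMOOSTS;
      strata SDMVSTRA;
      cluster SDMVPSU;
      weight WTINT2YR;
      where RIDAGEYR >= 20 ;
      table AGEGRP*TREATOSTEO/col row nostd nowt chisq;
      format AGEGRP AGEGRP. TREATOSTEO YESNO. ;
run ;
```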

 

Output of Program


                                  The SURVEYFREQ Procedure      
   
                                         Data Summary 
             
                            Number of Strata                  15
                            Number of Clusters                30
                            Number of Observations          5041
                            Sum of Weights             205284669
             
             
                                Table of AGEGRP by treatOSTEO   
            
                                                                            Row     Column
                      AGEGRP     treatOSTEO     Frequency    Percent    Percent    Percent
                      --------------------------------------------------------------------
                       20-39            Yes             2     0.0924     0.2375     2.2097
                                         No          1738    38.8105    99.7625    40.5042
          
                                      Total          1740    38.9029    100.000           
                      --------------------------------------------------------------------
                       40-59            Yes            36     1.0062     2.6126    24.0624
                                         No          1358    37.5077    97.3874    39.1446
             
                                      Total          1394    38.5139    100.000           
                      --------------------------------------------------------------------
                       >= 60            Yes           227     3.0831    13.6521    73.7279
                                         No          1662    19.5001    86.3479    20.3512
             
                                      Total          1889    22.5832    100.000           
                      --------------------------------------------------------------------
                       Total            Yes           265     4.1817               100.000
                                         No          4758    95.8183               100.000
             
                                      Total          5023    100.000      
                      --------------------------------------------------------------------
                                             Frequency Missing = 18       
             
             
                                Rao-Scott Chi-Square Test      
             
                                Pearson Chi-Square    341.6678   
                                Design Correction       0.6712   
             
                                Rao-Scott Chi-Square  509.0778   
                                DF                           2   
                                Pr > ChiSq              <.0001   
             
                                F Value               254.5389   
                                Num DF                       2   
                                Den DF                      30   
                                Pr > F                  <.0001   
             
                                        Sample Size = 5023         

             
                                 Rao-Scott Modified Chi-Square Test 
             
                                 Pearson Chi-Square    341.6678   
                                 Design Correction       1.5353   
             
                                 Rao-Scott Chi-Square  222.5434   
                                 DF                           2   
                                 Pr > ChiSq              <.0001   
             
                                 F Value               111.2717   
                                 Num DF                       2   
                                 Den DF                      30   
                                 Pr > F                  <.0001   
             
                                        Sample Size = 5023         
             
             
                                 Wald Chi-Square Test         
             
                                 Chi-Square      91.2484       
             
                                 F Value         45.6242       
                                 Num DF                2       
                                 Den DF               15       
                                 Pr > F           <.0001       
             
                                 Adj F Value     42.5826       
                                 Num DF                2       
                                 Den DF               14       
                                 Pr > Adj F       <.0001       
             
                                        Sample Size = 5023          
             
             
                                 Wald Log-Linear Chi-Square Test   
             
                                 Chi-Square    1216.9520       
             
                                 F Value        608.4760       
                                 Num DF                2       
                                 Den DF               15       
                                 Pr > F           <.0001       
             
                                 Adj F Value    567.9109       
                                 Num DF                2       
                                 Den DF               14       
                                 Pr > Adj F       <.0001       
             
                                      Sample Size = 5023            
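The F values and degrees of freedom in the output above can be checked by hand. The DATA step below is a sketch of that arithmetic, assuming the usual relationships for these tests: the design degrees of freedom are the number of clusters minus the number of strata, each F value is the chi-square divided by the table degrees of freedom, and the adjusted Wald F rescales the Wald chi-square by (design df - table df + 1) / (table df x design df). The variable names are hypothetical, and the chi-square values are copied from the output.

```sas
data _null_;
      /* Design degrees of freedom: 30 clusters minus 15 strata */
      design_df = 30 - 15;
      /* Table degrees of freedom: (rows - 1) * (columns - 1) for the 3 x 2 table */
      table_df = (3 - 1) * (2 - 1);

      /* Rao-Scott test: F value, denominator df, and p-value */
      rs_f   = 509.0778 / table_df;
      rs_ddf = table_df * design_df;
      rs_p   = sdf('F', rs_f, table_df, rs_ddf);

      /* Wald test: unadjusted F, adjusted F, and adjusted denominator df */
      wald_f    = 91.2484 / table_df;
      adj_ddf   = design_df - table_df + 1;
      wald_adjf = 91.2484 * adj_ddf / (table_df * design_df);

      /* Write the results to the SAS log */
      put rs_f= rs_ddf= rs_p= / wald_f= wald_adjf= adj_ddf=;
run;
```

Rounded to four decimals, the computed F values agree with the printed ones (254.5389, 45.6242, and 42.5826), and the denominator degrees of freedom come out as 30 and 14.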

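Highlights from the output include the following.  The Rao-Scott chi-square test (F = 254.54 with 2 and 30 degrees of freedom, p < .0001) indicates that age group and osteoporosis treatment status are not independent.  The Rao-Scott modified, Wald, and Wald log-linear tests lead to the same conclusion, each with p < .0001.  The row percents show that the weighted percentage with a treatment status of Yes rises with age: 0.24% for ages 20-39, 2.61% for ages 40-59, and 13.65% for ages 60 and over.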

 
