The chi-square test is used to test the independence of two variables cross-classified in a two-way table. For example, suppose we wish to test the hypothesis that calcium supplement use is independent of osteoporosis treatment status, and that cross-classifying osteoporosis treatment status and supplement use for women gives the following observed frequencies.
| | Osteoporosis Treatment Status - Yes | Osteoporosis Treatment Status - No | Total |
|---|---|---|---|
| Supplement Use - Yes | 155 | 566 | 721 |
| Supplement Use - No | 47 | 419 | 466 |
| Total | 202 | 985 | 1187 |
In a simple random sample setting (unweighted data), the expected cell frequencies under the null hypothesis that osteoporosis treatment status and calcium supplement use are independent could be obtained by multiplying the marginal total for the ith row by the proportion of individuals in the jth column.
For example, the expected number of supplement users who received treatment for osteoporosis would be 721*(202/1187) ≈ 123, and the expected number of supplement users who did not receive treatment for osteoporosis would be 721*(985/1187) ≈ 598.
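The expected-count calculation can be sketched in a few lines of Python. This is only an illustration for the unweighted (simple random sample) case; the function name `expected_frequencies` and the variable names are hypothetical and not part of any survey software.

```python
# Observed counts from the table above.
# Rows: supplement use (yes, no); columns: osteoporosis treatment (yes, no).
observed = [[155, 566], [47, 419]]


def expected_frequencies(table):
    """Return expected cell counts under independence for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count = row total * (column total / grand total).
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for row in expected:
    print([round(cell, 1) for cell in row])
```

Rounded to whole numbers, the first row of `expected` reproduces the values 123 and 598 given above.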
Thus, let Oij be the observed frequency in the ith row and jth column, where i = 1, 2, …, r and j = 1, 2, …, c, and let Eij be the expected frequency in the ith row and jth column. The chi-square statistic for testing the null hypothesis of independence is then

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \qquad (1)$$
This statistic has degrees of freedom equal to the number of rows minus 1, multiplied by the number of columns minus 1.
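As a rough check of the arithmetic, the statistic and its degrees of freedom for the example table can be computed as follows. This is a minimal sketch for unweighted data only: the function name `pearson_chi_square` is hypothetical, and the p-value line relies on an identity that holds only when there is 1 degree of freedom.

```python
import math

# Observed counts from the example table.
# Rows: supplement use (yes, no); columns: osteoporosis treatment (yes, no).
observed = [[155, 566], [47, 419]]


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count
            statistic += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


chi2, df = pearson_chi_square(observed)

# Assumption: valid only for df == 1, where the upper-tail chi-square
# probability equals erfc(sqrt(x / 2)). Other df need a statistics library.
p_value = math.erfc(math.sqrt(chi2 / 2)) if df == 1 else None

print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value}")
```

For a 2 x 2 table such as this one, the degrees of freedom are (2 - 1) * (2 - 1) = 1. The result ignores the survey design, so it is not appropriate for weighted data from a complex sample.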
In a complex sample setting, you would use a statistic similar to equation (1) above, modified to account for the survey design, with degrees of freedom equal to the number of PSUs minus the number of strata containing observations. This statistic can be obtained through SAS proc surveyfreq (chisq, based on the Rao-Scott chi-square with an adjusted F statistic). The analogous procedure in SUDAAN version 10.0 (proc crosstab) provides limited chi-square statistics based on the Wald chi-square and does not provide an F-adjusted p-value. However, SUDAAN regression models do provide F-adjusted chi-square statistics, which are recommended for analyzing NHANES data.
The Cochran-Mantel-Haenszel test, an extension of the Pearson chi-square test, can be applied to stratified two-way tables to test for homogeneity or independence in a non-survey setting. For a complex sample, its analogue can be obtained in SUDAAN proc crosstab (cmh).
Agresti A. An Introduction to Categorical Data Analysis. Wiley Series in Probability and Statistics. 1996. New York.