bbaguru.in

Chi-Square Test

The chi-square test is a statistical method for comparing the observed frequencies of categorical data with the frequencies expected under a hypothesis. There are two primary types of chi-square tests:

  • Chi-Square Goodness of Fit Test
  • Chi-Square Test for Independence

1. Chi-Square Goodness of Fit Test

This test determines whether sample data match a hypothesized population distribution.

  • Purpose: To see if the observed frequencies of a categorical variable match the expected frequencies.
  • Example: Checking if the observed frequencies of dice rolls match the expected frequencies for a fair die (each number should appear with equal frequency).

Steps for Goodness of Fit Test:

  • State Hypotheses:
    • Null Hypothesis (H0): The observed frequencies match the expected frequencies.
    • Alternative Hypothesis (H1): The observed frequencies do not match the expected frequencies.
  • Calculate the Expected Frequencies: Based on the hypothesis, calculate what the expected frequencies would be.
  • Compute the Chi-Square Statistic:

    \[\chi^2 = \sum \frac{(O - E)^2}{E}\]

    • O: Observed frequency
    • E: Expected frequency
  • Determine the Degrees of Freedom (df):
    • df = number of categories − 1
  • Compare to the Critical Value: Find the critical value from the chi-square distribution table using the degrees of freedom and the desired level of significance (usually 0.05). If χ² is greater than the critical value, reject H0.
  • Alternatively, Use the P-Value: Calculate the p-value associated with the χ² value. If the p-value is less than the significance level (e.g., 0.05), reject H0.
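The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `goodness_of_fit` and the three-category data are hypothetical, chosen to show the case where the hypothesized proportions are not all equal.

```python
def goodness_of_fit(observed, expected_proportions):
    """Return (chi-square statistic, degrees of freedom) for a goodness of fit test."""
    total = sum(observed)
    # Expected count per category = hypothesized proportion x sample size.
    expected = [p * total for p in expected_proportions]
    # Sum of (O - E)^2 / E over all categories.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # df = number of categories - 1.
    df = len(observed) - 1
    return statistic, df


# Hypothetical data: 100 observations in three categories,
# hypothesized to occur in proportions 50% / 30% / 20%.
observed = [45, 35, 20]
statistic, df = goodness_of_fit(observed, [0.5, 0.3, 0.2])
print(f"chi-square = {statistic:.2f}, df = {df}")
```

The resulting statistic would then be compared with the table value for the computed degrees of freedom, exactly as in the last two steps.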

2. Chi-Square Test for Independence

This test determines if there is an association between two categorical variables.

  • Purpose: To see whether the distribution of one categorical variable differs across the categories of another.
  • Example: Examining whether gender is related to voting preference.

Steps for Test for Independence:

  • State Hypotheses:
    • Null Hypothesis (H0): There is no association between the two variables (they are independent).
    • Alternative Hypothesis (H1): There is an association between the two variables (they are not independent).
  • Construct a Contingency Table: Organize the observed frequencies of the variables into a table.
  • Calculate the Expected Frequencies: E = (row total × column total) / grand total
  • Compute the Chi-Square Statistic: 

    \[\chi^2 = \sum \frac{(O - E)^2}{E}\]

    • O: Observed frequency
    • E: Expected frequency
  • Determine the Degrees of Freedom (df):
    • df = (number of rows−1)×(number of columns−1)
  • Compare to the Critical Value: Find the critical value from the chi-square distribution table using the degrees of freedom and the desired level of significance. If χ² is greater than the critical value, reject H0.
  • Alternatively, Use the P-Value: Calculate the p-value associated with the χ² value. If the p-value is less than the significance level, reject H0.
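The expected-frequency and degrees-of-freedom steps are the parts most often miscomputed by hand, so a small Python sketch may help. The helper names `expected_frequencies` and `degrees_of_freedom` and the 2×2 table are hypothetical, introduced only for illustration.

```python
def expected_frequencies(table):
    """Expected count for each cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def degrees_of_freedom(table):
    """df = (number of rows - 1) x (number of columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical 2x2 contingency table of observed counts.
table = [
    [30, 20],
    [20, 30],
]
expected = expected_frequencies(table)
df = degrees_of_freedom(table)
print(expected, df)
```

Each expected count depends only on the margins of the table, so every cell in the same row and column combination gets the same value regardless of what was observed there.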

Interpreting Chi-Square Test Results

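  • Small χ² Value: Indicates that the observed frequencies are close to the expected frequencies, so the data are consistent with H0: there is no evidence of an association between the variables (for the test of independence), or the sample is consistent with the hypothesized distribution (for the goodness of fit test).
  • Large χ² Value: Indicates that the observed frequencies differ substantially from the expected frequencies, which is evidence against H0: it suggests an association between the variables (for the test of independence) or that the sample does not match the hypothesized distribution (for the goodness of fit test). Whether a value counts as large depends on the degrees of freedom, which is why it is compared to a critical value or converted to a p-value.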

Examples

Example 1: Chi-Square Goodness of Fit Test

Suppose you roll a die 60 times, and you want to test if it is fair. The observed frequencies are: 10 ones, 8 twos, 12 threes, 14 fours, 9 fives, and 7 sixes.

  • Hypotheses:
    • H0: The die is fair (all faces have equal probability).
    • H1: The die is not fair.
  • Expected Frequencies: For a fair die, each face is expected to appear the same number of times in 60 rolls:

        \[\frac{60}{6} = 10\]

  • Observed frequencies:
    • Ones: 10
    • Twos: 8
    • Threes: 12
    • Fours: 14
    • Fives: 9
    • Sixes: 7

To calculate the chi-square (χ²) goodness of fit statistic, we use this formula:

    \[\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}\]

Where:

  • O_i = Observed frequency for category i
  • E_i = Expected frequency for category i

Step 1: Observed and Expected Frequencies

Face  | Observed (O) | Expected (E = 60/6 = 10) | (O − E)² / E
One   | 10           | 10                       | (0)² / 10 = 0.00
Two   | 8            | 10                       | (−2)² / 10 = 0.40
Three | 12           | 10                       | (2)² / 10 = 0.40
Four  | 14           | 10                       | (4)² / 10 = 1.60
Five  | 9            | 10                       | (−1)² / 10 = 0.10
Six   | 7            | 10                       | (−3)² / 10 = 0.90

Step 2: Summing the Values

χ² = 0.00 + 0.40 + 0.40 + 1.60 + 0.10 + 0.90 = 3.40


Step 3: Degrees of Freedom and Conclusion

  • Degrees of Freedom (df) = Number of categories − 1 = 6 − 1 = 5
  • Using a Chi-square distribution table:
    • At α = 0.05, critical value for df = 5 is 11.07

Decision:

Since χ² = 3.40 < 11.07, we fail to reject H0.


Conclusion:

There is no significant evidence at the 5% level to conclude that the die is unfair.
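The arithmetic in this example can be double-checked with a short Python sketch. The observed counts and the critical value 11.07 are taken from the example; the variable names are illustrative only.

```python
# Observed counts from the 60 die rolls in the example.
observed = {"One": 10, "Two": 8, "Three": 12, "Four": 14, "Five": 9, "Six": 7}

total_rolls = sum(observed.values())
# Under H0 (fair die) every face has the same expected count.
expected = total_rolls / len(observed)

# Per-face contribution (O - E)^2 / E, matching the last column of the table.
contributions = {
    face: (count - expected) ** 2 / expected for face, count in observed.items()
}
chi_square = sum(contributions.values())
df = len(observed) - 1

# Critical value for df = 5 at alpha = 0.05, as quoted from the table above.
critical_value = 11.07
reject_h0 = chi_square > critical_value

print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_h0}")
```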

Example 2: Chi-Square Test for Independence

Suppose you survey 100 people on their preferred type of movie (Action, Comedy, Drama) and their gender (Male, Female). The observed frequencies are:

Gender | Action | Comedy | Drama | Total
Male   | 20     | 10     | 10    | 40
Female | 10     | 20     | 30    | 60
Total  | 30     | 30     | 40    | 100

  • Hypotheses:
    • H0: There is no association between gender and movie preference.
    • H1: There is an association between gender and movie preference.
  • Expected Frequencies:

For the expected frequencies, we use the formula:

    \[E = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Grand Total}}\]

Calculations:

    \[\begin{aligned}
    E_{\text{Male, Action}} &= \frac{40 \times 30}{100} = 12 \\
    E_{\text{Male, Comedy}} &= \frac{40 \times 30}{100} = 12 \\
    E_{\text{Male, Drama}} &= \frac{40 \times 40}{100} = 16 \\
    E_{\text{Female, Action}} &= \frac{60 \times 30}{100} = 18 \\
    E_{\text{Female, Comedy}} &= \frac{60 \times 30}{100} = 18 \\
    E_{\text{Female, Drama}} &= \frac{60 \times 40}{100} = 24
    \end{aligned}\]

Calculate χ²
Use the formula:

    \[\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}\]

Substituting the observed and expected frequencies for each cell and summing gives the chi-square statistic:

    \[\chi^2 = \frac{(20 - 12)^2}{12} + \frac{(10 - 12)^2}{12} + \frac{(10 - 16)^2}{16} + \frac{(10 - 18)^2}{18} + \frac{(20 - 18)^2}{18} + \frac{(30 - 24)^2}{24}\]

    \[\chi^2 = \frac{64}{12} + \frac{4}{12} + \frac{36}{16} + \frac{64}{18} + \frac{4}{18} + \frac{36}{24}\]

    \[\chi^2 = 5.33 + 0.33 + 2.25 + 3.56 + 0.22 + 1.50\]

    \[\chi^2 = 13.19\]

  • Degrees of Freedom: 

df = (2−1) × (3−1) = 2

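  • Compare to Critical Value: From the chi-square table, the critical value for df = 2 and α = 0.05 is 5.99. Since 13.19 > 5.99, we reject H0 and conclude that there is a significant association between gender and movie preference at the 5% level.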
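As a check on the hand calculation, the test can be reproduced in plain Python. The observed table and the critical value 5.99 come from the example; the variable names are illustrative. The p-value line relies on a special case: for df = 2 only, the upper-tail probability of the chi-square distribution has the closed form exp(−χ²/2). For other degrees of freedom a statistics library or a table would be needed.

```python
import math

# Observed counts from the survey (columns: Action, Comedy, Drama).
observed = [
    [20, 10, 10],  # Male
    [10, 20, 30],  # Female
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected count = row total x column total / grand total.
        e = row_totals[i] * col_totals[j] / grand_total
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Critical value for df = 2 at alpha = 0.05, as quoted in the example.
critical_value = 5.99
reject_h0 = chi_square > critical_value

# Closed form valid for df = 2 only (an assumption that holds for this table).
p_value = math.exp(-chi_square / 2)

print(f"chi-square = {chi_square:.2f}, df = {df}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

Both decision rules agree here: the statistic exceeds the critical value, and the p-value falls below 0.05.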

In both examples, chi-square tests help determine if there is a significant difference between observed and expected frequencies, indicating relationships or fitting patterns within the data.
