Definition:Chi-Squared Test/Goodness of Fit
Definition
The $\chi$-squared test for goodness of fit is a statistical test of how well a set of observations fits some theoretical probability distribution.
Let $n \in \Z_{>0}$.
Let a value $x_i$ for $i \in \set {1, 2, \ldots, n}$ be expected to occur $E_i$ times.
Let $x_i$ actually occur $O_i$ times.
Then the statistic:
- $\ds \chi^2 = \sum_i \dfrac {\paren {O_i - E_i}^2} {E_i}$
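has, approximately, a $\chi$-squared distribution with $n - 1 - p$ degrees of freedom, where $p$ is the number of distribution parameters estimated from the data and used to compute the $E_i$; the $1$ accounts for the constraint that the $E_i$ sum to the total number of observations.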
Significantly high values of $\chi^2$ lead to the rejection of the hypothesised distribution.
Modifications are needed if some $E_i$ are small, that is, if several values of $E_i$ are $5$ or less.
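The statistic defined above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `chi_squared_statistic` and `small_expected_classes` and the sample counts are hypothetical, and the threshold of $5$ is taken from the rule of thumb stated above.

```python
def chi_squared_statistic(observed, expected):
    """Return the sum over all classes of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected frequency must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def small_expected_classes(expected, threshold=5):
    """Return the 0-based indices of classes whose E_i is at most `threshold`."""
    return [i for i, e in enumerate(expected) if e <= threshold]


# Hypothetical counts, chosen only to exercise both functions.
observed = [18, 22, 9, 1]
expected = [20, 20, 6, 4]

statistic = chi_squared_statistic(observed, expected)
flagged = small_expected_classes(expected)

print(f"chi-squared = {statistic:.2f}")
print(f"classes with small expected frequency: {flagged}")
```

A non-empty list from `small_expected_classes` indicates that the plain statistic should not be used without some modification.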
Continuous Distribution
The $\chi$-squared test for goodness of fit can be adapted to grouped data from a continuous probability distribution when grouped data are all that is available.
However, if individual observations are available, they should not be arbitrarily grouped together simply so that the test can be applied, because the outcome of the test is not independent of the choice of class intervals.
Some groupings may lead to a significant value for the $\chi$-squared statistic, while others may not.
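The dependence on the choice of class intervals can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the $24$ sample values are hypothetical, the hypothesised distribution is taken to be uniform on $[0, 1)$, the classes are of equal width, and the degrees of freedom are taken to be one less than the number of classes because no parameters are estimated. The critical values are the standard tabulated $5 \%$ values.

```python
# Critical values of the chi-squared distribution at the 5% level,
# keyed by degrees of freedom (standard table values).
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815}

# Hypothetical sample of 24 observations, tested against the
# uniform distribution on [0, 1).
SAMPLE = [
    0.01, 0.04, 0.07, 0.09, 0.12, 0.15, 0.17, 0.19, 0.21, 0.24,
    0.29, 0.38,
    0.51, 0.53, 0.55, 0.57, 0.59, 0.62, 0.65, 0.68, 0.71, 0.74,
    0.79, 0.88,
]


def grouped_chi_squared(sample, k):
    """Chi-squared statistic for `sample` grouped into k equal classes on [0, 1)."""
    observed = [0] * k
    for x in sample:
        observed[int(x * k)] += 1  # index of the class containing x
    expected = len(sample) / k  # uniform hypothesis: same E_i for every class
    return sum((o - expected) ** 2 / expected for o in observed)


def compare_groupings(sample, class_counts=(2, 3, 4)):
    """Map each number of classes k to (statistic, significant at 5%)."""
    results = {}
    for k in class_counts:
        statistic = grouped_chi_squared(sample, k)
        results[k] = (statistic, statistic > CRITICAL_5_PERCENT[k - 1])
    return results


for k, (statistic, significant) in compare_groupings(SAMPLE).items():
    verdict = "significant" if significant else "not significant"
    print(f"{k} classes: chi-squared = {statistic:.3f} ({verdict})")
```

For this sample, the grouping into $4$ classes gives a significant value while the groupings into $2$ and $3$ classes do not, even though the underlying observations are identical.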
Examples
Cast of Dice
Let $D$ be a die which we want to test for fairness.
Let $D$ be cast $96$ times.
Then:
- $x_i \in \set {1, 2, 3, 4, 5, 6}$
If $D$ is fair, then the number of times we expect to observe each face $x_i$ of $D$ is:
- $E_i = 96 \times \dfrac 1 6 = 16$
Suppose that, in our trial, the number of times each face comes up is as follows:
- $O_1 = 14$
- $O_2 = 19$
- $O_3 = 11$
- $O_4 = 21$
- $O_5 = 12$
- $O_6 = 19$
Then:
- $\ds \chi^2 = \sum_{i \mathop = 1}^6 \dfrac {\paren {O_i - E_i}^2} {E_i}$
- $\ds \phantom {\chi^2} = \dfrac {\paren {14 - 16}^2} {16} + \dfrac {\paren {19 - 16}^2} {16} + \dfrac {\paren {11 - 16}^2} {16} + \dfrac {\paren {21 - 16}^2} {16} + \dfrac {\paren {12 - 16}^2} {16} + \dfrac {\paren {19 - 16}^2} {16}$
- $\ds \phantom {\chi^2} = \dfrac 4 {16} + \dfrac 9 {16} + \dfrac {25} {16} + \dfrac {25} {16} + \dfrac {16} {16} + \dfrac 9 {16}$
- $\ds \phantom {\chi^2} = \dfrac {88} {16}$
- $\ds \phantom {\chi^2} = 5.5$
The expected frequencies are constrained to sum to the observed total of $96$ casts, which removes one degree of freedom, so there are $6 - 1 = 5$ degrees of freedom.
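The critical value of the $\chi$-squared distribution with $5$ degrees of freedom at the $5 \%$ level is $11.07$.
Since $5.5 < 11.07$, the $\chi^2$ value is not significant at the $5 \%$ level, so the hypothesis that $D$ is fair is not rejected.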
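The arithmetic of this example can be checked with a short Python sketch. The observed counts and the critical value $11.07$ are those given above; the function name `dice_chi_squared` is hypothetical, and exact rational arithmetic via `fractions.Fraction` is used only so that the result $\dfrac {88} {16}$ is reproduced without rounding.

```python
from fractions import Fraction

# Observed counts O_1, ..., O_6 from the trial above.
OBSERVED = [14, 19, 11, 21, 12, 19]

# 5% critical value for 5 degrees of freedom, as quoted in the text.
CRITICAL_VALUE = 11.07


def dice_chi_squared(observed):
    """Chi-squared statistic against the hypothesis that every face is equally likely."""
    total = sum(observed)
    expected = Fraction(total, len(observed))  # E_i = 96 / 6 = 16
    return sum((o - expected) ** 2 / expected for o in observed)


statistic = dice_chi_squared(OBSERVED)
reject = statistic > CRITICAL_VALUE

print(f"chi-squared = {statistic} = {float(statistic)}")
print("reject fairness" if reject else "do not reject fairness")
```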
Sources
- 1998: David Nelson: The Penguin Dictionary of Mathematics (2nd ed.): chi-squared test: 1.
- 2008: David Nelson: The Penguin Dictionary of Mathematics (4th ed.): chi-squared test: 1.