I am performing a number of statistical tests on my dataset. I would like to control the type 1 error, the decision to reject the null hypothesis when it is, in fact, true. I understand that when I perform many hypothesis tests on the same set of data, the probability of making a type 1 error can increase from the conventional .05. I have heard about something called the Bonferroni adjustment that can fix this problem. How does it work?
The Bonferroni adjustment makes it more difficult for any one test to be statistically significant. You divide your alpha level (usually set to .05 by convention) by the number of tests you are performing, and then evaluate each test against that smaller value. For instance, suppose you performed five tests on the same dataset. The Bonferroni-adjusted level of significance that any one test would need to meet would be:
.05 / 5 = .01
Any test that results in a probability value of less than .01 would be statistically significant. Any test statistic with a probability value greater than .01 (including values that fall between .01 and .05) would be deemed non-significant.
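The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `bonferroni` and the p-values are made up for the example, and the strict "less than" comparison follows the wording of the paragraph above.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the adjusted alpha and a significance flag for each p-value."""
    m = len(p_values)
    adjusted_alpha = alpha / m  # one shared threshold for every test
    return adjusted_alpha, [p < adjusted_alpha for p in p_values]


# Hypothetical p-values from five tests (illustrative only).
p_values = [0.003, 0.020, 0.049, 0.008, 0.300]
adjusted_alpha, significant = bonferroni(p_values)
print(adjusted_alpha)  # 0.01
print(significant)     # [True, False, False, True, False]
```

Note that .020 and .049 would have passed the unadjusted .05 level but fail the adjusted .01 level.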
Some authors (e.g., Jaccard & Wan, 1996) have pointed out that this method of controlling type 1 error becomes very conservative, perhaps too conservative, when the number of comparisons grows large. Jaccard and Wan (1996, p. 30) suggest the use of a modified Bonferroni procedure that still retains an overall type 1 error rate of 5% (alpha = .05). The modified Bonferroni procedure works as follows:
1. Rank order the significance values (p-values) obtained from your multiple tests from smallest to largest. Tied significance values may be ordered by theoretical criteria or arbitrarily.
2. Evaluate the test with the smallest p-value at alpha / number of tests, just as you would in the Bonferroni procedure discussed above.
3. If that test is statistically significant after the adjustment, move on to the test with the next smallest p-value and evaluate it at alpha / (number of tests - 1).
4. If that test is also significant after the adjustment, proceed to the third smallest p-value and evaluate it at alpha / (number of tests - 2).
5. Proceed in this fashion until a non-significant result is obtained. That test, and every test ranked after it, is treated as non-significant.
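The modified procedure described above can be sketched in Python as follows. It matches the step-down method usually attributed to Holm (1979, listed in the references). The function name `modified_bonferroni` and the p-values are hypothetical, chosen only for illustration, and ties are simply left in input order, which is one of the arbitrary orderings the description allows.

```python
def modified_bonferroni(p_values, alpha=0.05):
    """Step-down procedure: return a significance flag per test, in input order."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    # sorted() is stable, so tied p-values keep their input order.
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, i in enumerate(order):
        threshold = alpha / (m - rank)  # alpha/m, alpha/(m-1), ..., alpha/1
        if p_values[i] >= threshold:
            break  # first non-significant result: all later tests stay False
        significant[i] = True
    return significant


# Hypothetical p-values from five tests (illustrative only).
p_values = [0.030, 0.003, 0.300, 0.011, 0.018]
print(modified_bonferroni(p_values))  # [False, True, False, True, False]
```

Here .003 passes at .05/5 and .011 passes at .05/4, but .018 fails at .05/3, so the procedure stops and the remaining tests are not evaluated.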
An example may help clarify the procedure. The table below shows for five hypothetical tests the test number, obtained significance, the original alpha, the divisor which you would divide into the original alpha to obtain the new alpha, and the evaluation of the test's statistical significance.
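[Table: five hypothetical tests, with columns for test number, obtained significance, original alpha, divisor, and evaluation of significance. The cell values did not survive in this copy.]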
Notice that test 1 would be significant under either Bonferroni adjustment method, but test 2 is significant only under the modified Bonferroni method. Test 3 is not significant under either method. Even though test 4's obtained significance value is less than its modified Bonferroni alpha, test 4 is also not significant, because the modified procedure requires that every test after the first non-significant test also be treated as non-significant.
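To see the same pattern with concrete numbers, the sketch below prints the divisor, the adjusted alpha, and both decisions for five tests. The p-values are invented for this illustration and are not the values from the original table. They were chosen only so that the outcome matches the pattern just described.

```python
alpha = 0.05
# Made-up p-values, already sorted; NOT the values from the original table.
p_values = [0.001, 0.012, 0.019, 0.022, 0.040]
m = len(p_values)

rows = []
stopped = False  # becomes True at the first non-significant test
for rank, p in enumerate(p_values):
    divisor = m - rank               # 5, 4, 3, 2, 1
    new_alpha = alpha / divisor      # modified Bonferroni threshold
    bonf = p < alpha / m             # ordinary Bonferroni decision
    if p >= new_alpha:
        stopped = True
    modified = not stopped           # significant only before the stop
    rows.append((rank + 1, p, divisor, new_alpha, bonf, modified))

print("test  p      divisor  new alpha  Bonferroni  modified")
for test, p, divisor, new_alpha, bonf, modified in rows:
    print(f"{test:<5} {p:<6.3f} {divisor:<8} {new_alpha:<10.4f} {str(bonf):<11} {modified}")
```

With these values, test 4 (p = .022) is below its own threshold of .05 / 2 = .025, yet it is reported as non-significant because test 3 already failed.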
References
Jaccard, J. & Wan, C. K. (1996). LISREL approaches to interaction effects in multiple regression. Thousand Oaks, CA: Sage Publications.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65-70.
Holland, B. S., & Copenhaver, M. (1988). Improved Bonferroni-type multiple testing procedures. Psychological Bulletin, 104, 145-149.
Seaman, M. A., Levin, K. R., & Serlin, R. C. (1991). New developments in pairwise multiple comparisons: Some powerful and practicable procedures. Psychological Bulletin, 110, 577-586.
If you have further questions, send E-mail to stats@ssc.utexas.edu.