Errors in hypothesis testing
Last updated on 2024-03-12
Estimated time: 3 minutes
Overview
Questions
- What errors can occur in hypothesis testing?
Objectives
- List the possible errors in hypothesis testing and give the reasons why they occur.
We just learned about data snooping, or HARKing, which is one form of p-hacking, i.e. cheating to get significant results. Another example is method hacking, where you try different statistical methods until one of them gives a significant result.
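To see what method hacking looks like in practice, here is a minimal sketch, assuming Python with numpy and scipy (the data and the particular set of tests are hypothetical choices, purely for illustration). Both samples come from the same distribution, yet trying several tests and reporting only the smallest p-value makes a "significant" result more likely than it should be:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two groups drawn from the SAME distribution: the null hypothesis is true.
a = rng.normal(loc=0, scale=1, size=20)
b = rng.normal(loc=0, scale=1, size=20)

# Method hacking: run several tests and keep whichever p-value is smallest.
pvalues = {
    "t-test": stats.ttest_ind(a, b).pvalue,
    "Welch t-test": stats.ttest_ind(a, b, equal_var=False).pvalue,
    "Mann-Whitney U": stats.mannwhitneyu(a, b).pvalue,
    "Kolmogorov-Smirnov": stats.ks_2samp(a, b).pvalue,
}
best = min(pvalues, key=pvalues.get)
print(f"best method: {best}, p = {pvalues[best]:.3f}")
# Taking the minimum of several p-values pushes the chance of seeing
# p < 0.05 above 5%, even though there is no real effect here.
```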
You might have heard people say that p-hacking (among other things) increases the type I error rate. Here's what that means:
When you perform a test, you either reject the null hypothesis or you don't, and either decision can be right or wrong. This gives four possible outcomes:
- The two favorable outcomes are the true negative, where you correctly fail to reject the null, and the true positive, where you correctly reject it and conclude that there is something interesting to see in your data.
- There are also two ways in which the classification can go wrong. One is the false positive, also called a type I error, where the null is incorrectly rejected. This happens by chance in a certain percentage of cases: by definition, it happens in 5% of the cases where the null is true if you choose a significance level of \(\alpha=0.05\) (the first sketch after this list illustrates this). But other things can increase the chance of a false positive, including p-hacking, multiple comparisons (which is part of the next tutorial), and certain violations of a test's assumptions (e.g. independence).
- The false negative, or type II error, describes the case where the null hypothesis isn't rejected even though something interesting is actually happening. This is likely to happen when we have too little data, or use methods that are not powerful enough to detect what we're looking for. We try to avoid type II errors by choosing methods with high power (see the second sketch after this list).
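To see where the 5% comes from, here is a minimal simulation sketch, again assuming Python with numpy and scipy; the sample size, seed, and number of repetitions are arbitrary illustrative choices. Both groups are drawn from the same distribution, so every rejection is a type I error, and the rejection rate should land close to \(\alpha=0.05\):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05
n_experiments = 10_000

false_positives = 0
for _ in range(n_experiments):
    # Both samples come from the same distribution: the null is true.
    a = rng.normal(0, 1, size=30)
    b = rng.normal(0, 1, size=30)
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1  # rejected a true null: a type I error

print(f"false positive rate: {false_positives / n_experiments:.3f}")
# This should come out close to alpha = 0.05.
```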
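And a companion sketch for the type II error, under the assumption of a real difference of 0.5 standard deviations between the groups (an arbitrary effect size, chosen only for illustration). The simulation counts how often the test misses this real effect, and shows the miss rate shrinking, i.e. the power growing, as the sample size increases:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alpha = 0.05
n_experiments = 5_000

for n in (10, 30, 100):
    misses = 0
    for _ in range(n_experiments):
        a = rng.normal(0.0, 1, size=n)
        b = rng.normal(0.5, 1, size=n)  # a real difference of 0.5 exists
        if stats.ttest_ind(a, b).pvalue >= alpha:
            misses += 1  # failed to reject a false null: a type II error
    rate = misses / n_experiments
    print(f"n = {n:3d}: type II error rate = {rate:.2f}, power = {1 - rate:.2f}")
```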