• JoBo@feddit.uk
    link
    fedilink
    English
    arrow-up
    11
    ·
    edit-2
    1 year ago

    I’m going to have to object. We don’t use “false positive” and “false negative” as synonyms for Type I and Type II error because they’re not the same thing. The difference is at the heart of the misuse of p-values by so many researchers, and the root of the so-called replication crisis.

    Type I error is the risk of falsely concluding that the quantities being compared are meaningfully different when they are not, in fact, meaningfully different. Type II error is the risk of falsely concluding that they are essentially equivalent when they are not, in fact, essentially equivalent. Both are conditional probabilities; you can only get a Type I error when the things are, in truth, essentially equivalent and you can only get a Type II error when they are, in truth, meaningfully different. We define Type I and Type II errors as part of the design of a trial. We cannot calculate the risk of a false positive or a false negative without knowing the probability that the two things are meaningfully different.

    This may be a little easier to follow with an example:

    Let’s say we have designed an RCT to compare two treatments with Type I error of 0.05 (95% confidence) and Type II error of 0.1 (90% power). Let’s also say that this is the first large phase 3 trial of a promising drug and we know from experience with thousands of similar trials in this context that the new drug will turn out to be meaningfully different from control around 10% of the time.

    So, in 1000 trials of this sort, 100 trials will be comparing drugs which are meaningfully different and we will get a false negative for 10 of them (because we only have 90% power). 900 trials will be comparing drugs which are essentially equivalent and we will get a false positive for 45 of them (because we only have 95% confidence).

    The false positive rate is 45/135 (33.3%), nowhere near the 5% Type I error we designed the trial with.

    Statisticians are awful at naming things. But there is a reason we don’t give these error rates the nice, intuitive names you’d expect. Unfortunately we’re also awful at explaining things properly, so the misunderstanding has persisted anyway.

    This is a useful page which runs through much the same ideas as the paper linked above but in simpler terms: The p value and the base rate fallacy

    And this paper tries to rescue p-values from oblivion by calling for 0.005 to replace the usual 0.05 threshold for alpha: Redefine statistical significance.