This article works through some of the derivations from Ioannidis' 2005 paper, "Why Most Published Research Findings Are False," which helped expose what has since been termed "The Replication Crisis."
The issue begins with the p-value, which measures the probability of a study finding a positive result, assuming the tested hypothesis is false. A p-value of 0.05 is conventionally considered strong, seemingly indicating that a regrettable 5% of published findings are false. The examples below show why that reading is far too optimistic.
Before diving into the derivations, some examples:
Example 1
Suppose we represent all possible hypotheses that can be tested with a more manageable 100,000 hypotheses. Let's allow a generous 50:50 true:false split for this set, a statistical power of 80%, and a significance level of $\alpha = 0.05$.
| | True | False | Total |
|---|---|---|---|
| positive result (+) | 40k | 2.5k | 42.5k |
| negative result (−) | 10k | 47.5k | 57.5k |
| Total | 50k | 50k | 100k |
Here, the p-value $= \alpha = P(+ \mid f)$, where $+$ is a positive result and $f$ means the hypothesis is false. The statistical power is $P(+ \mid t)$, and the Positive Predictive Value is $PPV = P(t \mid +) = \frac{40\text{k}}{42.5\text{k}} \approx 0.94$, which is pretty satisfactory given our generous values.
Example 2
Once again, we'll take 100,000 hypotheses, but now with a 10:90 true:false split, keeping the 80% statistical power and $\alpha = 0.05$. Filling out the table we get:
| | True | False | Total |
|---|---|---|---|
| + | 8k | 4.5k | 12.5k |
| − | 2k | 85.5k | 87.5k |
| Total | 10k | 90k | 100k |
Here, $PPV = \frac{8\text{k}}{12.5\text{k}} = 0.64$, which is significantly worse than the naively assumed 95%, and that's for a positive study before accounting for publication bias, cheating, etc., which are covered below.
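To make the bookkeeping explicit, here is a minimal Python sketch (the function name and layout are my own) that fills out the 2×2 table for any prior, power, and significance level, reproducing both examples:

```python
def contingency_table(n, p_true, power, alpha):
    """Split n hypotheses into the four cells of the 2x2 table."""
    n_true, n_false = n * p_true, n * (1 - p_true)
    tp = power * n_true          # true hypotheses correctly flagged positive
    fn = (1 - power) * n_true    # true hypotheses missed (Type II errors)
    fp = alpha * n_false         # false hypotheses flagged positive (Type I errors)
    tn = (1 - alpha) * n_false   # false hypotheses correctly rejected
    ppv = tp / (tp + fp)         # share of positive results that are actually true
    return tp, fp, fn, tn, ppv

print(contingency_table(100_000, 0.50, 0.80, 0.05))  # Example 1: PPV ~ 0.94
print(contingency_table(100_000, 0.10, 0.80, 0.05))  # Example 2: PPV = 0.64
```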
Before getting much further, it will be useful to define a glossary:

| Symbol | Value | Meaning |
|---|---|---|
| $p$ | $P(+ \mid f)$ | probability of a study finding a positive result, given that the hypothesis is false |
| $PPV$ | $P(t \mid +)$ | Positive Predictive Value |
| $R$ | $\frac{P(t)}{P(f)}$ | the pre-study odds that the tested hypothesis is true |
| $\Theta$ | $R = \frac{P}{1 - P}$ | an alternate expression of probability, e.g. 10:90 odds: $\Theta = \frac{10\%}{100\% - 10\%}$ |
| $P(f)$ | $1 - P(t)$ | complement rule |
| $\alpha$ | $P(+ \mid f)$ | Type I error |
| $\beta$ | $P(- \mid t)$ | Type II error |
| $P(t \mid +)$ | $\frac{P(t) \cdot P(+ \mid t)}{P(+)}$ | Bayes' rule |
| $P(t \wedge +)$ | $P(t \mid +) \cdot P(+)$ | product rule |
| $u$ | | bias factor influenced by p-hacking, conflict of interest, competitive publication motivations, etc. |
Table 1
Now we can recreate the general table for all such examples above and derive their values:
| | True | False | Total |
|---|---|---|---|
| + | $\frac{c(1-\beta)R}{R+1}$ | $\frac{c\alpha}{R+1}$ | $\frac{c(R+\alpha-\beta R)}{R+1}$ |
| − | $\frac{c\beta R}{R+1}$ | $\frac{c(1-\alpha)}{R+1}$ | $\frac{c(1-\alpha+\beta R)}{R+1}$ |
| Total | $\frac{cR}{R+1}$ | $\frac{c}{R+1}$ | $c$ |

where $c$ is the number of relationships tested.
Derivations
Starting with the top-left cell, which represents:

$$P(+ \wedge t) = \underbrace{P(+ \mid t)}_{\S} \cdot \underbrace{P(t)}_{\frac{R}{R+1}}$$

(Here $P(t) = \frac{R}{R+1}$ follows from rearranging $R = \frac{P(t)}{1 - P(t)}$.)

$\S 1.1$: $\beta = P(- \mid t)$

$\S 1.2$: $P(+ \mid t) + P(- \mid t) = 1$

$\S 1.3$: $P(+ \mid t) + \beta = 1$

$\S$ $\therefore P(+ \mid t) = 1 - \beta$

$$= (1-\beta)\left(\frac{R}{R+1}\right) = \frac{(1-\beta)R}{R+1}$$

and, scaling by the $c$ relationships tested, the cell count is $\frac{c(1-\beta)R}{R+1}$.
Similarly, for the top-middle cell:
$$P(+ \wedge f) = \underbrace{P(+ \mid f)}_{\alpha} \cdot \underbrace{P(f)}_{1 - \frac{R}{R+1}}$$

$$= \alpha\left(1 - \frac{R}{R+1}\right) = \frac{\alpha}{R+1} \;\Rightarrow\; \frac{c\alpha}{R+1}$$
So, dividing the true positives (top-left cell) by all positives (top-right cell):

$$\frac{\text{true positives}}{\text{all positives}} = \frac{\;\frac{c(1-\beta)R}{R+1}\;}{\;\frac{c(R+\alpha-\beta R)}{R+1}\;} = \frac{(1-\beta)R}{R+\alpha-\beta R} = \underbrace{P(t \mid +)}_{\text{want this bad boi to be high}}$$

This gives the $PPV$ in terms of Type I and Type II error and the pre-study odds.
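As a sanity check, a small Python helper (naming is mine) evaluates this formula and recovers the PPVs from both examples, where Example 1 has $R = 1$ and Example 2 has $R = \frac{1}{9}$:

```python
def ppv(alpha, beta, R):
    """PPV = (1 - beta) * R / (R + alpha - beta * R)."""
    return (1 - beta) * R / (R + alpha - beta * R)

print(ppv(0.05, 0.20, 1.0))    # Example 1 (50:50 odds): ~0.94
print(ppv(0.05, 0.20, 1 / 9))  # Example 2 (10:90 odds): 0.64
```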
When is a Study More Likely to be True than False?
A positive finding is more likely true than false when $PPV > \frac{1}{2}$, i.e. when $(1-\beta)R > \alpha$. Some fields of study have inherently small $R$ or $(1-\beta)$ values, making this threshold hard to clear.
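One way to see this: since $R + \alpha - \beta R = (1-\beta)R + \alpha$, the condition rearranges as

$$PPV = \frac{(1-\beta)R}{(1-\beta)R + \alpha} > \frac{1}{2} \iff (1-\beta)R > \alpha$$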
What Happens if we Introduce Bias?
Bias here means that some fraction $u$ of study results that would otherwise have been negative are reported as positive:

$$P(+ \wedge f) = P(+ \mid f) \cdot P(f) \;\xrightarrow{\text{bias}}\; \text{negative results become positive with probability } u$$

This alters our outcome in two cases:
1. $P(+\text{ for any reason} \wedge t)$

$$= \underbrace{P(+ \mid t)}_{1-\beta} \cdot \underbrace{P(t)}_{\frac{R}{R+1}} + u \cdot \underbrace{P(- \mid t)}_{\text{Type II error: } \beta} \cdot P(t)$$

$$= (1-\beta)\left(\frac{R}{R+1}\right) + u\beta\frac{R}{R+1}$$

$$= \frac{(1-\beta)R + u\beta R}{R+1}$$
2. $P(+\text{ for any reason} \wedge f)$

$$= \underbrace{P(+ \mid f)}_{\alpha} \cdot \underbrace{P(f)}_{1 - P(t)} + u \cdot P(- \mid f) \cdot P(f)$$

$$= \alpha\left(1 - \frac{R}{R+1}\right) + u(1-\alpha)\left(1 - \frac{R}{R+1}\right)$$

Note that the bias $u$ has to be independent of whether the hypothesis is true or false; otherwise we could not apply the product rule here.

$$= \frac{\alpha + u(1-\alpha)}{R+1}$$
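Putting the two cases together, the biased PPV is the true positives divided by all positives. A minimal sketch (function name mine), assuming $u$ acts identically on true and false negatives:

```python
def ppv_biased(alpha, beta, R, u):
    """PPV when a fraction u of would-be negative results are reported positive."""
    # Numerators of cases 1 and 2 above; the common 1/(R+1) factor cancels.
    true_pos = (1 - beta) * R + u * beta * R
    false_pos = alpha + u * (1 - alpha)
    return true_pos / (true_pos + false_pos)

print(ppv_biased(0.05, 0.20, 1 / 9, 0.0))  # no bias: 0.64, matching Example 2
print(ppv_biased(0.05, 0.20, 1 / 9, 0.2))  # u = 0.2 drags the PPV down to ~0.28
```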
The Issue of Incorrect Pre-Publication p-Values
Research efforts do not occur in isolation. Several teams may be independently and competitively testing the same hypotheses over and over again without adjusting their p-values for the repeated attempts.
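As a rough illustration (this sketch is mine, not a formula from the paper), if $n$ teams independently test the same false hypothesis at $\alpha = 0.05$, the chance that at least one of them reports a positive result is $1 - (1-\alpha)^n$:

```python
def p_any_false_positive(alpha, n_teams):
    """P(at least one team finds a false positive on a false hypothesis)."""
    return 1 - (1 - alpha) ** n_teams

for n in (1, 5, 10, 20):
    print(n, round(p_any_false_positive(0.05, n), 3))
# 0.05 for one team, but ~0.401 with ten teams and ~0.642 with twenty
```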