Statistical Inference
The Logic of Discovery
The P-Value Definition
We start with a boring assumption called the Null Hypothesis (H₀). It states that nothing interesting is happening: the drug didn't work, the groups are identical, or the correlation is zero.
We then construct a mathematical universe where H₀ is true. If our observed data falls in the extreme tails of this universe, it is surprising. The P-Value is the probability of seeing data at least this extreme by pure luck, assuming H₀ is true.
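As a minimal sketch of that definition (the coin-flip scenario and every number here are our own illustration, not from the text), we can simulate the null universe directly:

```python
import numpy as np

rng = np.random.default_rng(42)

# H0: the coin is fair (p = 0.5). Observed data: 16 heads in 20 flips.
n_flips, observed_heads = 20, 16

# Build the "mathematical universe where H0 is true" by simulation.
null_universe = rng.binomial(n=n_flips, p=0.5, size=100_000)

# P-value: probability, under H0, of data at least this extreme
# (two-sided: at least as far from the expected 10 heads as 16 is).
extremity = abs(observed_heads - n_flips / 2)
p_value = np.mean(np.abs(null_universe - n_flips / 2) >= extremity)
print(f"p ≈ {p_value:.4f}")  # ≈ 0.012: surprising if the coin is fair
```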
The Cost of Skepticism
How surprising is "surprising enough"? We set a threshold called Alpha (α). This is a policy decision, not a mathematical truth.
- Set α low (0.01): You rarely cry wolf (Low Type I Error), but you miss real discoveries (High Type II Error).
- Set α high (0.10): You find everything, including noise (High Type I Error). A simulation of this trade-off follows below.
The Trade-off
The Alternative Reality
We don't just reject H₀; we accept an alternative H₁. This leads to the Neyman-Pearson Framework, where we care about Statistical Power (1 − β).
In the standard power diagram, the blue area (β) represents the risk of missing a real effect. Notice how increasing sample size (n) or effect size pulls the distributions apart, increasing Power.
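A sketch of the calculation behind that picture, assuming the simplest case of a one-sided z-test with known variance (the function and the parameter grid are ours, not the article's):

```python
import numpy as np
from scipy.stats import norm

def power_one_sided_z(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Power of a one-sided one-sample z-test with known sigma = 1.

    The test statistic is normal under both H0 and H1; the two
    distributions are pulled apart by sqrt(n) * effect_size.
    """
    z_crit = norm.ppf(1 - alpha)         # rejection threshold under H0
    shift = np.sqrt(n) * effect_size     # how far H1 sits from H0
    return 1 - norm.cdf(z_crit - shift)  # power = 1 - beta

for n in (10, 30, 100):
    for d in (0.2, 0.5):
        print(f"n={n:>3}, effect={d}: power ≈ {power_one_sided_z(d, n):.2f}")
```

Either raising n or raising the effect size increases the shift term, which is exactly the "pulling apart" described above.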
The Law of Large Numbers
A Single Study Means Nothing
We tend to treat a single result as truth. It isn't.
If the Null Hypothesis is actually true (the drug does nothing), P-values follow a Uniform Distribution on [0, 1]: every value is equally likely.
Any single "significant" result (the red bar) could just be a random draw from this flat distribution. Only repeated replication reveals the true shape.
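A quick simulation of this (two identical groups, t-tested 10,000 times; the parameters are our own choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, studies = 50, 10_000

# Run many "studies" of a drug that does nothing: both groups N(0, 1).
pvals = np.array([
    stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0, 1, n)).pvalue
    for _ in range(studies)
])

# Under a true H0, p-values are uniform on [0, 1]:
hist, _ = np.histogram(pvals, bins=10, range=(0, 1))
print(hist / studies)  # each bin holds ~10% of studies
print(f"'significant' by pure luck: {np.mean(pvals < 0.05):.3f}")  # ≈ 0.05
```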
How to Break Science
The Look-Elsewhere Effect
If α = 0.05, you accept a 5% risk of a False Positive on every single test. This implies a dangerous mathematical certainty:
If you run 20 useless experiments on pure noise, the chance of at least one "significant" result is 1 − 0.95^20 ≈ 64%. You are more likely than not to "discover" something.
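The arithmetic, spelled out (assuming independent tests, each run at α = 0.05):

```python
# Family-wise error rate: chance of at least one false positive
# among k independent tests, each at alpha = 0.05.
alpha = 0.05
for k in (1, 5, 10, 20, 60):
    fwer = 1 - (1 - alpha) ** k
    print(f"k={k:>2} experiments -> P(>=1 false positive) = {fwer:.2f}")
# k=20 -> 0.64: the ~64% chance quoted above.
```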
The P-Hacking Game
"hack.hypothesis.prefix Cyan hack.hypothesis.jelly hack.hypothesis.cause_q Hair Loss?"
P-Hacking Detector
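A sketch of the game under the hood, assuming 20 independent jelly-bean colors with no real effect whatsoever (the color names and sample sizes are ours):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng()  # unseeded: results vary run to run
colors = ["Cyan", "Magenta", "Lime", "Mauve", "Teal"] + \
         [f"Color{i}" for i in range(15)]
n = 40

discoveries = []
for color in colors:
    # Both groups are pure noise: the beans do nothing.
    treated = rng.normal(0, 1, n)
    control = rng.normal(0, 1, n)
    p = stats.ttest_ind(treated, control).pvalue
    if p < 0.05:
        discoveries.append((color, round(p, 3)))

print(f"Tested {len(colors)} colors; 'discoveries': {discoveries}")
# Rerun this a few times: roughly 64% of runs yield at least one
# headline like "Cyan jelly beans linked to hair loss (p < 0.05)".
```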
Manufacturing Significance
Sometimes we don't run new experiments; we just "clean" the old ones. By selectively removing data points (labeling them "Outliers"), we can force a significant difference where none exists.
This is often called Data Torture: "If you torture the data long enough, it will confess."
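A deliberately abusive sketch of this (our own construction): start with two identical groups, then repeatedly delete the most inconvenient point as an "outlier" until the test confesses.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng()
a, b = rng.normal(0, 1, 30), rng.normal(0, 1, 30)  # truly identical groups

p = stats.ttest_ind(a, b).pvalue
while p >= 0.05 and len(a) > 5:
    # "Clean" the data: drop whichever point most opposes the
    # difference we want to find, and label it an outlier.
    if a.mean() < b.mean():
        a, b = b, a                 # torture toward a > b
    a = np.delete(a, np.argmin(a))  # remove a's most inconvenient point
    p = stats.ttest_ind(a, b).pvalue

print(f"p = {p:.4f} after trimming to n = {len(a)} 'non-outliers'")
# This usually manufactures p < 0.05 between identical groups.
```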
Estimation over Decision
The Dance of Confidence
The binary decision ("Significant / Not Significant") destroys information. Instead, we should focus on Estimation using Confidence Intervals (CI).
A 95% CI doesn't mean "95% chance the truth is in here." It means: "If we repeated this experiment 100 times, about 95 of the resulting intervals would capture the true parameter."
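A repetition sketch of that exact statement (the true mean, σ, and n are arbitrary choices of ours):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng()
true_mu, n, repeats = 10.0, 25, 100

hits = 0
for _ in range(repeats):
    sample = rng.normal(true_mu, 2.0, n)
    # 95% t-interval for the mean of this one sample.
    lo, hi = stats.t.interval(0.95, df=n - 1,
                              loc=sample.mean(),
                              scale=stats.sem(sample))
    hits += (lo <= true_mu <= hi)

print(f"{hits}/{repeats} intervals captured the true mean")  # ≈ 95
```

Each individual interval either contains the truth or it doesn't; the 95% describes the long-run behavior of the procedure, not any single interval.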