Statistical Inference

Part I: Theory

The Logic of Discovery

The P-Value Definition

We start with a boring assumption called the Null Hypothesis (H₀). It states that nothing interesting is happening: the drug didn't work, the groups are identical, or the correlation is zero.

H_0: \theta = \theta_0 \quad \text{vs} \quad H_1: \theta \neq \theta_0

We then construct a mathematical universe where H₀ is true. If our observed data (t) falls in the extreme tails of this universe, it is surprising. The P-Value is the probability of seeing data this extreme by pure luck.

P\text{-value} = P(T \ge t \mid H_0 \text{ is True})
[Interactive: the standard normal density f(x) = (1/√(2π))·e^(−x²/2), with α = 0.05, critical value 1.96, and an observed p-value of 0.0482]
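As a minimal sketch of the definition above, assuming a standard normal test statistic (the observed value z = 1.96 is just the widget's example):

```python
from math import erfc, sqrt

def two_sided_p(z: float) -> float:
    """Two-sided p-value for an observed z-statistic under H0: Z ~ N(0, 1)."""
    # P(|Z| >= |z|) = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return erfc(abs(z) / sqrt(2))

print(two_sided_p(1.96))  # ≈ 0.05: right at the conventional threshold
```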

The Cost of Skepticism

[Interactive: choose the significance level α (standard, strict, or physics-grade presets); at α = 0.050 the panel shows a 5.0% Type I rate and an 85.0% Type II rate]

How surprising is "surprising enough"? We set a threshold called Alpha (α). This is a policy decision, not a mathematical truth.

  • Set α low (0.01): You rarely cry wolf (Low Type I Error), but you miss real discoveries (High Type II Error).
  • Set α high (0.10): You find everything, including noise (High Type I Error, Low Type II Error).
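A quick way to see that α is exactly the false-alarm rate: simulate experiments where H₀ is true and count how often each threshold rejects (a sketch; the normal test statistic and the seed are illustrative choices):

```python
import random
from math import erfc, sqrt

random.seed(42)
trials = 20_000

def two_sided_p(z):
    return erfc(abs(z) / sqrt(2))  # P(|Z| >= |z|) under H0

# Under H0 the z-statistic is simply N(0, 1) noise.
p_values = [two_sided_p(random.gauss(0, 1)) for _ in range(trials)]

for alpha in (0.01, 0.05, 0.10):
    rate = sum(p < alpha for p in p_values) / trials
    print(f"alpha={alpha:.2f}  false-positive rate={rate:.3f}")
```

Each printed rate lands near its α: the threshold you choose *is* the fraction of null experiments you will wrongly call discoveries.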

The Trade-off

"There is no free lunch. Minimizing false alarms guarantees missing signals."

The Alternative Reality

We don't just reject H₀; we accept an alternative H₁. This leads to the Neyman-Pearson Framework, where we care about Statistical Power.

\text{Power} = 1 - \beta = P(\text{Reject } H_0 \mid H_1 \text{ is True})

The Blue Area (β) represents the risk of missing a real effect. Notice how increasing sample size (n) or effect size pulls the distributions apart, increasing Power.
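These quantities can be computed directly for a one-sided z-test (a sketch; the critical value c = 1.65 and effect size θ = 3.0, in standard-error units, are the widget's example settings):

```python
from math import erf, sqrt

def phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2)))

c, theta = 1.65, 3.0          # critical value and true effect (in SE units)
alpha = 1 - phi(c)            # Type I error: reject when H0 is true
beta = phi(c - theta)         # Type II error: fail to reject when H1 is true
power = 1 - beta

print(f"alpha={alpha:.3f}  beta={beta:.3f}  power={power:.3f}")
# alpha=0.049  beta=0.089  power=0.911
```

Raising θ (a bigger effect, or a larger n shrinking the standard error) pushes the H₁ distribution away from c, so β shrinks and power grows.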

[Interactive: H₀ and H₁ sampling distributions with critical value c = 1.65 and effect size θ = 3.0, giving α ≈ 0.049, β ≈ 0.089, Power ≈ 0.911]

Part II: Meta-Science

The Law of Large Numbers

A Single Study Means Nothing

[Interactive: run repeated studies with true effect size δ = 0 (H₀ true) and watch the distribution of p-values build up]

Analysis

We tend to treat a single p < 0.05 result as truth. It isn't.

If the Null Hypothesis is actually true (the drug does nothing), P-values will follow a Uniform Distribution.

P \sim \text{Uniform}(0, 1) \mid H_0

Any single "significant" result (the red bar) could just be a random draw from this flat distribution. Only repeated replication reveals the true shape.
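This flatness is easy to check by simulation (a sketch assuming a simple two-sided z-test; the seed is arbitrary):

```python
import random
from math import erfc, sqrt

random.seed(7)
trials = 10_000

# Under H0 the z-statistic is pure N(0, 1) noise,
# so its p-value is Uniform(0, 1).
p_values = [erfc(abs(random.gauss(0, 1)) / sqrt(2)) for _ in range(trials)]

# Each fifth of [0, 1) should hold roughly 20% of the p-values.
for lo in (0.0, 0.2, 0.4, 0.6, 0.8):
    frac = sum(lo <= p < lo + 0.2 for p in p_values) / trials
    print(f"P in [{lo:.1f}, {lo + 0.2:.1f}): {frac:.3f}")
```

The histogram is flat; the bin below 0.05 fills up at the same rate as any other slice of equal width, which is exactly why one "significant" draw proves nothing.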

Part III: The Crisis

How to Break Science

The Look-Elsewhere Effect

If α = 0.05, you accept a 5% risk of a False Positive. This has a dangerous mathematical consequence:

P(\text{At least 1 False Pos}) = 1 - (1 - \alpha)^k

If you run 20 useless experiments (k = 20) on random noise, you have roughly a 64% chance of finding at least one "significant" result.
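Plugging a few values of k into the formula shows how quickly multiplicity ruins the nominal 5% guarantee:

```python
alpha = 0.05
for k in (1, 5, 20, 100):
    # Probability that at least one of k independent null tests is "significant"
    fwer = 1 - (1 - alpha) ** k
    print(f"k={k:3d}  P(at least one false positive)={fwer:.1%}")
```

At k = 20 the probability is already about 64%, and by k = 100 a false positive is a near certainty.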

The P-Hacking Game

Try clicking the button until you get a green square. Congratulations, you just published a false paper.
[Interactive: the P-Hacking Game. Run experiment after experiment on hypotheses like "Do Cyan jelly beans cause Hair Loss?", publish whichever comes back significant, and quietly discard the rest]


Manufacturing Significance

Sometimes we don't run new experiments; we just "clean" the old ones. By selectively removing data points (labeling them "Outliers"), we can force a significant difference where none exists.

This is often called Data Torture: "If you torture the data long enough, it will confess."
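A sketch of how the torture works, assuming a simple known-variance z-test and two groups drawn from the same distribution (the group sizes, stopping rule, and seed are all illustrative):

```python
import random
from math import erf, sqrt

random.seed(1)

def phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2)))

def p_value(a, b):
    # Two-sided z-test for a difference in means, assuming unit variance.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    z = (ma - mb) / sqrt(1 / len(a) + 1 / len(b))
    return 2 * (1 - phi(abs(z)))

# Both groups come from the same N(0, 1) population: H0 is true.
a = [random.gauss(0, 1) for _ in range(30)]
b = [random.gauss(0, 1) for _ in range(30)]

removed = 0
while p_value(a, b) >= 0.05 and len(b) > 15:
    b.remove(max(b))   # call the most inconvenient point an "outlier"
    removed += 1

print(f"discarded {removed} 'outliers', final p = {p_value(a, b):.4f}")
```

Every deletion is individually defensible ("that point looked extreme"), but because the rule always removes the point that hurts the hypothesis, the p-value is steadily dragged toward significance.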

[Interactive: click data points in Group A and Group B to discard them as "outliers"; currently Δx̄ = 2.69 for the hypothesis μ_A ≠ μ_B, p = 0.3323 (not significant)]
Part IV: Redemption

Estimation over Decision

The Dance of Confidence

The binary decision ("Significant / Not Significant") destroys information. Instead, we should focus on Estimation using Confidence Intervals (CI).

P(L \le \theta \le U) = 1 - \alpha

A 95% CI doesn't mean "95% chance the truth is here." It means: "If we repeated this experiment 100 times, 95 of the generated intervals would capture the true parameter."
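The repeated-experiment reading can be checked directly (a sketch using a known-σ z-interval; μ, σ, n, and the seed are arbitrary choices):

```python
import random
from math import sqrt

random.seed(0)
mu, sigma, n, trials = 10.0, 2.0, 30, 2000

covered = 0
for _ in range(trials):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    m = sum(xs) / n
    half = 1.96 * sigma / sqrt(n)   # 95% z-interval half-width
    if m - half <= mu <= m + half:  # did this interval capture the truth?
        covered += 1

print(f"coverage = {covered / trials:.3f}")  # ≈ 0.95
```

The parameter μ never moves; it is the intervals that dance around it, and about 95% of them happen to catch it.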

[Interactive: draw repeated samples (n = 30), plot each 95% CI against the true mean μ, and watch the running coverage rate converge]

© 2026. All rights reserved. Created by Ezz Eldin Ahmed.