The Null Hypothesis
Vol. I, Mar 2026

The Normal Distribution

The Gaussian distribution, or normal distribution, is a continuous probability distribution on $\mathbb{R}$ characterised by two parameters: a location parameter $\mu \in \mathbb{R}$ and a scale parameter $\sigma^2 > 0$. Its probability density function is:

$$f(x;\, \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \quad x \in \mathbb{R}$$
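As a sanity check, the density can be coded directly and integrated numerically; a minimal sketch in plain Python (standard library only; the truncation to $[-8, 8]$ is a practical choice, since the mass beyond is negligible):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x, per the formula above."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# A density must integrate to 1 over the real line; a Riemann sum over
# [-8, 8] captures essentially all of the Gaussian's mass.
step = 0.001
area = sum(normal_pdf(-8 + i * step) for i in range(16_001)) * step
print(round(area, 6))
```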

The distribution arises as the limiting form of the normalised sum of independent, identically distributed random variables with finite variance — a consequence of the Central Limit Theorem. This emergence from error accumulation processes explains its prevalence in measurement theory, physical modelling, and statistical inference.

Analytically, the Gaussian family is closed under both affine transformation and convolution: any affine function of a normal variable is normal, and the sum of independent normals is normal. This stability under convolution makes it the unique attractor in the sense formalised by the CLT.

Berry-Esseen Bound
The CLT convergence has a quantitative rate. If $X_1, \ldots, X_n$ are i.i.d. with mean $\mu$, variance $\sigma^2$, and third absolute moment $\rho = E[|X - \mu|^3] < \infty$, then:

$$\sup_{x \in \mathbb{R}} \left|F_n(x) - \Phi(x)\right| \leq \frac{C\rho}{\sigma^3\sqrt{n}}$$

where $C < 0.4748$ (Shevtsova, 2011) and $\Phi$ is the standard normal CDF.
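The bound is straightforward to evaluate. A sketch in Python, taking a fair coin as the underlying distribution (so $\mu = 0.5$, $\sigma = 0.5$, and $\rho = E|X - \mu|^3 = 0.125$):

```python
import math

C = 0.4748  # Shevtsova's (2011) upper bound on the constant

def berry_esseen_bound(rho, sigma, n):
    """Worst-case gap sup_x |F_n(x) - Phi(x)| guaranteed by the theorem."""
    return C * rho / (sigma ** 3 * math.sqrt(n))

# Fair coin: mu = 0.5, sigma = 0.5, rho = 0.125, so the bound is 0.4748/sqrt(n).
for n in (10, 100, 10_000):
    print(n, round(berry_esseen_bound(0.125, 0.5, n), 5))
```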
Maximum Entropy
Among all distributions on $\mathbb{R}$ with fixed mean $\mu$ and variance $\sigma^2$, the Gaussian uniquely maximises the differential entropy:

$$h(X) = -\int_{-\infty}^{\infty} f(x) \ln f(x)\,dx$$

This maximum-entropy characterisation provides a rigorous justification for the natural occurrence of Gaussian distributions: given only a mean and variance, the Gaussian is the least informative consistent choice.
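The characterisation can be illustrated numerically using the standard closed form $h(X) = \tfrac{1}{2}\ln(2\pi e \sigma^2)$ for the Gaussian, compared against a uniform law matched to the same variance; a minimal sketch:

```python
import math

def gaussian_entropy(sigma):
    """Differential entropy of N(mu, sigma^2) in nats: 0.5 * ln(2*pi*e*sigma^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

# A uniform law on [-a, a] has variance a^2 / 3 and entropy ln(2a).
# Matching sigma^2 = 1 requires a = sqrt(3).
h_gauss = gaussian_entropy(1.0)
h_unif = math.log(2 * math.sqrt(3))
print(round(h_gauss, 4), round(h_unif, 4))  # the Gaussian's entropy is larger
```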
Interactive: sums of independent random errors accumulate into a normal distribution.

Probability Density and Measure

For any continuous random variable, point probabilities are identically zero: $P(X = x) = 0$ for every $x \in \mathbb{R}$. This is a consequence of the distribution being absolutely continuous with respect to Lebesgue measure $\lambda$ on $\mathbb{R}$.

The probability density function $f$ is the Radon-Nikodym derivative of the probability measure $P$ with respect to $\lambda$. Probability is recovered by integration over any measurable set $A \in \mathcal{B}(\mathbb{R})$:

$$P(X \in A) = \int_A f(x)\,d\lambda(x)$$

The value $f(x)$ is a density, not a probability. It is not bounded above by 1 — its only constraint is $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Probability accumulates only over intervals, not at points.
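This is easy to see concretely: with a small scale parameter the density peak exceeds 1, yet every interval probability stays bounded by 1. A short sketch (standard library only):

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    """CDF via the error function: Phi((x - mu) / sigma)."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# For sigma = 0.1 the peak density is about 3.99 -- far above 1, and perfectly
# legal, because f(x) is a density rather than a probability.
peak = normal_pdf(0.0, 0.0, 0.1)
p = normal_cdf(1, 0, 0.1) - normal_cdf(-1, 0, 0.1)  # an interval probability
print(round(peak, 3), round(p, 6))
```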

Corollary — Empirical Rule

For $X \sim \mathcal{N}(\mu, \sigma^2)$, the following probabilities follow directly from $\Phi$:

$$P(\mu - \sigma \leq X \leq \mu + \sigma) = 2\Phi(1) - 1 \approx 68.27\%$$
$$P(\mu - 2\sigma \leq X \leq \mu + 2\sigma) = 2\Phi(2) - 1 \approx 95.45\%$$
$$P(\mu - 3\sigma \leq X \leq \mu + 3\sigma) = 2\Phi(3) - 1 \approx 99.73\%$$

These three intervals account for the near-totality of mass — a direct consequence of the rapid exponential decay of the Gaussian tails.
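The three figures are reproducible from the error function, which gives $\Phi(z) = \tfrac{1}{2}(1 + \operatorname{erf}(z/\sqrt{2}))$; a quick check:

```python
import math

def phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for k in (1, 2, 3):
    prob = 2 * phi(k) - 1
    print(f"P(|X - mu| <= {k} sigma) = {prob:.4%}")
```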

Density vs. Probability
Formally, the probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ satisfies:

$$P([a, b]) = \int_a^b f(x)\,d\lambda(x)$$

For the standard normal, $f(0) = (2\pi)^{-1/2} \approx 0.399$. This exceeds zero but represents density at a single point of measure zero — not a probability.

The Parametric Family

The normal family is parametrised by two quantities: the mean $\mu \in \mathbb{R}$, which determines the location of the distribution, and the variance $\sigma^2 > 0$, which controls its dispersion. We write $X \sim \mathcal{N}(\mu, \sigma^2)$ to denote that the random variable $X$ has density:

$$f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right), \quad x \in \mathbb{R}$$

Every member of the family is an affine image of the standard normal $\mathcal{N}(0, 1)$: if $Z \sim \mathcal{N}(0, 1)$, then $X = \mu + \sigma Z \sim \mathcal{N}(\mu, \sigma^2)$. Equivalently, the entire two-parameter family is generated by translations and scalings of a single canonical form. This structure defines a location-scale family.

The density achieves its unique maximum at $x = \mu$, where $f(\mu) = (\sigma\sqrt{2\pi})^{-1}$. Inflection points occur at $x = \mu \pm \sigma$. All odd central moments vanish, and the even central moments satisfy $E[(X - \mu)^{2k}] = (2k-1)!! \cdot \sigma^{2k}$ for non-negative integers $k$, where $(2k-1)!! = 1 \cdot 3 \cdots (2k-1)$ is the double factorial.
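The even-moment formula can be spot-checked by simulation; a sketch for the standard normal (the sample size and seed are arbitrary choices):

```python
import random

def double_factorial(n):
    """(2k-1)!! = 1 * 3 * ... * (2k-1); by convention (-1)!! = 1."""
    out = 1
    while n > 1:
        out *= n
        n -= 2
    return out

rng = random.Random(0)
samples = [rng.gauss(0, 1) for _ in range(200_000)]

# Compare empirical moments E[X^{2k}] against (2k-1)!! for sigma = 1.
for k in (1, 2, 3):
    empirical = sum(x ** (2 * k) for x in samples) / len(samples)
    print(2 * k, round(empirical, 2), double_factorial(2 * k - 1))
```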

The Living Shape

Interactive: drag sliders to reshape the family $\mathcal{N}(\mu, \sigma^2)$; the mean $\mu$ sets the centre of gravity, the spread $\sigma$ the width of errors.

Change of Scale: Standardisation

Let $X \sim \mathcal{N}(\mu, \sigma^2)$. The standardised variable is defined as:

$$Z = \frac{X - \mu}{\sigma} \sim \mathcal{N}(0, 1)$$

This is an affine transformation, not a change of distributional shape. Subtracting $\mu$ translates the distribution to the origin; dividing by $\sigma$ rescales the horizontal axis so that one unit corresponds to one standard deviation. The resulting variable $Z$ always follows the standard normal $\mathcal{N}(0, 1)$, regardless of the original parameters.

The practical consequence is that probability calculations for any member of the normal family reduce to a single reference: the standard normal CDF $\Phi(z)$. For any $a, b \in \mathbb{R}$:

$$P(a \leq X \leq b) = \Phi\!\left(\frac{b - \mu}{\sigma}\right) - \Phi\!\left(\frac{a - \mu}{\sigma}\right)$$
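As a worked example of this reduction, a sketch computing an interval probability through $\Phi$; the height figures are purely illustrative, not taken from the article:

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_prob(a, b, mu, sigma):
    """P(a <= X <= b) for X ~ N(mu, sigma^2), via standardisation."""
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)

# Hypothetical heights ~ N(170, 10^2): the probability of landing in
# 160-190 cm standardises to Phi(2) - Phi(-1).
p = normal_prob(160, 190, mu=170, sigma=10)
print(round(p, 4))
```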
Affine Equivariance
More generally, if $X \sim \mathcal{N}(\mu, \sigma^2)$ and $Y = aX + b$ for constants $a \neq 0$ and $b \in \mathbb{R}$, then:

$$Y \sim \mathcal{N}(a\mu + b,\; a^2\sigma^2)$$

The normal family is closed under affine maps. This is called affine equivariance.

The content doesn't change. The ruler does.

Departures from Normality

Many classical procedures — t-tests, ANOVA, ordinary least squares regression — assume that observations or errors follow a normal distribution. In practice, this assumption is frequently violated. The consequences depend on the nature and degree of departure.

Two primary types of departure are relevant. Skewness, measured by $\gamma_1 = E[(X-\mu)^3]/\sigma^3$, quantifies asymmetry. Excess kurtosis measures the weight of the tails relative to a Gaussian. The normal distribution has $\gamma_1 = 0$ and excess kurtosis $\gamma_2 = 0$; distributions with $\gamma_2 > 0$ are leptokurtic (heavy-tailed), and those with $\gamma_2 < 0$ are platykurtic.

Robustness theory formalises the sensitivity of an estimator or test to distributional misspecification. The sample mean is not robust: a single contaminating outlier can move it arbitrarily far from the population centre. The sample median, by contrast, has a breakdown point of 50%. Always assess distributional assumptions via formal diagnostics — the Shapiro-Wilk test, Q-Q plots, or moment-based tests — before applying procedures that rely on normality.
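The breakdown behaviour is easy to reproduce; a minimal contamination sketch (sample size, seed, and the outlier's magnitude are arbitrary choices):

```python
import random
import statistics

rng = random.Random(1)
data = [rng.gauss(0, 1) for _ in range(99)]
clean_mean = statistics.mean(data)
clean_median = statistics.median(data)

# A single wild observation out of 100 shifts the mean by roughly 10 units,
# while the median barely moves.
contaminated = data + [1000.0]
dirty_mean = statistics.mean(contaminated)
dirty_median = statistics.median(contaminated)

print(round(dirty_mean - clean_mean, 2), round(dirty_median - clean_median, 2))
```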

Excess Kurtosis
Formally, excess kurtosis is defined as:

$$\kappa = \frac{E[(X-\mu)^4]}{\sigma^4} - 3$$

The subtraction of 3 normalises relative to the Gaussian. For $\kappa > 0$ (e.g. Student's $t$), probability mass is concentrated in the tails; for $\kappa < 0$ (e.g. uniform), the tails are lighter than Gaussian. This affects the coverage probability of confidence intervals and the Type I error rate of tests.
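A sketch estimating the quantity from samples; the uniform's platykurtic value is exactly $-1.2$ and the Gaussian's exactly $0$ (sample sizes and seed are arbitrary):

```python
import random
import statistics

def excess_kurtosis(xs):
    """Sample version of E[(X - mu)^4] / sigma^4 - 3."""
    mu = statistics.fmean(xs)
    var = statistics.fmean((x - mu) ** 2 for x in xs)
    m4 = statistics.fmean((x - mu) ** 4 for x in xs)
    return m4 / var ** 2 - 3

rng = random.Random(2)
flat = [rng.random() for _ in range(100_000)]     # uniform: kappa = -1.2
bell = [rng.gauss(0, 1) for _ in range(100_000)]  # normal: kappa = 0

print(round(excess_kurtosis(flat), 2), round(excess_kurtosis(bell), 2))
```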

Robustness Test

Interactive: drag the slider to inject outliers; the mean is dragged away instantly while the median resists.

The Central Limit Theorem

Theorem (Central Limit Theorem). Let $X_1, X_2, \ldots$ be independent and identically distributed random variables with finite mean $\mu$ and finite variance $\sigma^2 > 0$. Denote the sample mean by $\bar{X}_n = n^{-1}\sum_{i=1}^n X_i$. Then, as $n \to \infty$:

$$\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1)$$

Convergence here is in distribution (weak convergence of probability measures), not almost surely or in probability. The theorem does not require the underlying distribution to be symmetric, unimodal, or of any specific form — only that the first two moments are finite. This generality is its defining strength.

A direct consequence is that the sum $S_n = \sum_{i=1}^n X_i$ is approximately $\mathcal{N}(n\mu, n\sigma^2)$ for large $n$. The approximation improves with $n$ and is known to be uniform over the real line (Berry-Esseen theorem). The accompanying simulation draws samples from the Uniform distribution — a decidedly non-Gaussian shape — and accumulates their means. The bell curve emerges regardless of the population.
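The same experiment runs in a few lines; a sketch averaging Uniform(0, 1) draws ($n = 30$ per mean and the replication count are arbitrary choices):

```python
import random
import statistics

rng = random.Random(3)
n = 30
means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(20_000)]

# Uniform(0, 1) has mu = 0.5 and sigma^2 = 1/12, so the sample mean should
# concentrate at 0.5 with standard deviation 1 / sqrt(12 * n), about 0.0527.
print(round(statistics.fmean(means), 3))
print(round(statistics.stdev(means), 4))
```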

CLT: Sample Means

Interactive: sample means accumulate from an underlying Uniform distribution; the histogram converges to the Gaussian predicted by $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} \mathcal{N}(0, \sigma^2)$.

Multivariate Extension

The multivariate normal distribution on $\mathbb{R}^d$ is specified by a mean vector $\boldsymbol{\mu} \in \mathbb{R}^d$ and a symmetric positive-definite covariance matrix $\boldsymbol{\Sigma} \in \mathbb{R}^{d \times d}$. We write $\mathbf{X} \sim \mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, with density:

$$f(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{d/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})\right)$$

The geometry of the distribution is encoded entirely in $\boldsymbol{\Sigma}$: contours of equal probability density are ellipsoids whose shape, orientation, and eccentricity are determined by the eigenstructure of $\boldsymbol{\Sigma}$. The exponent $(\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})$ is the squared Mahalanobis distance from $\mathbf{x}$ to $\boldsymbol{\mu}$, which accounts for correlations between dimensions.

The role of the correlation parameter $\rho$ in the bivariate case is particularly instructive:

  • $\rho = 0$: the components are uncorrelated and, being jointly normal, statistically independent. Contours are circular (after standardisation).
  • $\rho > 0$: positive linear association; the ellipse is tilted towards the northeast diagonal.
  • $\rho < 0$: negative linear association; the ellipse is tilted towards the northwest diagonal.

A key property: any marginal distribution of a multivariate normal is itself normal, and any conditional distribution $\mathbf{X}_1 \mid \mathbf{X}_2 = \mathbf{x}_2$ is normal with a mean that is a linear function of $\mathbf{x}_2$. This closure under marginalisation and conditioning is what makes the Gaussian tractable in high-dimensional inference.

The Covariance Matrix
The geometry of the bivariate normal is encoded in the symmetric positive-definite matrix $\boldsymbol{\Sigma}$:

$$\boldsymbol{\Sigma} = \begin{pmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{pmatrix}$$

The eigenvectors of $\boldsymbol{\Sigma}$ define the principal axes of the probability ellipsoid. The corresponding eigenvalues are the variances along those axes.
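For a $2 \times 2$ covariance the eigenstructure is available in closed form; a sketch recovering the principal-axis variances ($\rho = 0.7$ is just an example value):

```python
import math

def bivariate_cov(sx, sy, rho):
    """Covariance matrix of a bivariate normal."""
    return [[sx * sx, rho * sx * sy],
            [rho * sx * sy, sy * sy]]

def eigenvalues_2x2_sym(m):
    """Eigenvalues of a symmetric 2x2 matrix, largest first."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(tr * tr / 4 - det)
    return tr / 2 + disc, tr / 2 - disc

# Standard bivariate normal with rho = 0.7: the axis variances are 1 + rho
# and 1 - rho, i.e. a long axis of 1.7 and a short axis of 0.3.
lam1, lam2 = eigenvalues_2x2_sym(bivariate_cov(1.0, 1.0, 0.7))
print(round(lam1, 6), round(lam2, 6))
```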
Interactive: standard bivariate normal surface $Z = f(X, Y)$, rotatable, with adjustable correlation $\rho$ (shown at $\rho = 0.70$).

Connections to Statistical Inference

The normal distribution occupies a foundational role in classical statistics because its analytical tractability allows closed-form derivations of sampling distributions. The t-distribution, F-distribution, and chi-squared distribution — central to parametric hypothesis testing — are all constructed from functions of independent normal variates.
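The chi-squared construction, for instance, can be verified by simulation: a sum of $k$ squared independent standard normals has mean $k$ and variance $2k$. A sketch ($k$ and the replication count are arbitrary):

```python
import random
import statistics

rng = random.Random(4)
k = 5  # degrees of freedom

# Each draw is a chi-squared(k) variate built from k squared standard normals.
draws = [sum(rng.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(50_000)]

print(round(statistics.fmean(draws), 2))     # should sit near k = 5
print(round(statistics.variance(draws), 2))  # should sit near 2k = 10
```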

The validity of confidence intervals, likelihood ratio tests, and ordinary least squares estimators depends on either the exact normality of residuals or on the CLT approximation for large samples. The graph below charts these structural dependencies, illustrating how extensively classical inference rests on the Gaussian assumption.



© 2026 The Null Hypothesis. All Rights Reserved.