The Normal Distribution
The Gaussian distribution, or normal distribution, is a continuous probability distribution on $\mathbb{R}$ characterised by two parameters: a location parameter $\mu$ and a scale parameter $\sigma > 0$. Its probability density function is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
The distribution arises as the limiting form of the normalised sum of independent, identically distributed random variables with finite variance — a consequence of the Central Limit Theorem. This emergence from error accumulation processes explains its prevalence in measurement theory, physical modelling, and statistical inference.
Analytically, the Gaussian family is closed under both affine transformation and convolution: any affine function of a normal variable is normal, and the sum of independent normals is normal. This stability under convolution makes it the unique attractor in the sense formalised by the CLT.
Probability Density and Measure
For any continuous random variable, point probabilities are identically zero: $P(X = x) = 0$ for every $x \in \mathbb{R}$. This is a consequence of the distribution being absolutely continuous with respect to Lebesgue measure on $\mathbb{R}$.
The probability density function $f$ is the Radon-Nikodym derivative of the probability measure with respect to Lebesgue measure $\lambda$. Probability is recovered by integration over any measurable set $A$:

$$P(X \in A) = \int_A f(x)\,dx$$
The value $f(x)$ is a density, not a probability. It is not bounded above by 1 — its only constraint is $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Probability accumulates only over intervals, not at points.
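To make this concrete, here is a minimal standard-library Python sketch showing a peak density well above 1 for a narrow normal, together with a crude check that the curve still integrates to 1; the choice $\sigma = 0.1$ is purely illustrative:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# With sigma = 0.1 the peak density is about 3.99 -- well above 1 --
# yet the total probability mass is still 1.
peak = normal_pdf(0.0, mu=0.0, sigma=0.1)

# Crude Riemann sum over [-1, 1], i.e. ten standard deviations each side.
dx = 0.001
total = sum(normal_pdf(-1.0 + i * dx, 0.0, 0.1) for i in range(2000)) * dx
```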
For $X \sim N(\mu, \sigma^2)$, the following probabilities follow directly from the standard normal CDF $\Phi$:

$$P(\mu - \sigma \le X \le \mu + \sigma) \approx 0.6827$$
$$P(\mu - 2\sigma \le X \le \mu + 2\sigma) \approx 0.9545$$
$$P(\mu - 3\sigma \le X \le \mu + 3\sigma) \approx 0.9973$$
These three intervals account for the near-totality of mass — a direct consequence of the rapid exponential decay of the Gaussian tails.
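These interval masses can be checked with the identity $\Phi(z) = \tfrac{1}{2}(1 + \operatorname{erf}(z/\sqrt{2}))$, which gives $P(|X - \mu| \le k\sigma) = \operatorname{erf}(k/\sqrt{2})$. A short sketch using only the standard library's `math.erf`:

```python
import math

def within_k_sigma(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for X ~ N(mu, sigma^2),
    via Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return math.erf(k / math.sqrt(2))

one = within_k_sigma(1)    # about 0.6827
two = within_k_sigma(2)    # about 0.9545
three = within_k_sigma(3)  # about 0.9973
```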
The Parametric Family
The normal family is parametrised by two quantities: the mean $\mu$, which determines the location of the distribution, and the variance $\sigma^2$, which controls its dispersion. We write $X \sim N(\mu, \sigma^2)$ to denote that the random variable $X$ has density:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
Every member of the family is an affine image of the standard normal $Z \sim N(0, 1)$: if $X \sim N(\mu, \sigma^2)$, then $X$ has the same distribution as $\mu + \sigma Z$. Equivalently, the entire two-parameter family is generated by translations and scalings of a single canonical form. This structure defines a location-scale family.
The density achieves its unique maximum at $x = \mu$, where $f(\mu) = \frac{1}{\sigma\sqrt{2\pi}}$. Inflection points occur at $x = \mu \pm \sigma$. All odd central moments vanish, and the even central moments satisfy $E[(X - \mu)^{2k}] = \sigma^{2k}(2k - 1)!!$ for non-negative integers $k$, where $(2k - 1)!!$ is the double factorial.
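As a sanity check on the moment formula, the sketch below (standard-library Python; the midpoint-rule integration over $\pm 8\sigma$ is an illustrative numerical choice) compares $\sigma^{2k}(2k - 1)!!$ against a direct numerical integral of $x^{2k}$ times the density:

```python
import math

def double_factorial(n):
    """n!! = n * (n - 2) * ... down to 1; both (-1)!! and 1!! equal 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def even_central_moment(k, sigma):
    """E[(X - mu)^(2k)] = sigma^(2k) * (2k - 1)!! for X ~ N(mu, sigma^2)."""
    return sigma ** (2 * k) * double_factorial(2 * k - 1)

def numeric_moment(k, sigma, n=100_000):
    """Midpoint-rule integral of x^(2k) times the N(0, sigma^2) density."""
    a = 8.0 * sigma                      # +-8 sigma captures essentially all mass
    dx = 2.0 * a / n
    c = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(n):
        x = -a + (i + 0.5) * dx
        total += x ** (2 * k) * c * math.exp(-0.5 * (x / sigma) ** 2) * dx
    return total
```

For example, with $k = 2$ and $\sigma = 1$ both routes give the fourth central moment $3!! = 3$.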
The Living Shape
Drag sliders to reshape the family.
Change of Scale: Standardisation
Let $X \sim N(\mu, \sigma^2)$. The standardised variable is defined as:

$$Z = \frac{X - \mu}{\sigma}$$
This is an affine transformation, not a change of distributional shape. Subtracting $\mu$ translates the distribution to the origin; dividing by $\sigma$ rescales the horizontal axis so that one unit corresponds to one standard deviation. The resulting variable always follows the standard normal $N(0, 1)$, regardless of the original parameters.
The practical consequence is that probability calculations for any member of the normal family reduce to a single reference: the standard normal CDF $\Phi$. For any $X \sim N(\mu, \sigma^2)$:

$$P(X \le x) = \Phi\left(\frac{x - \mu}{\sigma}\right)$$
The content doesn't change. The ruler does.
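A minimal Python sketch of this reduction, using the error-function form of $\Phi$; the example parameters (scores modelled as $N(70, 10^2)$) are illustrative assumptions, not values from the text:

```python
import math

def phi(z):
    """Standard normal CDF, Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), reduced to phi by standardisation."""
    return phi((x - mu) / sigma)

# Illustrative example: scores ~ N(70, 10^2); P(score <= 85) = phi(1.5).
p = normal_cdf(85.0, mu=70.0, sigma=10.0)
```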
Departures from Normality
Many classical procedures — t-tests, ANOVA, ordinary least squares regression — assume that observations or errors follow a normal distribution. In practice, this assumption is frequently violated. The consequences depend on the nature and degree of departure.
Two primary types of departure are relevant. Skewness, measured by $\gamma_1 = E[(X - \mu)^3]/\sigma^3$, quantifies asymmetry. Excess kurtosis $\gamma_2 = E[(X - \mu)^4]/\sigma^4 - 3$ measures the weight of the tails relative to a Gaussian. The normal distribution has $\gamma_1 = 0$ and excess kurtosis $\gamma_2 = 0$; distributions with $\gamma_2 > 0$ are leptokurtic (heavy-tailed), and those with $\gamma_2 < 0$ are platykurtic.
Robustness theory formalises the sensitivity of an estimator or test to distributional misspecification. The sample mean is not robust: a single contaminating outlier can move it arbitrarily far from the population centre. The sample median, by contrast, has a breakdown point of 50%. Always assess distributional assumptions via formal diagnostics — the Shapiro-Wilk test, Q-Q plots, or moment-based tests — before applying procedures that rely on normality.
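The contamination effect is easy to reproduce. In the following standard-library Python sketch, the sample values and the single 1000.0 outlier are illustrative:

```python
import statistics

# A well-behaved sample, then the same sample with one gross outlier injected.
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
contaminated = clean + [1000.0]

mean_clean = statistics.mean(clean)                  # 10.0
mean_bad = statistics.mean(contaminated)             # dragged to 120.0
median_clean = statistics.median(clean)              # 10.0
median_bad = statistics.median(contaminated)         # still 10.0
```

One observation in nine moves the mean by two orders of magnitude, while the median is unchanged: a concrete view of the 50% breakdown point.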
Robustness Test
Drag the slider to inject outliers. Watch how the Mean is dragged away instantly, while the Median resists.
The Central Limit Theorem
Theorem (Central Limit Theorem). Let $X_1, X_2, \ldots$ be independent and identically distributed random variables with finite mean $\mu$ and finite variance $\sigma^2 > 0$. Denote the sample mean by $\bar{X}_n = n^{-1}\sum_{i=1}^n X_i$. Then, as $n \to \infty$:

$$\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \xrightarrow{d} N(0, 1)$$
Convergence here is in distribution (weak convergence of probability measures), not almost surely or in probability. The theorem does not require the underlying distribution to be symmetric, unimodal, or of any specific form — only that the first two moments are finite. This generality is its defining strength.
A direct consequence is that the sum $S_n = \sum_{i=1}^n X_i$ is approximately $N(n\mu, n\sigma^2)$ for large $n$. The approximation improves with $n$, and the Berry-Esseen theorem bounds its error uniformly over the real line. The simulation at right draws samples from the Uniform distribution — a decidedly non-Gaussian shape — and accumulates their means. The bell curve emerges regardless of the population.
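A simulation along those lines can be sketched in a few lines of standard-library Python; the sample size $n = 50$ and the 20,000 replications are illustrative choices:

```python
import random
import statistics

random.seed(42)

# Means of n = 50 draws from Uniform(0, 1): population mean 1/2, variance 1/12.
n, reps = 50, 20_000
sample_means = [statistics.fmean(random.random() for _ in range(n))
                for _ in range(reps)]

# The CLT predicts these means are approximately N(1/2, 1/(12 n)).
m = statistics.fmean(sample_means)
s = statistics.stdev(sample_means)
predicted_sd = (1.0 / (12.0 * n)) ** 0.5   # about 0.0408
```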
Multivariate Extension
The multivariate normal distribution on $\mathbb{R}^d$ is specified by a mean vector $\mu \in \mathbb{R}^d$ and a symmetric positive-definite covariance matrix $\Sigma$. We write $X \sim N_d(\mu, \Sigma)$, with density:

$$f(x) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right)$$
The geometry of the distribution is encoded entirely in $\Sigma$: contours of equal probability density are ellipsoids whose shape, orientation, and eccentricity are determined by the eigenstructure of $\Sigma$. The exponent $(x - \mu)^\top \Sigma^{-1} (x - \mu)$ is the squared Mahalanobis distance from $x$ to $\mu$, which accounts for correlations between dimensions.
The role of the correlation parameter $\rho$ in the bivariate case is particularly instructive:
- $\rho = 0$: the components are uncorrelated and, being jointly normal, statistically independent. Contours are circular (after standardisation).
- $\rho > 0$: positive linear association; the ellipse is tilted towards the northeast diagonal.
- $\rho < 0$: negative linear association; the ellipse is tilted towards the northwest diagonal.
A key property: any marginal distribution of a multivariate normal is itself normal, and any conditional distribution is normal with a mean that is a linear function of the conditioning values. This closure under marginalisation and conditioning is what makes the Gaussian tractable in high-dimensional inference.
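In the bivariate case both the squared Mahalanobis distance and the linear conditional mean can be written out explicitly. In this Python sketch the parameter values are illustrative, and the 2x2 covariance inverse is expanded by hand:

```python
# Bivariate normal: means mu1, mu2; standard deviations s1, s2; correlation rho.
# These particular values are illustrative, not from the text.
mu1, mu2 = 0.0, 0.0
s1, s2, rho = 1.0, 2.0, 0.6

def mahalanobis_sq(x1, x2):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu), using the
    closed-form inverse of the 2x2 covariance matrix."""
    d1, d2 = (x1 - mu1) / s1, (x2 - mu2) / s2
    return (d1 * d1 - 2.0 * rho * d1 * d2 + d2 * d2) / (1.0 - rho * rho)

def conditional_mean_x1(x2):
    """E[X1 | X2 = x2] = mu1 + rho * (s1 / s2) * (x2 - mu2): linear in x2."""
    return mu1 + rho * (s1 / s2) * (x2 - mu2)
```

With $\rho = 0$ the cross term vanishes and the Mahalanobis distance reduces to the ordinary Euclidean distance between standardised coordinates.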
Connections to Statistical Inference
The normal distribution occupies a foundational role in classical statistics because its analytical tractability allows closed-form derivations of sampling distributions. The t-distribution, F-distribution, and chi-squared distribution — central to parametric hypothesis testing — are all constructed from functions of independent normal variates.
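For instance, a chi-squared variate with $k$ degrees of freedom arises as the sum of $k$ squared independent standard normals, so its mean is $k$ and its variance $2k$. The simulation below (standard-library Python; $k = 5$ and the replication count are illustrative) recovers both:

```python
import random
import statistics

random.seed(0)

# Chi-squared with k degrees of freedom, built directly from standard normals.
k, reps = 5, 50_000
draws = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(reps)]

m = statistics.fmean(draws)     # close to k = 5
v = statistics.variance(draws)  # close to 2k = 10
```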
The validity of confidence intervals, likelihood ratio tests, and ordinary least squares estimators depends on either the exact normality of residuals or on the CLT approximation for large samples. The graph below charts these structural dependencies, illustrating how extensively classical inference rests on the Gaussian assumption.
Drag nodes to explore