The Null Hypothesis
Vol. I, Mar 2026

The Normal Distribution

The Gaussian distribution, or normal distribution, is a continuous probability distribution on $\mathbb{R}$ characterised by two parameters: a location parameter $\mu \in \mathbb{R}$ and a scale parameter $\sigma^2 > 0$. Its probability density function is:

$$f(x;\, \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \quad x \in \mathbb{R}$$
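As a sanity check, the density can be coded directly and integrated numerically; a minimal sketch in plain Python (standard library only; the truncation to $[-8, 8]$ is a practical choice, since the mass beyond is negligible):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x, per the formula above."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# A density must integrate to 1 over the real line; a Riemann sum over
# [-8, 8] captures essentially all of the Gaussian's mass.
step = 0.001
area = sum(normal_pdf(-8 + i * step) for i in range(16_001)) * step
print(round(area, 6))
```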

The distribution arises as the limiting form of the normalised sum of independent, identically distributed random variables with finite variance — a consequence of the Central Limit Theorem. This emergence from error accumulation processes explains its prevalence in measurement theory, physical modelling, and statistical inference.

Analytically, the Gaussian family is closed under both affine transformation and convolution: any affine function of a normal variable is normal, and the sum of independent normals is normal. This stability under convolution makes it the unique attractor in the sense formalised by the CLT.

Berry-Esseen Bound
The CLT convergence has a quantitative rate. If $X_1, \ldots, X_n$ are i.i.d. with mean $\mu$, variance $\sigma^2$, and third absolute moment $\rho = E[|X - \mu|^3] < \infty$, then:

$$\sup_{x \in \mathbb{R}} \left|F_n(x) - \Phi(x)\right| \leq \frac{C\rho}{\sigma^3\sqrt{n}}$$

where $C < 0.4748$ (Shevtsova, 2011) and $\Phi$ is the standard normal CDF.
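The bound is straightforward to evaluate. A sketch in Python, taking a fair coin as the underlying distribution (so $\mu = 0.5$, $\sigma = 0.5$, and $\rho = E|X - \mu|^3 = 0.125$):

```python
import math

C = 0.4748  # Shevtsova's (2011) upper bound on the constant

def berry_esseen_bound(rho, sigma, n):
    """Worst-case gap sup_x |F_n(x) - Phi(x)| guaranteed by the theorem."""
    return C * rho / (sigma ** 3 * math.sqrt(n))

# Fair coin: mu = 0.5, sigma = 0.5, rho = 0.125, so the bound is 0.4748/sqrt(n).
for n in (10, 100, 10_000):
    print(n, round(berry_esseen_bound(0.125, 0.5, n), 5))
```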
Maximum Entropy
Among all distributions on $\mathbb{R}$ with fixed mean $\mu$ and variance $\sigma^2$, the Gaussian uniquely maximises the differential entropy:

$$h(X) = -\int_{-\infty}^{\infty} f(x) \ln f(x)\,dx$$

This maximum-entropy characterisation provides a rigorous justification for the natural occurrence of Gaussian distributions: given only a mean and variance, the Gaussian is the least informative consistent choice.
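The characterisation can be illustrated numerically using the standard closed form $h(X) = \tfrac{1}{2}\ln(2\pi e \sigma^2)$ for the Gaussian, compared against a uniform law matched to the same variance; a minimal sketch:

```python
import math

def gaussian_entropy(sigma):
    """Differential entropy of N(mu, sigma^2) in nats: 0.5 * ln(2*pi*e*sigma^2)."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

# A uniform law on [-a, a] has variance a^2 / 3 and entropy ln(2a).
# Matching sigma^2 = 1 requires a = sqrt(3).
h_gauss = gaussian_entropy(1.0)
h_unif = math.log(2 * math.sqrt(3))
print(round(h_gauss, 4), round(h_unif, 4))  # the Gaussian's entropy is larger
```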
Interactive: sums of independent random errors accumulate into a normal distribution.

Probability Density and Measure

For any continuous random variable, point probabilities are identically zero: $P(X = x) = 0$ for every $x \in \mathbb{R}$. This is a consequence of the distribution being absolutely continuous with respect to Lebesgue measure $\lambda$ on $\mathbb{R}$.

The probability density function $f$ is the Radon-Nikodym derivative of the probability measure $P$ with respect to $\lambda$. Probability is recovered by integration over any measurable set $A \in \mathcal{B}(\mathbb{R})$:

$$P(X \in A) = \int_A f(x)\,d\lambda(x)$$

The value $f(x)$ is a density, not a probability. It is not bounded above by 1 — its only constraint is $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Probability accumulates only over intervals, not at points.
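This is easy to see concretely: with a small scale parameter the density peak exceeds 1, yet every interval probability stays bounded by 1. A short sketch (standard library only):

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    """CDF via the error function: Phi((x - mu) / sigma)."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# For sigma = 0.1 the peak density is about 3.99 -- far above 1, and perfectly
# legal, because f(x) is a density rather than a probability.
peak = normal_pdf(0.0, 0.0, 0.1)
p = normal_cdf(1, 0, 0.1) - normal_cdf(-1, 0, 0.1)  # an interval probability
print(round(peak, 3), round(p, 6))
```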

Corollary — Empirical Rule

For $X \sim \mathcal{N}(\mu, \sigma^2)$, the following probabilities follow directly from $\Phi$:

$$P(\mu - \sigma \leq X \leq \mu + \sigma) = 2\Phi(1) - 1 \approx 68.27\%$$
$$P(\mu - 2\sigma \leq X \leq \mu + 2\sigma) = 2\Phi(2) - 1 \approx 95.45\%$$
$$P(\mu - 3\sigma \leq X \leq \mu + 3\sigma) = 2\Phi(3) - 1 \approx 99.73\%$$

These three intervals account for the near-totality of mass — a direct consequence of the rapid exponential decay of the Gaussian tails.
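The three figures are reproducible from the error function, which gives $\Phi(z) = \tfrac{1}{2}(1 + \operatorname{erf}(z/\sqrt{2}))$; a quick check:

```python
import math

def phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for k in (1, 2, 3):
    prob = 2 * phi(k) - 1
    print(f"P(|X - mu| <= {k} sigma) = {prob:.4%}")
```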

Density vs. Probability
Formally, the probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ satisfies:

$$P([a, b]) = \int_a^b f(x)\,d\lambda(x)$$

For the standard normal, $f(0) = (2\pi)^{-1/2} \approx 0.399$. This exceeds zero but represents density at a single point of measure zero — not a probability.

The Parametric Family

The normal family is parametrised by two quantities: the mean $\mu \in \mathbb{R}$, which determines the location of the distribution, and the variance $\sigma^2 > 0$, which controls its dispersion. We write $X \sim \mathcal{N}(\mu, \sigma^2)$ to denote that the random variable $X$ has density:

$$f(x; \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right), \quad x \in \mathbb{R}$$

Every member of the family is an affine image of the standard normal $\mathcal{N}(0, 1)$: if $Z \sim \mathcal{N}(0, 1)$, then $X = \mu + \sigma Z \sim \mathcal{N}(\mu, \sigma^2)$. Equivalently, the entire two-parameter family is generated by translations and scalings of a single canonical form. This structure defines a location-scale family.

The density achieves its unique maximum at $x = \mu$, where $f(\mu) = (\sigma\sqrt{2\pi})^{-1}$. Inflection points occur at $x = \mu \pm \sigma$. All odd central moments vanish, and the even central moments satisfy $E[(X - \mu)^{2k}] = (2k-1)!! \cdot \sigma^{2k}$ for non-negative integers $k$, where $(2k-1)!! = 1 \cdot 3 \cdots (2k-1)$ is the double factorial.
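The even-moment formula can be spot-checked by simulation; a sketch for the standard normal (the sample size and seed are arbitrary choices):

```python
import random

def double_factorial(n):
    """(2k-1)!! = 1 * 3 * ... * (2k-1); by convention (-1)!! = 1."""
    out = 1
    while n > 1:
        out *= n
        n -= 2
    return out

rng = random.Random(0)
samples = [rng.gauss(0, 1) for _ in range(200_000)]

# Compare empirical moments E[X^{2k}] against (2k-1)!! for sigma = 1.
for k in (1, 2, 3):
    empirical = sum(x ** (2 * k) for x in samples) / len(samples)
    print(2 * k, round(empirical, 2), double_factorial(2 * k - 1))
```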

The Living Shape

Interactive: drag sliders to reshape the family $\mathcal{N}(\mu, \sigma^2)$; the mean $\mu$ sets the centre of gravity, the spread $\sigma$ the width of errors.

Change of Scale: Standardisation

Let $X \sim \mathcal{N}(\mu, \sigma^2)$. The standardised variable is defined as:

$$Z = \frac{X - \mu}{\sigma} \sim \mathcal{N}(0, 1)$$

This is an affine transformation, not a change of distributional shape. Subtracting $\mu$ translates the distribution to the origin; dividing by $\sigma$ rescales the horizontal axis so that one unit corresponds to one standard deviation. The resulting variable $Z$ always follows the standard normal $\mathcal{N}(0, 1)$, regardless of the original parameters.

The practical consequence is that probability calculations for any member of the normal family reduce to a single reference: the standard normal CDF $\Phi(z)$. For any $a, b \in \mathbb{R}$:

$$P(a \leq X \leq b) = \Phi\!\left(\frac{b - \mu}{\sigma}\right) - \Phi\!\left(\frac{a - \mu}{\sigma}\right)$$
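As a worked example of this reduction, a sketch computing an interval probability through $\Phi$; the height figures are purely illustrative, not taken from the article:

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_prob(a, b, mu, sigma):
    """P(a <= X <= b) for X ~ N(mu, sigma^2), via standardisation."""
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)

# Hypothetical heights ~ N(170, 10^2): the probability of landing in
# 160-190 cm standardises to Phi(2) - Phi(-1).
p = normal_prob(160, 190, mu=170, sigma=10)
print(round(p, 4))
```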
Affine Equivariance
More generally, if $X \sim \mathcal{N}(\mu, \sigma^2)$ and $Y = aX + b$ for constants $a \neq 0$ and $b \in \mathbb{R}$, then:

$$Y \sim \mathcal{N}(a\mu + b,\; a^2\sigma^2)$$

The normal family is closed under affine maps. This is called affine equivariance.

The content doesn't change. The ruler does.

Departures from Normality

Many classical procedures — t-tests, ANOVA, ordinary least squares regression — assume that observations or errors follow a normal distribution. In practice, this assumption is frequently violated. The consequences depend on the nature and degree of departure.

Two primary types of departure are relevant. Skewness, measured by $\gamma_1 = E[(X-\mu)^3]/\sigma^3$, quantifies asymmetry. Excess kurtosis measures the weight of the tails relative to a Gaussian. The normal distribution has $\gamma_1 = 0$ and excess kurtosis $\gamma_2 = 0$; distributions with $\gamma_2 > 0$ are leptokurtic (heavy-tailed), and those with $\gamma_2 < 0$ are platykurtic.

Robustness theory formalises the sensitivity of an estimator or test to distributional misspecification. The sample mean is not robust: a single contaminating outlier can move it arbitrarily far from the population centre. The sample median, by contrast, has a breakdown point of 50%. Always assess distributional assumptions via formal diagnostics — the Shapiro-Wilk test, Q-Q plots, or moment-based tests — before applying procedures that rely on normality.
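The breakdown behaviour is easy to reproduce; a minimal contamination sketch (sample size, seed, and the outlier's magnitude are arbitrary choices):

```python
import random
import statistics

rng = random.Random(1)
data = [rng.gauss(0, 1) for _ in range(99)]
clean_mean = statistics.mean(data)
clean_median = statistics.median(data)

# A single wild observation out of 100 shifts the mean by roughly 10 units,
# while the median barely moves.
contaminated = data + [1000.0]
dirty_mean = statistics.mean(contaminated)
dirty_median = statistics.median(contaminated)

print(round(dirty_mean - clean_mean, 2), round(dirty_median - clean_median, 2))
```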

Excess Kurtosis
Formally, excess kurtosis is defined as:

$$\kappa = \frac{E[(X-\mu)^4]}{\sigma^4} - 3$$

The subtraction of 3 normalises relative to the Gaussian. For $\kappa > 0$ (e.g. Student's $t$), probability mass is concentrated in the tails; for $\kappa < 0$ (e.g. uniform), the tails are lighter than Gaussian. This affects the coverage probability of confidence intervals and the Type I error rate of tests.
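A sketch estimating the quantity from samples; the uniform's platykurtic value is exactly $-1.2$ and the Gaussian's exactly $0$ (sample sizes and seed are arbitrary):

```python
import random
import statistics

def excess_kurtosis(xs):
    """Sample version of E[(X - mu)^4] / sigma^4 - 3."""
    mu = statistics.fmean(xs)
    var = statistics.fmean((x - mu) ** 2 for x in xs)
    m4 = statistics.fmean((x - mu) ** 4 for x in xs)
    return m4 / var ** 2 - 3

rng = random.Random(2)
flat = [rng.random() for _ in range(100_000)]     # uniform: kappa = -1.2
bell = [rng.gauss(0, 1) for _ in range(100_000)]  # normal: kappa = 0

print(round(excess_kurtosis(flat), 2), round(excess_kurtosis(bell), 2))
```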

Robustness Test

Interactive: drag the slider to inject outliers; the mean is dragged away instantly while the median resists.

The Central Limit Theorem

Theorem (Central Limit Theorem). Let $X_1, X_2, \ldots$ be independent and identically distributed random variables with finite mean $\mu$ and finite variance $\sigma^2 > 0$. Denote the sample mean by $\bar{X}_n = n^{-1}\sum_{i=1}^n X_i$. Then, as $n \to \infty$:

$$\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1)$$

Convergence here is in distribution (weak convergence of probability measures), not almost surely or in probability. The theorem does not require the underlying distribution to be symmetric, unimodal, or of any specific form — only that the first two moments are finite. This generality is its defining strength.

A direct consequence is that the sum $S_n = \sum_{i=1}^n X_i$ is approximately $\mathcal{N}(n\mu, n\sigma^2)$ for large $n$. The approximation improves with $n$ and is known to be uniform over the real line (Berry-Esseen theorem). The accompanying simulation draws samples from the Uniform distribution — a decidedly non-Gaussian shape — and accumulates their means. The bell curve emerges regardless of the population.
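The same experiment runs in a few lines; a sketch averaging Uniform(0, 1) draws ($n = 30$ per mean and the replication count are arbitrary choices):

```python
import random
import statistics

rng = random.Random(3)
n = 30
means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(20_000)]

# Uniform(0, 1) has mu = 0.5 and sigma^2 = 1/12, so the sample mean should
# concentrate at 0.5 with standard deviation 1 / sqrt(12 * n), about 0.0527.
print(round(statistics.fmean(means), 3))
print(round(statistics.stdev(means), 4))
```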

CLT: Sample Means

Interactive: sample means accumulate from an underlying Uniform distribution; the histogram converges to the Gaussian predicted by $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} \mathcal{N}(0, \sigma^2)$.

Multivariate Extension

The multivariate normal distribution on $\mathbb{R}^d$ is specified by a mean vector $\boldsymbol{\mu} \in \mathbb{R}^d$ and a symmetric positive-definite covariance matrix $\boldsymbol{\Sigma} \in \mathbb{R}^{d \times d}$. We write $\mathbf{X} \sim \mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, with density:

$$f(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{d/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})\right)$$

The geometry of the distribution is encoded entirely in $\boldsymbol{\Sigma}$: contours of equal probability density are ellipsoids whose shape, orientation, and eccentricity are determined by the eigenstructure of $\boldsymbol{\Sigma}$. The exponent $(\mathbf{x} - \boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})$ is the squared Mahalanobis distance from $\mathbf{x}$ to $\boldsymbol{\mu}$, which accounts for correlations between dimensions.

The role of the correlation parameter $\rho$ in the bivariate case is particularly instructive:

  • $\rho = 0$: the components are uncorrelated and, being jointly normal, statistically independent. Contours are circular (after standardisation).
  • $\rho > 0$: positive linear association; the ellipse is tilted towards the northeast diagonal.
  • $\rho < 0$: negative linear association; the ellipse is tilted towards the northwest diagonal.

A key property: any marginal distribution of a multivariate normal is itself normal, and any conditional distribution $\mathbf{X}_1 \mid \mathbf{X}_2 = \mathbf{x}_2$ is normal with a mean that is a linear function of $\mathbf{x}_2$. This closure under marginalisation and conditioning is what makes the Gaussian tractable in high-dimensional inference.

The Covariance Matrix
The geometry of the bivariate normal is encoded in the symmetric positive-definite matrix $\boldsymbol{\Sigma}$:

$$\boldsymbol{\Sigma} = \begin{pmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{pmatrix}$$

The eigenvectors of $\boldsymbol{\Sigma}$ define the principal axes of the probability ellipsoid. The corresponding eigenvalues are the variances along those axes.
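For a $2 \times 2$ covariance the eigenstructure is available in closed form; a sketch recovering the principal-axis variances ($\rho = 0.7$ is just an example value):

```python
import math

def bivariate_cov(sx, sy, rho):
    """Covariance matrix of a bivariate normal."""
    return [[sx * sx, rho * sx * sy],
            [rho * sx * sy, sy * sy]]

def eigenvalues_2x2_sym(m):
    """Eigenvalues of a symmetric 2x2 matrix, largest first."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(tr * tr / 4 - det)
    return tr / 2 + disc, tr / 2 - disc

# Standard bivariate normal with rho = 0.7: the axis variances are 1 + rho
# and 1 - rho, i.e. a long axis of 1.7 and a short axis of 0.3.
lam1, lam2 = eigenvalues_2x2_sym(bivariate_cov(1.0, 1.0, 0.7))
print(round(lam1, 6), round(lam2, 6))
```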
Interactive: standard bivariate normal surface $Z = f(X, Y)$, rotatable, with adjustable correlation $\rho$ (shown at $\rho = 0.70$).

Connections to Statistical Inference

The normal distribution occupies a foundational role in classical statistics because its analytical tractability allows closed-form derivations of sampling distributions. The t-distribution, F-distribution, and chi-squared distribution — central to parametric hypothesis testing — are all constructed from functions of independent normal variates.
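The chi-squared construction, for instance, can be verified by simulation: a sum of $k$ squared independent standard normals has mean $k$ and variance $2k$. A sketch ($k$ and the replication count are arbitrary):

```python
import random
import statistics

rng = random.Random(4)
k = 5  # degrees of freedom

# Each draw is a chi-squared(k) variate built from k squared standard normals.
draws = [sum(rng.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(50_000)]

print(round(statistics.fmean(draws), 2))     # should sit near k = 5
print(round(statistics.variance(draws), 2))  # should sit near 2k = 10
```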

The validity of confidence intervals, likelihood ratio tests, and ordinary least squares estimators depends on either the exact normality of residuals or on the CLT approximation for large samples. The graph below charts these structural dependencies, illustrating how extensively classical inference rests on the Gaussian assumption.



© 2026 The Null Hypothesis. All Rights Reserved.