The Methodology | The Null Hypothesis

Data collection is not just gathering numbers—it's deciding which numbers have the right to speak for the whole.

Before analyzing anything, you have to decide who you are asking. The difference between the theoretical population you want to study and the actual list of people you can reach introduces the first layer of bias into any statistical analysis. From there, your choice of sampling method will determine whether your findings are a reflection of reality or just an expensive mistake.

Interactive 05: The Target vs. The Frame

You want to study the "Population" (e.g., all car owners). But you can only actually sample from a "Frame" (e.g., a registry of licensed vehicles). If your frame doesn't match your population, your sample carries coverage bias before you select a single unit, and no amount of care in the later steps removes it.

[Interactive figure: Target Population vs. Sampling Frame]

Drag the Sampling Frame circle over the Target Population. In theory, they should match perfectly. In reality, they rarely align.

Notice how items inside the population but outside the frame have zero chance of being selected.
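The gap between population and frame can be made concrete with plain set arithmetic. The sketch below is illustrative only: the ID ranges, the 1,000-owner population, and the names `population`, `frame`, `undercovered`, and `overcovered` are assumptions made up for this example, not values from the interactive.

```python
import random

# Hypothetical target population: 1,000 car owners, identified by ID.
population = set(range(1000))

# Hypothetical frame: a registry that misses owners 800-999 (undercoverage)
# and lists 50 IDs that are not in the target population (overcoverage).
frame = set(range(800)) | set(range(1000, 1050))

covered = population & frame        # target members that can be reached
undercovered = population - frame   # in the target, zero chance of selection
overcovered = frame - population    # on the list, but outside the target

coverage_rate = len(covered) / len(population)

# Draw a sample the only way we can: from the frame.
rng = random.Random(0)
sample = rng.sample(sorted(frame), 100)

# No sampled unit can come from the undercovered group.
assert not undercovered.intersection(sample)

print(f"coverage rate: {coverage_rate:.2f}")
print(f"never selectable: {len(undercovered)}")
print(f"out-of-scope units in frame: {len(overcovered)}")
```

Whatever sampling method is applied afterwards, it operates on `frame`, so the undercovered units stay invisible to it.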

Interactive 06: The Sampling Lottery

Simple Random Sampling (SRS) gives every individual an equal probability of selection and, more strictly, makes every possible sample of a given size equally likely. It is the benchmark for unbiased selection because it removes human preference from the choice of who gets included.

[Interactive figure: Population (N = 200, $\mu$ = 0.00) and its Sampling Distribution. Controls: Population Size (200), Sample Size (10).]

Watch how random selection draws from across the entire population space without bias.
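A minimal sketch of the same lottery in code, assuming a made-up population of 200 standard-normal values (the interactive's actual data is not shown in the text). The helper name `draw_srs` and the 2,000 repetitions are choices made for this example.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of N = 200 values; its mean will be near 0, not exactly 0.
N, n = 200, 10
population = [rng.gauss(0, 1) for _ in range(N)]
mu = statistics.fmean(population)

def draw_srs(pop, size, rng):
    """Simple random sample without replacement: every subset of `size` is equally likely."""
    return rng.sample(pop, size)

# Repeat the draw many times to build an empirical sampling distribution of the mean.
sample_means = [statistics.fmean(draw_srs(population, n, rng)) for _ in range(2000)]

print(f"population mean:        {mu:.3f}")
print(f"mean of sample means:   {statistics.fmean(sample_means):.3f}")
print(f"spread of sample means: {statistics.stdev(sample_means):.3f}")
```

Individual sample means scatter around the population mean, but their average sits close to it, which is what "unbiased" means here.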

Interactive 07: Advanced Architecture

When populations are huge or contain critical minority groups, SRS can under-represent small groups purely by chance. Stratified sampling guarantees representation by splitting the population into groups first and sampling within each one, while Systematic sampling offers administrative simplicity by selecting every k-th unit after a random start.

Population (N): 30

Sample Size (n): 5

Every member has an equal chance of being selected.

Selection Probability

$$P(\text{Select}) = \frac{n}{N} = \frac{5}{30} \approx 0.17$$

Stratified sampling forces the inclusion of specific groups. Systematic sampling creates an even spread.
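Both designs can be sketched in a few lines. This example reuses the N = 30, n = 5 setting from the widget, but the 24/6 split into groups "A" and "B" is an assumption, as are the function names and the proportional-allocation rule with a floor of one unit per stratum.

```python
import random

rng = random.Random(7)

# Hypothetical population of N = 30 units: 24 in group "A", 6 in group "B".
population = [(i, "A" if i < 24 else "B") for i in range(30)]
n = 5

def systematic_sample(units, size, rng):
    """Pick a random start in the first interval, then take every k-th unit."""
    k = len(units) // size          # sampling interval (30 // 5 = 6 here)
    start = rng.randrange(k)        # random start keeps selection probabilistic
    return [units[start + j * k] for j in range(size)]

def stratified_sample(units, size, rng):
    """Proportional allocation with at least one unit per stratum.

    Rounding can make the total differ from `size` for other group splits;
    with 24/6 and size 5 the quotas are exactly 4 and 1.
    """
    strata = {}
    for unit in units:
        strata.setdefault(unit[1], []).append(unit)
    chosen = []
    for members in strata.values():
        quota = max(1, round(size * len(members) / len(units)))
        chosen.extend(rng.sample(members, quota))
    return chosen

print("systematic:", systematic_sample(population, n, rng))
print("stratified:", stratified_sample(population, n, rng))
```

With SRS, a sample of five can miss group "B" entirely; the stratified draw cannot, and the systematic draw always spaces its picks six positions apart.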

Interactive 08: The Price of Guessing

Sampling Error is the natural, mathematically unavoidable discrepancy between your sample and the true population. Non-Sampling Error is everything else: typos, bad questions, and flawed lists. You can reduce Sampling Error by asking more people. Non-Sampling Error does not shrink as the sample grows; it has to be fixed at its source.

True Population Mean ($\mu$): 50

Sample Size (n): 10

Higher sample size reduces sampling error.

Inject Non-Sampling Error: Simulates a systematic bias (e.g., a faulty instrument). Notice how increasing sample size does not fix this error.

True Mean: 50

Sampling Error: Difference due to chance.

Non-Sampling Error: Systematic bias in collection.

Total Error Decomposition

$$\text{Sampling Error} = \bar{x} - \mu$$
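One common way to write the full decomposition is sketched here. It assumes non-sampling error acts as a constant additive offset on every measurement; the symbols $b$ (that offset) and $\bar{x}_{\text{obs}}$ (the mean you actually record) are introduced for this sketch and do not come from the lesson.

$$\bar{x}_{\text{obs}} - \mu = \underbrace{(\bar{x} - \mu)}_{\text{sampling error}} + \underbrace{b}_{\text{non-sampling error}}$$

Under this assumption the first term tends to shrink as $n$ grows, while $b$ does not depend on $n$ at all.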

No matter how perfectly you sample, your statistic will almost never exactly match the true parameter.
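The contrast between the two error types can be simulated directly. This is a rough sketch, not the interactive's implementation: the true mean of 50 comes from the text, while the standard deviation `SIGMA`, the offset `BIAS`, the normal population, and the helper `mean_error` are all assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(1)
MU = 50.0      # true population mean, as in the text
SIGMA = 10.0   # assumed population standard deviation
BIAS = 3.0     # hypothetical constant offset from a faulty instrument

def mean_error(n, bias, reps=500):
    """Return (average error, spread of error) of the sample mean over `reps` samples."""
    errors = []
    for _ in range(reps):
        sample = [rng.gauss(MU, SIGMA) + bias for _ in range(n)]
        errors.append(statistics.fmean(sample) - MU)
    return statistics.fmean(errors), statistics.stdev(errors)

for n in (10, 100, 1000):
    avg_clean, spread_clean = mean_error(n, bias=0.0)
    avg_biased, _ = mean_error(n, bias=BIAS)
    print(f"n={n:5d}  spread={spread_clean:.2f}  "
          f"avg error (clean)={avg_clean:+.2f}  avg error (biased)={avg_biased:+.2f}")
```

As `n` increases, the spread column falls, which is sampling error shrinking. The biased average error stays near `BIAS` at every sample size.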


Unit 2: The Methodology
