Data collection is not just gathering numbers—it's deciding which numbers have the right to speak for the whole.
Before analyzing anything, you have to decide who you are asking. The difference between the theoretical population you want to study and the actual list of people you can reach introduces the first layer of bias into any statistical analysis. From there, your choice of sampling method will determine whether your findings are a reflection of reality or just an expensive mistake.
You want to study the "Population" (e.g., all car owners), but you can only sample from a "Frame" (e.g., a registry of licensed vehicles). If your frame doesn't match your population, your sample is compromised before you even start: no amount of careful randomization inside the frame can reach people who aren't on it.
Target Population vs. Sampling Frame. In theory, the two should match perfectly. In reality, they rarely align.
Any member of the population who falls outside the frame has zero chance of being selected.
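The coverage problem can be illustrated with a small simulation. This is a sketch using entirely hypothetical data: we assume 1 in 5 car owners is missing from the registry and that those missing owners happen to drive more. A simple random sample drawn from the frame never selects them, so its mean drifts away from the population mean.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical toy data: 1,000 car owners, each with an annual mileage.
# Assumption: 1 in 5 owners is missing from the registry (the frame),
# and those missing owners drive more on average.
population = []
for i in range(1_000):
    in_frame = i % 5 != 0
    typical = 12_000 if in_frame else 18_000
    population.append({"in_frame": in_frame, "miles": rng.gauss(typical, 2_000)})

frame = [p for p in population if p["in_frame"]]

# A simple random sample of 100, drawn from the frame, not the population.
sample = rng.sample(frame, 100)

true_mean = statistics.fmean(p["miles"] for p in population)
sample_mean = statistics.fmean(p["miles"] for p in sample)
missed = sum(not p["in_frame"] for p in sample)  # always 0 by construction

print(f"frame covers {len(frame) / len(population):.0%} of the population")
print(f"population mean: {true_mean:,.0f} miles")
print(f"sample mean:     {sample_mean:,.0f} miles")
print(f"out-of-frame owners selected: {missed}")
```

Randomizing within the frame does not help here: the bias comes from who is on the list, not from how the list is sampled.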
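Simple Random Sampling (SRS) gives every possible sample of a given size the same chance of being drawn, so every individual in the frame has an equal probability of selection. It is the gold standard for unbiased selection because it removes human preference from the choice of who is included, though it cannot repair a flawed frame.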
Figure: Population (N = 200) and its Sampling Distribution.
Because selection is random, draws come from across the entire population, with no region favored over another.
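A minimal sketch of SRS and its sampling distribution, assuming a hypothetical population of N = 200 normally distributed values. `srs_indices` is an illustrative helper, not a library function; it relies on Python's `random.sample`, which draws without replacement. Repeating the draw many times shows two things: each individual is selected at a rate close to n/N, and the average of the sample means lands close to the population mean.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of N = 200 values (assumed roughly normal).
N = 200
population = [rng.gauss(0.0, 1.0) for _ in range(N)]
mu = statistics.fmean(population)

def srs_indices(N, n, rng):
    """Simple random sample: n distinct indices, every n-subset equally likely."""
    return rng.sample(range(N), n)

n, draws = 20, 5_000
counts = [0] * N    # how many times each individual was selected
sample_means = []   # builds up the sampling distribution of the mean
for _ in range(draws):
    idx = srs_indices(N, n, rng)
    for i in idx:
        counts[i] += 1
    sample_means.append(statistics.fmean(population[i] for i in idx))

print(f"population mean:          {mu:.3f}")
print(f"mean of sample means:     {statistics.fmean(sample_means):.3f}")
print(f"inclusion rate range:     {min(counts) / draws:.3f} to {max(counts) / draws:.3f}")
print(f"expected inclusion (n/N): {n / N:.3f}")
```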
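When populations are huge, drawing a true SRS becomes administratively awkward, and when they contain small but critical groups, chance alone can under-represent those groups or miss them entirely. Stratified sampling guarantees representation by splitting the population into groups (strata) first and sampling within each one, while Systematic sampling offers administrative simplicity by selecting every k-th member of a list after a random start.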
Simple random: every member has an equal chance of being selected.
Stratified: every specified group is guaranteed a place in the sample.
Systematic: selections are spread evenly across the list.
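The two alternatives can be sketched in a few lines. The helpers below are hypothetical illustrations, and the stratified version assumes proportional allocation with at least one unit per stratum; because of rounding, its total can differ slightly from n. The toy population assumes a 5% minority group. The systematic version uses a random start in [0, k); it can be biased if the list has a periodic pattern that lines up with k.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(units, stratum_of, n, rng):
    """Proportional stratified sample: split into strata, then SRS within each."""
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)
    sample = []
    for members in strata.values():
        # Proportional share of n, but never zero, so small groups survive.
        k = max(1, round(n * len(members) / len(units)))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample

def systematic_sample(units, n, rng):
    """Every k-th unit after a random start, where k = len(units) // n."""
    k = len(units) // n
    start = rng.randrange(k)
    return units[start::k][:n]

def group(u):
    # Hypothetical split: the first 50 of 1,000 units form a 5% minority.
    return "minority" if u < 50 else "majority"

rng = random.Random(7)
units = list(range(1_000))

srs = rng.sample(units, 40)
strat = stratified_sample(units, group, 40, rng)
syst = systematic_sample(units, 40, rng)

for name, s in [("SRS", srs), ("stratified", strat), ("systematic", syst)]:
    minority = Counter(group(u) for u in s)["minority"]
    print(f"{name:>10}: size={len(s)}, minority members={minority}")
```

Across different seeds, the SRS minority count varies from run to run and can be zero, while the stratified sample always contains exactly the allocated two.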
Sampling Error is the natural, mathematically unavoidable discrepancy between a sample statistic and the true population parameter. Non-Sampling Error is everything else: typos, bad questions, and flawed lists. You can reduce Sampling Error by asking more people. Non-Sampling Error does not shrink as the sample grows; only better frames, questions, and procedures reduce it.
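The contrast can be written as a worked equation. This is a sketch under simplifying assumptions: the estimate is a sample mean under SRS from a large population (ignoring the finite population correction), σ is the population standard deviation, and non-sampling error is treated as a fixed bias b added to every estimate, so the observed estimate is x̄ + b.

```latex
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}, \qquad
\mathbb{E}\big[(\bar{x} + b - \mu)^2\big]
  = \underbrace{\frac{\sigma^2}{n}}_{\text{sampling error}}
  + \underbrace{b^2}_{\text{non-sampling bias}}
```

For example, with σ = 10, a sample of n = 100 gives SE = 10/√100 = 1, and n = 400 gives SE = 10/√400 = 0.5: quadrupling the sample halves the sampling error, while the b² term does not change at all.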
Sampling Error: difference due to chance. A larger sample size reduces it.
Non-Sampling Error: errors introduced by the collection process itself, often systematic bias.
No matter how perfectly you sample, your statistic will almost never exactly match the true parameter.
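A small simulation makes the two error types visible side by side. It is an illustrative sketch with made-up numbers: a hypothetical population of 100,000 values, and an assumed flawed question that makes every respondent over-report by 2 units (`BIAS`). The "sampling error" column shrinks roughly like 10/√n, while the error with bias levels off near 2 no matter how large n gets.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical population: 100,000 values with mean about 50 and sd 10.
population = [rng.gauss(50, 10) for _ in range(100_000)]
mu = statistics.fmean(population)

# Assumption for illustration: a flawed question makes every respondent
# over-report by 2 units, a fixed non-sampling bias.
BIAS = 2.0

def survey_error(n, trials=300):
    """Return (sampling error, error with bias) as RMS deviations from mu."""
    sq_clean, sq_biased = [], []
    for _ in range(trials):
        xbar = statistics.fmean(rng.sample(population, n))
        sq_clean.append((xbar - mu) ** 2)
        sq_biased.append((xbar + BIAS - mu) ** 2)
    return statistics.fmean(sq_clean) ** 0.5, statistics.fmean(sq_biased) ** 0.5

results = {n: survey_error(n) for n in (25, 100, 400, 1600)}
for n, (sampling, with_bias) in results.items():
    print(f"n={n:5d}  sampling error={sampling:.2f}  with bias={with_bias:.2f}")
```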