Sampling and the Central Limit Theorem (CLT)

Sampling

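A sample is a subset drawn from the population. Each sample gives one possible view of the population, and a different sample would give a slightly different view.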

Important ideas:

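Larger samples reduce noise: statistics such as the mean vary less from one sample to the next.
Random sampling reduces bias: no part of the population is systematically over- or under-represented.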
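
Both ideas can be checked with a small simulation. The sketch below is illustrative only: the exponential population, the sample sizes, and the helper name spread_of_sample_means are assumptions chosen for the example, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical skewed population with a mean close to 1.0
population = rng.exponential(scale=1.0, size=100_000)
true_mean = population.mean()


def spread_of_sample_means(n, repeats=2000):
    """Standard deviation of the sample mean across repeated random samples of size n."""
    means = [rng.choice(population, size=n).mean() for _ in range(repeats)]
    return float(np.std(means))


# Noise: how much the sample mean moves around for small vs. large samples
noise_small = spread_of_sample_means(10)
noise_large = spread_of_sample_means(1000)

# Bias: a non-random "sample" made of the 1000 smallest values
biased_mean = np.sort(population)[:1000].mean()
random_mean = rng.choice(population, size=1000, replace=False).mean()

print(f"spread of means, n=10: {noise_small:.3f}, n=1000: {noise_large:.3f}")
print(f"true {true_mean:.3f}, random {random_mean:.3f}, biased {biased_mean:.3f}")
```

The spread for n=1000 should come out much smaller than for n=10, and the biased sample stays far from the true mean no matter how large it is.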

Central Limit Theorem (CLT)

The CLT says (informally):

For independent draws from a distribution with finite variance, the distribution of the sample mean becomes approximately normal as the sample size grows, whatever the shape of the original distribution.

This enables:

Confidence intervals
Hypothesis tests
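
As a rough sketch of how the normal approximation is used, the code below builds an approximate 95% confidence interval for a mean and a z statistic for a test. The data, the critical value 1.96, and the null value 2.0 are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 200 draws from an exponential distribution with true mean 2.0
sample = rng.exponential(scale=2.0, size=200)

n = sample.size
x_bar = sample.mean()
se = sample.std(ddof=1) / np.sqrt(n)  # estimated standard error of the mean

# Confidence interval: mean plus or minus z times the standard error
z = 1.96  # approximate 97.5th percentile of the standard normal
ci_low, ci_high = x_bar - z * se, x_bar + z * se
print(f"mean = {x_bar:.3f}, approx. 95% CI = ({ci_low:.3f}, {ci_high:.3f})")

# Hypothesis test: z statistic for the (assumed) null hypothesis that the mean is 2.0
null_mean = 2.0
z_stat = (x_bar - null_mean) / se
print(f"z statistic = {z_stat:.2f}")
```

Both calculations rely on the sample mean being approximately normal, which is what the CLT provides for a sample of this size.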

Demonstration in Python

Even if the data are not normal (e.g., exponential), the distribution of the sample means is close to normal.

CLT demo

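import numpy as np

rng = np.random.default_rng(42)

# Non-normal (right-skewed) population with mean and standard deviation both close to 1
population = rng.exponential(scale=1.0, size=200_000)

# Draw 5000 random samples of size 50 and record each sample mean
means = []
for _ in range(5000):
    sample = rng.choice(population, size=50, replace=False)
    means.append(sample.mean())

means = np.array(means)
# Expect a mean near 1.0 and a spread near 1/sqrt(50), about 0.141
print(means.mean(), means.std())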
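
One way to put a number on "close to normal" is skewness, which is 0 for a normal distribution. The sketch below is an assumption-laden illustration: it draws directly from the exponential distribution rather than from a finite population, and the skewness helper is defined here for the example.

```python
import numpy as np

rng = np.random.default_rng(7)


def skewness(x):
    """Sample skewness: third central moment divided by variance to the power 1.5."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float((centered**3).mean() / (centered**2).mean() ** 1.5)


skew_by_n = {}
for n in (1, 5, 50, 500):
    # 5000 independent samples of size n, one sample per row
    draws = rng.exponential(scale=1.0, size=(5000, n))
    skew_by_n[n] = skewness(draws.mean(axis=1))
    print(f"n={n:>3}: skewness of sample means = {skew_by_n[n]:.2f}")
```

For exponential data the skewness of the sample mean is 2/sqrt(n) in theory, so the printed values should shrink toward 0 as n grows.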

Standard error

The standard error of the mean is:

\[ \mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{n}} \]

Here s is the sample standard deviation and n is the sample size. Bigger n → smaller SE: quadrupling n halves the SE.
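
The formula can be compared against a simulation. This is a minimal sketch under assumed settings (exponential data with scale 1.0, sample sizes 50 and 200); the helper empirical_se is a name introduced for the example.

```python
import numpy as np

rng = np.random.default_rng(3)

# Formula-based estimate from a single sample of size 50
n = 50
sample = rng.exponential(scale=1.0, size=n)
se_formula = sample.std(ddof=1) / np.sqrt(n)


def empirical_se(n, repeats=10_000):
    """Standard deviation of the sample mean across many independent samples of size n."""
    means = rng.exponential(scale=1.0, size=(repeats, n)).mean(axis=1)
    return float(means.std(ddof=1))


se_50 = empirical_se(50)
se_200 = empirical_se(200)

print(f"formula (one sample, n=50): {se_formula:.3f}")
print(f"simulated, n=50: {se_50:.3f}, n=200: {se_200:.3f}")
print(f"ratio: {se_50 / se_200:.2f}")
```

The formula-based value comes from one sample, so it only roughly matches the simulated value for n=50. The ratio of the two simulated values should be close to 2, since n was multiplied by 4.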

Practical guidance

Use the bootstrap (resampling the observed data with replacement) when a formula for the standard error is hard to derive or its assumptions are unclear.
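
A minimal bootstrap sketch follows. The median is used as the statistic because it has no simple standard error formula. The data, the number of resamples, and the percentile interval are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Stand-in for observed data (hypothetical): 50 skewed values
sample = rng.exponential(scale=1.0, size=50)

n_boot = 5000
# Each row holds the indices of one bootstrap resample, drawn with replacement
idx = rng.integers(0, sample.size, size=(n_boot, sample.size))
boot_medians = np.median(sample[idx], axis=1)

# Bootstrap standard error and a simple percentile interval for the median
se_median = boot_medians.std(ddof=1)
ci_low, ci_high = np.percentile(boot_medians, [2.5, 97.5])

print(f"sample median = {np.median(sample):.3f}")
print(f"bootstrap SE = {se_median:.3f}, approx. 95% interval = ({ci_low:.3f}, {ci_high:.3f})")
```

The same pattern works for other statistics: replace np.median with the function of interest and keep the resampling step unchanged.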
