Sampling and the Central Limit Theorem (CLT)
Sampling
A sample is a subset drawn from the population, so each sample gives one possible, noisy view of it.
Important ideas:
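- Larger samples reduce noise: estimates vary less from one sample to the next.
- Random sampling reduces bias: no part of the population is systematically favored.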
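A small simulation makes both bullets concrete. It is a sketch under stated assumptions: a synthetic exponential population, arbitrary sample sizes (10 vs. 250), and a deliberately bad "convenience" sample made of the 1,000 largest values, standing in for a hypothetical biased selection process.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is reproducible

# Assumed synthetic population: 100,000 exponential values with mean about 1
population = [rng.expovariate(1.0) for _ in range(100_000)]
true_mean = statistics.fmean(population)


def spread_of_means(n, reps=2000):
    """Standard deviation of the means of `reps` random samples of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)


# Noise: sample means scatter less around the true mean as n grows
noise_small = spread_of_means(10)
noise_large = spread_of_means(250)

# Bias: a random sample vs. a hypothetical non-random sample of the 1,000 largest values
random_sample = rng.sample(population, 1000)
biased_sample = sorted(population)[-1000:]

print(f"true mean:          {true_mean:.3f}")
print(f"spread, n=10:       {noise_small:.3f}")
print(f"spread, n=250:      {noise_large:.3f}")
print(f"random sample mean: {statistics.fmean(random_sample):.3f}")
print(f"biased sample mean: {statistics.fmean(biased_sample):.3f}")
```

Increasing n shrinks the spread of the sample means, but no sample size fixes the biased sample: its mean stays far above the true mean.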
Central Limit Theorem (CLT)
CLT says (informally):
For any distribution with finite variance, the distribution of the mean of n independent draws becomes approximately normal as n grows, centered at the population mean μ with standard deviation σ/√n.
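A slightly more formal version (the classical i.i.d. form), assuming \(X_1, \dots, X_n\) are independent draws from one distribution with mean \(\mu\) and finite variance \(\sigma^2\):

\[ \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1) \quad \text{as } n \to \infty \]

For large n this means \(\bar{X}_n\) is approximately \(\mathcal{N}(\mu, \sigma^2 / n)\); in practice the unknown \(\sigma\) is replaced by the sample standard deviation \(s\).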
This enables:
- Confidence intervals
- Hypothesis tests
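A minimal sketch of how the normal approximation turns into a confidence interval and a hypothesis test, using only Python's standard library (`statistics.NormalDist`). The data are simulated, and the sample size (200), the seed, and the hypothesized mean `mu0 = 1.0` are illustrative assumptions. This is a large-sample z-interval and z-test; for small samples a t-based version is more common.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed for reproducibility

# Assumed data: one sample of 200 exponential draws (true mean = 1.0)
sample = [rng.expovariate(1.0) for _ in range(200)]
n = len(sample)
xbar = statistics.fmean(sample)
se = statistics.stdev(sample) / math.sqrt(n)  # estimated standard error of the mean

std_normal = statistics.NormalDist()  # N(0, 1)

# 95% confidence interval from the normal approximation to the sample mean
z_crit = std_normal.inv_cdf(0.975)  # about 1.96
ci = (xbar - z_crit * se, xbar + z_crit * se)

# Two-sided z-test of H0: mu = mu0 (mu0 is an assumed hypothesis value)
mu0 = 1.0
z_stat = (xbar - mu0) / se
p_value = 2 * (1 - std_normal.cdf(abs(z_stat)))

print(f"mean={xbar:.3f}  SE={se:.3f}  95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
print(f"z={z_stat:.2f}  p={p_value:.3f}")
```

The two tools agree by construction: `mu0` lies inside the 95% interval exactly when the two-sided p-value is at least 0.05.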
Demonstration in Python
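Even if the data are not normal (e.g., exponential), the sample means become close to normal. An exponential population with scale=1.0 has mean 1 and standard deviation 1, so the means of samples of size 50 should center near 1 with a spread near 1/√50 ≈ 0.14.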
CLT demo
import numpy as np
rng = np.random.default_rng(42)
# Non-normal (right-skewed) population: exponential with mean 1 and std 1
population = rng.exponential(scale=1.0, size=200_000)
# Draw 5,000 samples of size 50 and record each sample's mean
means = []
for _ in range(5000):
    sample = rng.choice(population, size=50, replace=False)
    means.append(sample.mean())
means = np.array(means)
print(means.mean(), means.std())
Standard error
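The standard error (SE) of the mean is the standard deviation of the sampling distribution of \bar{x}. It is estimated from a single sample by:
\[ SE(\bar{x}) = \frac{s}{\sqrt{n}} \]
where s is the sample standard deviation and n is the sample size.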
- Bigger n → smaller SE: because SE shrinks like 1/√n, quadrupling n halves the SE.
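To check the formula numerically, the sketch below compares the s/√n estimate from one sample with the actual spread of many simulated sample means, for n = 25, 100, and 400. Assumptions: exponential data with σ = 1 (so the true SE is 1/√n), an arbitrary seed, and 2,000 simulated samples per size. The simulated SE should roughly halve with each fourfold increase in n; the single-sample estimate follows the same trend but is noisier.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed for reproducibility


def simulated_se(n, reps=2000):
    """Spread (std) of sample means over many simulated exponential samples of size n."""
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
    return statistics.stdev(means)


results = {}
for n in (25, 100, 400):
    # A single observed sample gives the formula-based estimate s / sqrt(n)
    one_sample = [rng.expovariate(1.0) for _ in range(n)]
    formula_se = statistics.stdev(one_sample) / math.sqrt(n)
    results[n] = (formula_se, simulated_se(n))
    # With sigma = 1 for this distribution, the true SE is 1 / sqrt(n)
    print(f"n={n:4d}  s/sqrt(n)={formula_se:.3f}  "
          f"simulated={results[n][1]:.3f}  1/sqrt(n)={1 / math.sqrt(n):.3f}")
```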
Practical guidance
- Use the bootstrap (resampling with replacement from the observed sample) when analytic formulas are hard to derive or their assumptions are in doubt.
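A minimal bootstrap sketch using only the standard library, showing the percentile variant (one of several bootstrap intervals): resample the observed data with replacement many times, recompute the mean each time, and read a standard error and a 95% interval off the spread of those resampled means. The data, the seed, and `n_boot = 5000` are assumptions for illustration.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed for reproducibility

# Assumed observed data: 60 right-skewed values (exponential, true mean 1.0)
data = [rng.expovariate(1.0) for _ in range(60)]


def bootstrap_means(sample, n_boot=5000):
    """Means of n_boot resamples drawn with replacement, each the same size as `sample`."""
    k = len(sample)
    return [statistics.fmean(rng.choices(sample, k=k)) for _ in range(n_boot)]


boot = bootstrap_means(data)
boot_se = statistics.stdev(boot)  # bootstrap estimate of the standard error

# Percentile 95% interval: the 2.5th and 97.5th percentiles of the bootstrap means
cuts = statistics.quantiles(boot, n=40)  # 39 cut points at 2.5%, 5%, ..., 97.5%
lo, hi = cuts[0], cuts[-1]

print(f"mean={statistics.fmean(data):.3f}  bootstrap SE={boot_se:.3f}  "
      f"95% CI=({lo:.3f}, {hi:.3f})")
```

For the mean, the bootstrap SE should land close to s/√n; the same loop works for statistics without a simple SE formula (for example a median) by swapping `statistics.fmean` for another function.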
