The Power of Ensembles - Why Combine Models?
The intuition: many opinions beat one
If you ask one person for a guess, you can get a bad answer.
If you ask 100 people and average their guesses, the result is often better.
Ensembles do the same for models.
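This averaging effect is easy to simulate. A minimal sketch in plain Python, where the `guess` function and its noise level are invented for illustration:

```python
import random

random.seed(0)
truth = 100.0

def guess():
    """One 'guesser': an unbiased but noisy estimate of the truth."""
    return truth + random.gauss(0, 20)

# Error of a single guess vs. the average of 100 independent guesses
single = guess()
crowd = sum(guess() for _ in range(100)) / 100

print(abs(single - truth))  # typically around 20 (one noisy opinion)
print(abs(crowd - truth))   # typically around 2 (noise averages out)
```

Averaging 100 independent guesses shrinks the typical error by a factor of about 10 (the square root of 100), which is exactly the variance-reduction effect ensembles exploit.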
Why ensembles work
Ensembles improve performance through:
1) Variance reduction (bagging)
- high-variance models (like deep decision trees) can overfit
- averaging many trees reduces sensitivity to noise
2) Bias reduction (boosting)
- weak learners can underfit
- boosting adds learners that fix previous errors
3) Better decision boundaries (model diversity)
If the models make different mistakes, combining them cancels out the individual errors.
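The bagging idea in 1) can be sketched in a few lines. The data, the depth-1 "stump" base learner, and all parameter values below are invented for illustration:

```python
import random

random.seed(1)

# Toy 1-D regression data: y = x plus noise (invented for illustration)
X = [i / 10 for i in range(100)]
y = [x + random.gauss(0, 0.5) for x in X]

def fit_stump(xs, ys):
    """A tiny base learner: split at the median x, predict each side's mean."""
    split = sorted(xs)[len(xs) // 2]
    left = [yi for xi, yi in zip(xs, ys) if xi <= split]
    right = [yi for xi, yi in zip(xs, ys) if xi > split]
    left_mean = sum(left) / len(left)
    right_mean = sum(right) / len(right) if right else left_mean
    return lambda x: left_mean if x <= split else right_mean

def bag(xs, ys, n_models=50):
    """Bagging: fit each base learner on a bootstrap resample, then average."""
    models = []
    for _ in range(n_models):
        # Bootstrap: sample n indices with replacement
        idx = [random.randrange(len(xs)) for _ in range(len(xs))]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)

ensemble = bag(X, y)
```

Each bootstrap sample sees a slightly different dataset, so each stump lands in a slightly different place; averaging their predictions smooths out that sensitivity to the particular sample.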
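A matching sketch of the boosting idea in 2): each round fits a weak learner to the residuals of the ensemble so far. Again, the data, candidate thresholds, and learning rate are invented for illustration:

```python
# Toy 1-D data: a step function the weak learners must build up in stages
X = [i / 100 for i in range(100)]
y = [0.0 if x < 0.3 else (1.0 if x < 0.7 else 2.0) for x in X]

def fit_stump(xs, ys):
    """Weak learner: try a few thresholds, keep the best single split."""
    best = None
    for t in [0.1 * k for k in range(1, 10)]:
        left = [yi for xi, yi in zip(xs, ys) if xi <= t]
        right = [yi for xi, yi in zip(xs, ys) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((yi - (lm if xi <= t else rm)) ** 2
                  for xi, yi in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, rounds=30, lr=0.3):
    """Boosting: each new stump is fit to the current residuals."""
    models = []
    pred = [0.0] * len(ys)
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(ys, pred)]
        m = fit_stump(xs, resid)
        models.append(m)
        pred = [pi + lr * m(xi) for pi, xi in zip(pred, xs)]
    return lambda x: sum(lr * m(x) for m in models)

model = boost(X, y)
```

No single stump can represent a two-step function, but because each round targets what the previous rounds got wrong, the sum of many stumps can: this is the bias-reduction effect.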
```mermaid
flowchart LR
    M1[Model 1] --> C[Combine]
    M2[Model 2] --> C
    M3[Model 3] --> C
    C --> P[Better prediction]
```
What “diversity” means
Models should not all make the same mistakes.
How diversity is created:
- different subsamples of data (bagging)
- different feature subsets (random forests)
- sequential focus on errors (boosting)
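A toy illustration of why this matters. The predictions below are fabricated so that each model makes exactly two mistakes, but on different inputs:

```python
from collections import Counter

def vote(predictions):
    """Majority vote: the ensemble is right whenever most members are."""
    return Counter(predictions).most_common(1)[0][0]

# Ground truth for six inputs, and three diverse models' predictions
truth   = [1, 1, 0, 0, 1, 0]
model_a = [1, 1, 0, 0, 0, 1]  # wrong on items 4 and 5
model_b = [0, 1, 0, 1, 1, 0]  # wrong on items 0 and 3
model_c = [1, 0, 1, 0, 1, 0]  # wrong on items 1 and 2

combined = [vote(p) for p in zip(model_a, model_b, model_c)]
accuracy = sum(c == t for c, t in zip(combined, truth)) / len(truth)
print(accuracy)  # 1.0: every mistake is outvoted by the other two models
```

Each individual model is only 67% accurate, yet the vote is perfect, because no mistake is shared. If all three models made the same two mistakes, the vote would inherit them.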
The tradeoff
Ensembles can be:
- less interpretable
- heavier (more compute)
But on many tabular problems, they are the strongest first choice.
Mini-checkpoint
Which seems more likely to generalize?
- one deep tree
- 200 trees averaged
(Usually the averaged forest.)
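You can check this intuition with a small simulation, using a 1-nearest-neighbour "memorizer" as a stand-in for a deep tree that fits the training noise exactly. All data and parameters here are invented:

```python
import random

random.seed(3)

# Noisy training data for the true function f(x) = x
train_x = [i / 50 for i in range(100)]
train_y = [x + random.gauss(0, 1.0) for x in train_x]

def one_nn(xs, ys):
    """'Deep tree' stand-in: memorizes the training data (1-nearest neighbor)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def bagged_nn(xs, ys, n=200):
    """200 memorizers, each on a bootstrap resample, averaged."""
    models = []
    for _ in range(n):
        idx = [random.randrange(len(xs)) for _ in range(len(xs))]
        models.append(one_nn([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)

single = one_nn(train_x, train_y)
forest = bagged_nn(train_x, train_y)

# Mean squared error against the noiseless truth f(x) = x
test_x = [i / 25 + 0.01 for i in range(50)]
err = lambda f: sum((f(x) - x) ** 2 for x in test_x) / len(test_x)

print(err(single))  # inherits the full noise of each memorized point
print(err(forest))  # noticeably lower: bootstrap averaging cancels noise
```

The single memorizer reproduces the noise of whichever training point is nearest, while the averaged ensemble blends several nearby points and cancels much of that noise, which is the forest's generalization advantage in miniature.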
