
Gradient Boosting (XGBoost, LightGBM, CatBoost)

What gradient boosting is

Gradient boosting builds an additive model:

prediction = tree1 + tree2 + tree3 + ...

Each new tree is trained to reduce the previous model’s errors.

For regression with squared-error loss, each tree fits the residuals: the differences between the targets and the current predictions.



  flowchart TD
  A[Initial prediction] --> B[Compute residuals]
  B --> C[Fit small tree on residuals]
  C --> D[Add to model]
  D --> B
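The loop in the diagram can be sketched in a few lines of scikit-learn. This is a minimal from-scratch illustration for squared-error regression, not a production implementation; the learning rate, tree depth, and number of rounds are illustrative choices:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # initial prediction: the mean

for _ in range(50):
    residuals = y - prediction                      # compute residuals
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)                          # fit a small tree on residuals
    prediction += learning_rate * tree.predict(X)   # add to the model

mse_before = np.mean((y - y.mean()) ** 2)   # error of the initial prediction
mse_after = np.mean((y - prediction) ** 2)  # error after boosting
```

Each round shrinks the training error a little; the small learning rate means many small corrections rather than a few large ones.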


Why it’s so strong

Gradient boosting is great for:

  • tabular data
  • mixed feature types
  • nonlinear interactions
  • handling missing values (built in for some libraries)

Scikit-learn baseline: HistGradientBoosting

Scikit-learn has strong built-ins:

  • HistGradientBoostingClassifier
  • HistGradientBoostingRegressor
sklearn HistGradientBoosting

from sklearn.ensemble import HistGradientBoostingClassifier

gb = HistGradientBoostingClassifier(random_state=42)

XGBoost / LightGBM / CatBoost (high level)

  • XGBoost: very popular, strong defaults, battle-tested
  • LightGBM: very fast, great on large datasets
  • CatBoost: excellent categorical handling, less preprocessing

These external libraries are commonly used in industry, but they’re optional. You can learn the concepts using scikit-learn first.

Mini-checkpoint

If your data is tabular and you want a strong model quickly:

  • try gradient boosting and compare to random forest.
