Bagging - Random Forest Regressor/Classifier
What bagging is
Bagging = Bootstrap Aggregating.
Steps:
- sample (with replacement) multiple datasets from the training set
- train one model per sample
- aggregate predictions (vote/average)
flowchart TD
    D[Training data] --> S1[Bootstrap sample 1]
    D --> S2[Bootstrap sample 2]
    D --> S3[Bootstrap sample 3]
    S1 --> T1[Tree 1]
    S2 --> T2[Tree 2]
    S3 --> T3[Tree 3]
    T1 --> A[Aggregate]
    T2 --> A
    T3 --> A
    A --> P[Final prediction]
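The three steps above can be sketched by hand (an illustrative sketch on synthetic data, not the library's internals):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

n_models = 25
all_preds = []
for _ in range(n_models):
    # 1) sample with replacement from the training set (bootstrap)
    idx = rng.integers(0, len(X), size=len(X))
    # 2) train one model per bootstrap sample
    tree = DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx])
    all_preds.append(tree.predict(X))

# 3) aggregate the predictions (average, since this is regression)
bagged_pred = np.mean(all_preds, axis=0)
```

For classification, step 3 would take a majority vote over the trees' predicted labels instead of an average.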
Random Forest in one line
A Random Forest is bagging + decision trees + random feature selection at each split.
This extra randomness increases the diversity of the trees, which improves generalization.
Classification vs regression
- RandomForestClassifier: majority vote
- RandomForestRegressor: average prediction
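For regression, the forest's prediction really is the average of the individual trees' predictions; a small check using the fitted trees exposed via `estimators_`:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=80, n_features=5, random_state=0)
rf = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# each fitted tree is available in rf.estimators_
per_tree = np.stack([tree.predict(X) for tree in rf.estimators_])
forest_pred = rf.predict(X)  # matches per_tree.mean(axis=0)
```

(For classification, scikit-learn averages the trees' predicted class probabilities rather than counting hard votes, which behaves like a soft majority vote.)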
Scikit-learn examples
Random Forest (classification)
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(
    n_estimators=200,
    max_depth=None,
    random_state=42,
    n_jobs=-1,
)
Random Forest (regression)
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(
    n_estimators=300,
    random_state=42,
    n_jobs=-1,
)
Useful hyperparameters
- n_estimators: more trees → better performance but slower training
- max_depth: limits overfitting
- min_samples_leaf: smooths leaf predictions
- max_features: controls feature randomness at each split
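A quick illustration of `max_depth` limiting overfitting (synthetic data; the gap between train and test R² is a rough overfitting signal):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=10, noise=20.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# unrestricted depth: trees fit the training data very closely
deep = RandomForestRegressor(n_estimators=100, max_depth=None, random_state=0).fit(X_tr, y_tr)
# capped depth: each tree is forced to stay simple
shallow = RandomForestRegressor(n_estimators=100, max_depth=3, random_state=0).fit(X_tr, y_tr)

deep_gap = deep.score(X_tr, y_tr) - deep.score(X_te, y_te)
shallow_gap = shallow.score(X_tr, y_tr) - shallow.score(X_te, y_te)
```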
Mini-checkpoint
First try:
- start with deep (unpruned) trees in the forest
- tune max_depth and min_samples_leaf if overfitting
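One way to follow that tuning advice is a small grid search (a minimal sketch with `GridSearchCV`; the grid values here are arbitrary starting points):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

grid = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    param_grid={
        "max_depth": [None, 3, 6],
        "min_samples_leaf": [1, 3, 5],
    },
    cv=3,
    n_jobs=-1,
)
grid.fit(X, y)
best = grid.best_params_  # e.g. {'max_depth': ..., 'min_samples_leaf': ...}
```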
If this helped you, consider buying me a coffee ☕
