Bagging - Random Forest Regressor/Classifier
What bagging is
Bagging = Bootstrap Aggregating.
Steps:
- sample (with replacement) multiple datasets from the training set
- train one model per sample
- aggregate predictions (vote/average)
flowchart TD
    D[Training data] --> S1[Bootstrap sample 1]
    D --> S2[Bootstrap sample 2]
    D --> S3[Bootstrap sample 3]
    S1 --> T1[Tree 1]
    S2 --> T2[Tree 2]
    S3 --> T3[Tree 3]
    T1 --> A[Aggregate]
    T2 --> A
    T3 --> A
    A --> P[Final prediction]
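The three steps above can be sketched by hand (an illustrative sketch on synthetic data, not the library's internals):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

n_models = 25
all_preds = []
for _ in range(n_models):
    # 1) sample with replacement from the training set (bootstrap)
    idx = rng.integers(0, len(X), size=len(X))
    # 2) train one model per bootstrap sample
    tree = DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx])
    all_preds.append(tree.predict(X))

# 3) aggregate the predictions (average, since this is regression)
bagged_pred = np.mean(all_preds, axis=0)
```

For classification, step 3 would take a majority vote over the trees' predicted labels instead of an average.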
Random Forest in one line
A Random Forest is bagging + decision trees + random feature selection at each split.
This extra randomness increases the diversity of the trees, which improves generalization.
Classification vs regression
- RandomForestClassifier: majority vote
- RandomForestRegressor: average prediction
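For regression, the forest's prediction really is the average of the individual trees' predictions; a small check using the fitted trees exposed via `estimators_`:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=80, n_features=5, random_state=0)
rf = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# each fitted tree is available in rf.estimators_
per_tree = np.stack([tree.predict(X) for tree in rf.estimators_])
forest_pred = rf.predict(X)  # matches per_tree.mean(axis=0)
```

(For classification, scikit-learn averages the trees' predicted class probabilities rather than counting hard votes, which behaves like a soft majority vote.)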
Scikit-learn examples
Random Forest (classification)
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(
    n_estimators=200,
    max_depth=None,
    random_state=42,
    n_jobs=-1,
)
Random Forest (regression)
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(
    n_estimators=300,
    random_state=42,
    n_jobs=-1,
)
Useful hyperparameters
- n_estimators: more trees → better performance but slower training
- max_depth: limits overfitting
- min_samples_leaf: smooths leaf predictions
- max_features: controls feature randomness at each split
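A quick illustration of `max_depth` limiting overfitting (synthetic data; the gap between train and test R² is a rough overfitting signal):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=10, noise=20.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# unrestricted depth: trees fit the training data very closely
deep = RandomForestRegressor(n_estimators=100, max_depth=None, random_state=0).fit(X_tr, y_tr)
# capped depth: each tree is forced to stay simple
shallow = RandomForestRegressor(n_estimators=100, max_depth=3, random_state=0).fit(X_tr, y_tr)

deep_gap = deep.score(X_tr, y_tr) - deep.score(X_te, y_te)
shallow_gap = shallow.score(X_tr, y_tr) - shallow.score(X_te, y_te)
```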
Mini-checkpoint
First try:
- start with deep (unpruned) trees in the forest
- tune max_depth and min_samples_leaf if overfitting
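One way to follow that tuning advice is a small grid search (a minimal sketch with `GridSearchCV`; the grid values here are arbitrary starting points):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

grid = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    param_grid={
        "max_depth": [None, 3, 6],
        "min_samples_leaf": [1, 3, 5],
    },
    cv=3,
    n_jobs=-1,
)
grid.fit(X, y)
best = grid.best_params_  # e.g. {'max_depth': ..., 'min_samples_leaf': ...}
```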
If this helped you, consider buying me a coffee ☕
