
Stacking and Voting Classifiers

Voting ensembles

Voting combines multiple models:

  • hard voting: majority vote
  • soft voting: average predicted probabilities
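To make the difference concrete, here is a minimal pure-Python sketch of both rules. The probabilities are made up for illustration, and `hard_vote` and `soft_vote` are hypothetical helper names, not scikit-learn functions. It shows a case where the two rules disagree: two models lean weakly toward class 0, while one model is strongly confident in class 1.

```python
from collections import Counter


def hard_vote(labels):
    """Return the label predicted by the most models."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probas):
    """Average per-class probabilities across models; return (winning class, averages)."""
    n_models = len(probas)
    n_classes = len(probas[0])
    avg = [sum(p[c] for p in probas) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c]), avg


# Hypothetical probabilities for classes [0, 1] from three models
probas = [
    [0.55, 0.45],  # model A: weakly prefers class 0
    [0.60, 0.40],  # model B: weakly prefers class 0
    [0.10, 0.90],  # model C: strongly prefers class 1
]

# Each model's hard label is its most probable class
labels = [max(range(2), key=lambda c: p[c]) for p in probas]

hard_pred = hard_vote(labels)        # majority of labels
soft_pred, avg = soft_vote(probas)   # argmax of averaged probabilities

print("hard:", hard_pred)
print("soft:", soft_pred, avg)
```

Hard voting picks class 0 because two of three models vote for it. Soft voting picks class 1 because the averaged probability for class 1 is higher once model C's confidence is taken into account.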



  flowchart LR
  A[Model A] --> V[Voting]
  B[Model B] --> V
  C[Model C] --> V
  V --> P[Final prediction]


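Soft voting usually works better when the models output well-calibrated probabilities. Averaging treats every model's probabilities as comparable, so an overconfident model can otherwise dominate the result.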
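One possible way to get calibrated probabilities from a model that lacks them is to wrap it in scikit-learn's `CalibratedClassifierCV`. The sketch below is an illustration under assumptions: the synthetic dataset, the choice of `LinearSVC`, and the hyperparameters are all placeholders, not recommendations.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Synthetic binary classification data, for illustration only
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# LinearSVC has no predict_proba; the wrapper fits a sigmoid calibrator
# on cross-validated decision scores and exposes predict_proba.
calibrated_svm = CalibratedClassifierCV(LinearSVC(max_iter=5000), method="sigmoid", cv=3)

voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", calibrated_svm),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    voting="soft",
)
voter.fit(X, y)

# Averaged class probabilities for the first five samples
proba = voter.predict_proba(X[:5])
print(proba)
```

Whether calibration helps depends on the models and the data, so it is worth comparing the ensemble with and without the wrapper.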

Stacking ensembles

Stacking trains a "meta-model" that learns how to combine base model predictions.



  flowchart TD
  X[Features] --> M1[Base Model 1]
  X --> M2[Base Model 2]
  X --> M3[Base Model 3]
  M1 --> Z["Meta-features (predictions)"]
  M2 --> Z
  M3 --> Z
  Z --> META[Meta-model]
  META --> Y[Final prediction]


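Key warning:

  • train the meta-model on out-of-fold (cross-validated) predictions from the base models; if the base models predict on the same rows they were fitted on, their overfit predictions leak label information into the meta-model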
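The warning above can be sketched by building the meta-features by hand with `cross_val_predict`. This is a minimal illustration on synthetic data with placeholder hyperparameters, not a replacement for `StackingClassifier`, which handles this step internally through its `cv` parameter.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.svm import SVC

# Synthetic binary classification data, for illustration only
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)

base_models = [
    RandomForestClassifier(n_estimators=50, random_state=0),
    SVC(probability=True, random_state=0),
]

# Out-of-fold P(class 1): every training row is predicted by a model
# fitted on the other folds, so no row is scored by a model that saw it.
meta_train = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# The meta-model learns how to weight the base models' predictions
meta_model = LogisticRegression(max_iter=1000).fit(meta_train, y_train)

# At test time, refit each base model on all training data and predict the held-out set
meta_test = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in base_models
])
test_accuracy = meta_model.score(meta_test, y_test)
print(f"stacked test accuracy: {test_accuracy:.3f}")
```

The leaky alternative would be to fit each base model on `X_train` and then call `predict_proba(X_train)` to build `meta_train`, which tends to make the base models look more reliable than they are.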

Scikit-learn examples

VotingClassifier
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Soft voting averages predict_proba outputs, so every estimator must provide them.
voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True)),  # probability=True enables predict_proba
        ("rf", RandomForestClassifier(n_estimators=200)),
    ],
    voting="soft",
)
StackingClassifier
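from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Base models produce predictions; the final estimator learns how to combine them.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200)),
        ("svm", SVC(probability=True)),
    ],
    # The meta-model is trained on cross-validated predictions of the base models.
    final_estimator=LogisticRegression(max_iter=1000),
)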

Mini-checkpoint

Try a voting classifier with:

  • one linear model
  • one non-linear model
  • one tree-based model

Compare against each base model.
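One way to set up this comparison is sketched below, assuming a synthetic dataset and 5-fold cross-validated accuracy as the metric. The specific models and hyperparameters are placeholders; swap in your own data and estimators.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic binary classification data, for illustration only
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

base = {
    "lr": LogisticRegression(max_iter=1000),                        # linear model
    "svm": SVC(probability=True, random_state=0),                   # non-linear (RBF kernel)
    "rf": RandomForestClassifier(n_estimators=50, random_state=0),  # tree-based
}

# Add the soft-voting ensemble built from the same three models
candidates = dict(base)
candidates["voting"] = VotingClassifier(estimators=list(base.items()), voting="soft")

# Mean 5-fold cross-validated accuracy for each candidate
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

The ensemble is not guaranteed to beat the best base model. It tends to help most when the base models make different kinds of errors.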
