
Regularization - Ridge and Lasso Regression

Why regularization exists

Regression models can overfit when:

  • there are many features
  • features are noisy
  • polynomial degree is high

Overfitting often shows up as:

  • very low training error
  • much higher validation/test error

Regularization adds a penalty that discourages overly complex solutions.
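The overfitting pattern above can be reproduced with a quick sketch (the polynomial degree, noise level, and sample sizes here are illustrative assumptions, not from the text):

```python
# Sketch: a high-degree polynomial fit shows low training error but
# much worse validation error (degree/noise are illustrative choices).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.3, size=40)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Degree-15 polynomial: far more flexible than the data warrants
model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
model.fit(X_tr, y_tr)

print("train R²:", model.score(X_tr, y_tr))
print("val   R²:", model.score(X_val, y_val))
```

The training score looks excellent while the validation score lags behind, which is exactly the gap regularization aims to close.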

Ridge Regression (L2)

Ridge minimizes:

MSE + λ * Σ(wi²)

Effect:

  • shrinks coefficients toward 0
  • usually keeps all features (rarely exactly 0)
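The shrinking effect is easy to see by sweeping λ (via sklearn's `alpha`); the data and the alpha grid below are illustrative assumptions:

```python
# Sketch: larger alpha -> smaller Ridge coefficients (shrinkage).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.1, size=100)

for alpha in [0.1, 10.0, 1000.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha:>6}: ||w|| = {np.linalg.norm(model.coef_):.3f}")
```

As alpha grows, the coefficient vector's norm shrinks toward 0, but the coefficients typically stay nonzero.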

Lasso Regression (L1)

Lasso minimizes:

MSE + λ * Σ(|wi|)

Effect:

  • can push some coefficients exactly to 0
  • performs feature selection
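A small sketch of the sparsity effect (the synthetic data, where only 3 of 10 features matter, and the alpha value are illustrative assumptions):

```python
# Sketch: Lasso zeroes out coefficients of irrelevant features.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
true_w = np.zeros(10)
true_w[:3] = [4.0, -3.0, 2.0]          # only the first 3 features matter
y = X @ true_w + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
print(np.round(lasso.coef_, 2))
print("nonzero coefficients:", int(np.sum(lasso.coef_ != 0)))
```

The irrelevant features end up with coefficients of exactly 0, which is the "feature selection" behavior described above.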

  flowchart LR
  A[Linear Regression] --> B[Ridge: shrink weights]
  A --> C[Lasso: shrink + select]

Scikit-learn examples

Ridge and Lasso
from sklearn.linear_model import Ridge, Lasso
 
ridge = Ridge(alpha=1.0)  # alpha is λ
lasso = Lasso(alpha=0.1)

Important: scale features

Regularization is sensitive to feature scale.

Use StandardScaler in a pipeline.
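A minimal sketch of that pipeline (the data, with deliberately mismatched feature scales, and the alpha value are illustrative assumptions):

```python
# Sketch: scale features inside a Pipeline so the Ridge penalty
# treats all features fairly.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3)) * [1, 100, 0.01]   # wildly different scales
y = X @ np.array([1.0, 0.01, 50.0]) + rng.normal(size=100)

model = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)
print("R² on training data:", model.score(X, y))
```

Because scaling happens inside the pipeline, it is refit on each training fold during cross-validation, avoiding data leakage.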

Choosing λ (alpha)

  • use validation or cross-validation
  • RidgeCV / LassoCV can help
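For example, both estimators can search a candidate grid for you (the data and the alpha grids below are illustrative assumptions):

```python
# Sketch: let cross-validation pick alpha from a candidate grid.
import numpy as np
from sklearn.linear_model import RidgeCV, LassoCV

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 3.0]) + rng.normal(size=150)

ridge = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0]).fit(X, y)
lasso = LassoCV(alphas=[0.01, 0.1, 1.0], cv=5).fit(X, y)

print("ridge picked alpha:", ridge.alpha_)
print("lasso picked alpha:", lasso.alpha_)
```

The chosen value is exposed as `alpha_` after fitting, so you can inspect which strength of penalty won.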

Mini-checkpoint

If you have 1000 features:

  • which regularization might help reduce features automatically?

(Usually Lasso.)

If this helped you, consider buying me a coffee ☕
