# Regularization - Ridge and Lasso Regression

## Why regularization exists
Regression models can overfit when:
- there are many features
- features are noisy
- polynomial degree is high
Overfitting often shows up as:
- great training error
- bad validation/test error
Regularization adds a penalty that discourages overly complex solutions.
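To make this concrete, here is a small sketch on synthetic data (the degree and alpha are arbitrary choices): plain least squares and Ridge fit the same high-degree polynomial features, but only Ridge pays a penalty for large weights.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.2, size=40)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Same degree-15 features; only the penalty differs
plain = make_pipeline(PolynomialFeatures(15), LinearRegression()).fit(X_tr, y_tr)
ridge = make_pipeline(PolynomialFeatures(15), Ridge(alpha=1.0)).fit(X_tr, y_tr)

print("train R²:", plain.score(X_tr, y_tr), ridge.score(X_tr, y_tr))
print("test  R²:", plain.score(X_te, y_te), ridge.score(X_te, y_te))
```

The unpenalized model always matches or beats the penalized one on training data; the interesting comparison is on the held-out test split.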
## Ridge Regression (L2)

Ridge minimizes:

MSE + λ * Σ(wi²)
Effect:
- shrinks coefficients toward 0
- usually keeps all features (rarely exactly 0)
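The shrinkage is easy to see on synthetic data (a sketch; the alpha grid is arbitrary): as `alpha` grows, the squared L2 norm of the coefficients falls, but the coefficients typically stay nonzero.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 1.5, 0.5, -1.0]) + rng.normal(scale=0.5, size=100)

# Larger alpha -> stronger L2 penalty -> smaller coefficient norm
norms = []
for alpha in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    norms.append(float(np.sum(model.coef_ ** 2)))

print(norms)  # strictly decreasing as alpha grows
```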
## Lasso Regression (L1)

Lasso minimizes:

MSE + λ * Σ(|wi|)
Effect:
- can push some coefficients exactly to 0
- performs feature selection
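A sketch of the selection effect (synthetic data; only the first three features carry signal, and the alpha is a made-up choice): Lasso zeroes out most of the irrelevant coefficients.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first 3 of 10 features actually matter
true_w = np.array([4.0, -3.0, 2.0] + [0.0] * 7)
y = X @ true_w + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
n_zero = int(np.sum(lasso.coef_ == 0.0))
print(lasso.coef_)
print("exactly-zero coefficients:", n_zero)
```

Compare this with Ridge on the same data: Ridge would shrink the seven irrelevant coefficients toward zero without making them exactly zero.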
```mermaid
flowchart LR
    A[Linear Regression] --> B[Ridge: shrink weights]
    A --> C[Lasso: shrink + select]
```
## Scikit-learn examples
### Ridge and Lasso

```python
from sklearn.linear_model import Ridge, Lasso

ridge = Ridge(alpha=1.0)  # alpha is λ
lasso = Lasso(alpha=0.1)
```

### Important: scale features
Regularization is sensitive to feature scale.
Use `StandardScaler` in a pipeline.
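A minimal sketch of that pipeline (the data and scales here are made up to exaggerate the point):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
# Two features on wildly different scales
X = np.column_stack([rng.normal(size=100), 1000 * rng.normal(size=100)])
y = X[:, 0] + 0.001 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Scaling happens inside the pipeline, so it is refit on each training fold
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X, y)
print(model.score(X, y))
```

Putting the scaler inside the pipeline (rather than scaling once up front) also prevents leakage when you later cross-validate.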
## Choosing λ (alpha)
- use a validation set or cross-validation
- `RidgeCV` / `LassoCV` can help
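A sketch of `RidgeCV` in action (synthetic data; the alpha grid is an arbitrary example):

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.3, size=150)

# RidgeCV evaluates each candidate alpha by (generalized) cross-validation
model = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0]).fit(X, y)
print("selected alpha:", model.alpha_)
```

`LassoCV` works the same way for the L1 penalty.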
## Mini-checkpoint
If you have 1000 features:
- which regularization might help reduce features automatically?
(Usually Lasso.)
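You can check this answer yourself with a sketch (synthetic data; the sizes and alpha are arbitrary): with 1000 features and only 5 informative ones, Lasso keeps a small subset.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 300, 1000
X = rng.normal(size=(n, p))
true_w = np.zeros(p)
true_w[:5] = [5.0, -4.0, 3.0, -2.0, 2.0]  # only 5 of 1000 features matter
y = X @ true_w + rng.normal(scale=0.5, size=n)

lasso = Lasso(alpha=0.5, max_iter=10_000).fit(X, y)
kept = np.flatnonzero(lasso.coef_)
print("features kept:", len(kept), "of", p)
```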
