# K-Fold Cross-Validation

## Why cross-validation
A single train/validation split can be noisy: the score depends heavily on which rows happen to land in the validation set.
Cross-validation averages performance across multiple splits, giving a more stable estimate.
## K-fold CV
Steps:
- split data into K folds
- train on K-1 folds
- validate on the remaining fold
- repeat for all folds
- average the scores
```mermaid
flowchart TD
    A[Data] --> B[Split into K folds]
    B --> C[Train on K-1]
    C --> D[Validate on 1]
    D --> E[Repeat K times]
    E --> F[Average score]
```
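The loop above can also be written out by hand with scikit-learn's `KFold`. A minimal sketch, using the iris dataset as an assumed stand-in for your data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)  # stand-in dataset for the example

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, val_idx in kf.split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])                # train on K-1 folds
    scores.append(model.score(X[val_idx], y[val_idx]))   # validate on the held-out fold

print("mean accuracy:", np.mean(scores))                 # average the K scores
```

Writing the loop yourself is rarely necessary in practice, but it makes clear that each sample is used for validation exactly once.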
## Scikit-learn example
`cross_val_score` runs the whole K-fold loop in a single call:

```python
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

# X, y are your feature matrix and target vector
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5, scoring="accuracy")
print("mean:", scores.mean())
print("std:", scores.std())
```
## Stratified CV
For classification, use stratified splits so each fold preserves the class ratios of the full dataset.
Scikit-learn does this automatically in many cases: when the estimator is a classifier and `cv` is an integer, `cross_val_score` uses `StratifiedKFold` under the hood.
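Passing `StratifiedKFold` explicitly makes the stratification visible. A small sketch, again assuming iris as the dataset (50 samples per class, so the full-data ratio is 1/3 each):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, _ in skf.split(X, y):
    counts = np.bincount(y[train_idx])   # class counts in this training fold
    print(counts / counts.sum())         # each fold keeps roughly a 1/3 share per class

# cv= accepts the splitter object directly
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=skf)
print("mean accuracy:", scores.mean())
```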
## Time series warning

Don't use random k-fold for time series: shuffled folds let the model train on the future and validate on the past, leaking information.
Use a time-series split such as `TimeSeriesSplit` instead.
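A quick sketch of `TimeSeriesSplit` on a toy ordered sequence, showing that training indices always precede validation indices:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)   # 20 time-ordered samples (oldest first)

tscv = TimeSeriesSplit(n_splits=4)
for train_idx, val_idx in tscv.split(X):
    # every training index precedes every validation index: no peeking at the future
    print("train up to", train_idx.max(), "| validate", val_idx.min(), "to", val_idx.max())
```

Note that, unlike k-fold, the training window grows with each split and early samples are never used for validation.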
## Mini-checkpoint
When data is small:
- prefer CV over one split.
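To see why, compare how much single-split scores vary against a CV average. A small experiment, assuming iris as a stand-in dataset:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# Ten different random train/validation splits: the score jumps around
single = []
for seed in range(10):
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=seed)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    single.append(model.score(X_val, y_val))

cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("single-split scores:", np.round(single, 3))
print("5-fold CV mean:", round(cv_scores.mean(), 3))
```

The smaller the dataset, the wider the spread of the single-split scores, which is exactly when the averaged CV estimate earns its extra compute.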
## 🧪 Try It Yourself

- Exercise 1: Train-Test Split
- Exercise 2: Fit a Linear Model
- Exercise 3: Evaluate with MSE
