K-Nearest Neighbors (KNN)
What KNN does
KNN predicts a label for a new point by taking a majority vote among its k closest training points.
```mermaid
flowchart LR
    N[New point] --> D[Compute distances to training points]
    D --> K[Pick k nearest]
    K --> V[Vote / majority class]
```
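The steps in the diagram can be sketched in plain NumPy. This is a minimal illustration of the idea, not scikit-learn's implementation (which uses optimized neighbor search structures):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Step 1: compute Euclidean distances from the new point to every training point
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Step 2: pick the indices of the k nearest training points
    nearest = np.argsort(dists)[:k]
    # Step 3: majority vote among their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny illustrative dataset: two clusters, two classes
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.1]), k=3))  # predicts 0
```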
How to choose k
- small k → more sensitive to noise (high variance)
- large k → smoother boundary (high bias)
Scaling is critical
KNN uses distances (Euclidean by default).
If one feature has a much larger numeric range than the others, it dominates the distance calculation.
Use StandardScaler to put features on comparable scales before fitting.
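A quick numeric check of why scaling matters. The feature values here (age and income) are made up for illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two people who differ a lot in age (~tens) and slightly in income (~tens of thousands)
a = np.array([[25.0, 50_000.0]])   # [age, income]
b = np.array([[60.0, 50_300.0]])

# The raw Euclidean distance is almost entirely the income difference:
# sqrt(35**2 + 300**2) ~= 302, so age barely registers
print(np.linalg.norm(a - b))

# After standardizing each feature to zero mean and unit variance,
# both features contribute on a comparable scale
X = np.vstack([a, b, [[40.0, 48_000.0]], [[30.0, 52_000.0]]])
Xs = StandardScaler().fit_transform(X)
print(np.linalg.norm(Xs[0] - Xs[1]))
```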
Scikit-learn example
KNN classifier

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Scaling and the model live in one Pipeline, so the scaler is fitted
# only on training data and applied consistently at predict time
knn = Pipeline(
    steps=[
        ("scaler", StandardScaler()),
        ("model", KNeighborsClassifier(n_neighbors=5)),
    ]
)
```

When KNN is a good baseline
- small to medium datasets
- when decision boundary is not too complex
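The pipeline above can be fitted and scored like any scikit-learn estimator. Here is a sketch on a synthetic dataset (`make_classification` stands in for your own data):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data, just for illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = Pipeline(
    steps=[
        ("scaler", StandardScaler()),
        ("model", KNeighborsClassifier(n_neighbors=5)),
    ]
)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))  # mean accuracy on the held-out test set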
Mini-checkpoint
Try k = 3, 5, 11 and compare validation performance.
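One way to run this checkpoint is with cross-validation. This sketch uses a synthetic dataset; swap in your own `X` and `y`:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data, just for illustration
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

for k in (3, 5, 11):
    knn = Pipeline(
        steps=[
            ("scaler", StandardScaler()),
            ("model", KNeighborsClassifier(n_neighbors=k)),
        ]
    )
    # 5-fold cross-validated accuracy for this k
    scores = cross_val_score(knn, X, y, cv=5)
    print(f"k={k}: mean accuracy {scores.mean():.3f}")
```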
🧪 Try It Yourself
Exercise 1: Train-Test Split
Exercise 2: Fit a Linear Model
Exercise 3: Evaluate with MSE
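The exercises don't specify a dataset, so here is one possible solution sketch using a synthetic regression problem (`make_regression` is an assumption, not part of the exercises):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Exercise 1: split a (synthetic) dataset into train and test sets
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Exercise 2: fit a linear model on the training split
model = LinearRegression().fit(X_train, y_train)

# Exercise 3: evaluate with mean squared error on the test split
mse = mean_squared_error(y_test, model.predict(X_test))
print(f"Test MSE: {mse:.2f}")
```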
