
K-Nearest Neighbors (KNN)

What KNN does

KNN predicts a label by looking at the k closest training points.



  flowchart LR
  N[New point] --> D[Compute distances to training points]
  D --> K[Pick k nearest]
  K --> V[Vote / majority class]
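The steps in the flowchart can be sketched in a few lines of NumPy. This is a minimal illustration of the idea, not scikit-learn's implementation; the toy data and function name are made up for the example:

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    # 1. Compute Euclidean distances from the new point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # 2. Pick the indices of the k nearest neighbors
    nearest = np.argsort(distances)[:k]
    # 3. Majority vote among their labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data: two well-separated classes
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8]])
y_train = np.array([0, 0, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.95, 0.9]), k=3))  # 1
```

The new point sits next to the two class-1 examples, so two of its three nearest neighbors vote for class 1.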


How to choose k

  • small k → more sensitive to noise (high variance)
  • large k → smoother boundary (high bias)

Scaling is critical

KNN uses distances (Euclidean by default).

If one feature has a much larger scale than the others, it dominates the distance and the remaining features are effectively ignored.

Standardize features first, e.g. with StandardScaler.
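A quick numeric check makes the problem concrete. The feature names (age, income) are hypothetical, chosen only because their scales differ by orders of magnitude:

```python
import numpy as np

# Hypothetical [age, income] features on raw (unscaled) values
a = np.array([30.0, 50_000.0])
b = np.array([60.0, 51_000.0])   # very different age, similar income
c = np.array([31.0, 80_000.0])   # similar age, very different income

# Raw Euclidean distance is dominated by the income axis
print(np.linalg.norm(a - b))  # ≈ 1000.4
print(np.linalg.norm(a - c))  # ≈ 30000.0
```

Despite `b` being twice `a`'s age, it looks far "closer" than `c`, because a $1,000 income gap outweighs a 30-year age gap. Standardizing both features puts them on comparable scales before distances are computed.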

Scikit-learn example

KNN classifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
 
knn = Pipeline(
    steps=[
        ("scaler", StandardScaler()),
        ("model", KNeighborsClassifier(n_neighbors=5)),
    ]
)
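The pipeline is used like any scikit-learn estimator: `fit`, then `predict` or `score`. Here is a sketch using the built-in iris dataset as a stand-in for your own data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = Pipeline(
    steps=[
        ("scaler", StandardScaler()),
        ("model", KNeighborsClassifier(n_neighbors=5)),
    ]
)

# The pipeline fits the scaler on the training data only,
# then applies the same transform at prediction time
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))  # test-set accuracy
```

Wrapping the scaler and model in one pipeline also prevents a common leak: scaling statistics are learned from the training fold only, never from test data.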

When KNN is a good baseline

  • small to medium datasets
  • when decision boundary is not too complex

Mini-checkpoint

Try k = 3, 5, 11 and compare validation performance.
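One way to run this checkpoint is with 5-fold cross-validation; the iris dataset below is just a placeholder for your own data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

for k in (3, 5, 11):
    knn = Pipeline(
        steps=[
            ("scaler", StandardScaler()),
            ("model", KNeighborsClassifier(n_neighbors=k)),
        ]
    )
    # Mean accuracy across 5 cross-validation folds
    scores = cross_val_score(knn, X, y, cv=5)
    print(f"k={k}: mean accuracy {scores.mean():.3f}")
```

Watch how the score moves as k grows: if it drops for small k the model is fitting noise; if it drops for large k the boundary has become too smooth.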

🧪 Try It Yourself

Exercise 1 – Train-Test Split

Exercise 2 – Fit a Linear Model

Exercise 3 – Evaluate with MSE
