
K-Nearest Neighbors (KNN)

What KNN does

KNN predicts a label by looking at the k closest training points.



  flowchart LR
  N[New point] --> D[Compute distances to training points]
  D --> K[Pick k nearest]
  K --> V[Vote / majority class]
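The steps in the flowchart can be sketched in a few lines of NumPy. This is a minimal illustration of the idea, not scikit-learn's implementation; the toy data and function name are made up for the example:

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    # 1. Compute Euclidean distances from the new point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # 2. Pick the indices of the k nearest neighbors
    nearest = np.argsort(distances)[:k]
    # 3. Majority vote among their labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data: two well-separated classes
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8]])
y_train = np.array([0, 0, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.95, 0.9]), k=3))  # 1
```

The new point sits next to the two class-1 examples, so two of its three nearest neighbors vote for class 1.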


How to choose k

  • small k → more sensitive to noise (high variance)
  • large k → smoother boundary (high bias)

Scaling is critical

KNN uses distances (Euclidean by default).

If one feature has a much larger scale than the others, it dominates the distance and the remaining features are effectively ignored.

Standardize features first, e.g. with StandardScaler.
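A quick numeric check makes the problem concrete. The feature names (age, income) are hypothetical, chosen only because their scales differ by orders of magnitude:

```python
import numpy as np

# Hypothetical [age, income] features on raw (unscaled) values
a = np.array([30.0, 50_000.0])
b = np.array([60.0, 51_000.0])   # very different age, similar income
c = np.array([31.0, 80_000.0])   # similar age, very different income

# Raw Euclidean distance is dominated by the income axis
print(np.linalg.norm(a - b))  # ≈ 1000.4
print(np.linalg.norm(a - c))  # ≈ 30000.0
```

Despite `b` being twice `a`'s age, it looks far "closer" than `c`, because a $1,000 income gap outweighs a 30-year age gap. Standardizing both features puts them on comparable scales before distances are computed.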

Scikit-learn example

KNN classifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
 
knn = Pipeline(
    steps=[
        ("scaler", StandardScaler()),
        ("model", KNeighborsClassifier(n_neighbors=5)),
    ]
)
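The pipeline is used like any scikit-learn estimator: `fit`, then `predict` or `score`. Here is a sketch using the built-in iris dataset as a stand-in for your own data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = Pipeline(
    steps=[
        ("scaler", StandardScaler()),
        ("model", KNeighborsClassifier(n_neighbors=5)),
    ]
)

# The pipeline fits the scaler on the training data only,
# then applies the same transform at prediction time
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))  # test-set accuracy
```

Wrapping the scaler and model in one pipeline also prevents a common leak: scaling statistics are learned from the training fold only, never from test data.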

When KNN is a good baseline

  • small to medium datasets
  • when decision boundary is not too complex

Mini-checkpoint

Try k = 3, 5, 11 and compare validation performance.
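One way to run this checkpoint is with 5-fold cross-validation; the iris dataset below is just a placeholder for your own data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

for k in (3, 5, 11):
    knn = Pipeline(
        steps=[
            ("scaler", StandardScaler()),
            ("model", KNeighborsClassifier(n_neighbors=k)),
        ]
    )
    # Mean accuracy across 5 cross-validation folds
    scores = cross_val_score(knn, X, y, cv=5)
    print(f"k={k}: mean accuracy {scores.mean():.3f}")
```

Watch how the score moves as k grows: if it drops for small k the model is fitting noise; if it drops for large k the boundary has become too smooth.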

🧪 Try It Yourself

Exercise 1 – Train-Test Split

Exercise 2 – Fit a Linear Model

Exercise 3 – Evaluate with MSE
