K-Means Clustering Algorithm
K-means in one sentence
K-means partitions data into K clusters by minimizing within-cluster variance.
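In symbols, the within-cluster variance that K-means minimizes can be written as (a standard formulation, stated here for reference, not taken from the original text):

```latex
J(C, \mu) = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2
```

where \(C_k\) is the set of points assigned to cluster \(k\) and \(\mu_k\) is that cluster's centroid (mean).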
The algorithm (intuition)
- choose K
- initialize K centroids
- assign each point to nearest centroid
- recompute centroids as the mean of assigned points
- repeat until convergence

flowchart TD
A[Choose K] --> B[Initialize centroids]
B --> C[Assign points to nearest centroid]
C --> D[Update centroid = mean of cluster]
D --> E{Converged?}
E -->|no| C
E -->|yes| F[Final clusters]
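The loop in the flowchart is known as Lloyd's algorithm. A minimal NumPy sketch (illustrative only, not scikit-learn's implementation):

```python
import numpy as np

def kmeans_lloyd(X, k, n_iters=100, seed=0):
    """Minimal sketch of Lloyd's algorithm for K-means (illustrative)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        # (an empty cluster keeps its old centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Convergence: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

In practice you would use `sklearn.cluster.KMeans`, which adds smarter initialization (k-means++) and multiple restarts.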
What K-means assumes
K-means works best when clusters are:
- roughly spherical (ball-shaped)
- similar size
- separable by distance
It struggles when:
- clusters are non-spherical
- clusters have different densities
- the data contains many outliers
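A quick way to see the spherical-cluster assumption in action: compare K-means on well-separated blobs versus the classic two-moons dataset (the sample sizes and noise level below are arbitrary choices for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs, make_moons
from sklearn.metrics import adjusted_rand_score

# Spherical, well-separated blobs: K-means recovers the true grouping.
X_blobs, y_blobs = make_blobs(
    n_samples=400, centers=[[-5, -5], [5, 5]], cluster_std=1.0, random_state=42
)
blob_pred = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X_blobs)

# Interleaved half-moons: non-spherical, so K-means cuts straight through them.
X_moons, y_moons = make_moons(n_samples=400, noise=0.05, random_state=42)
moon_pred = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X_moons)

print(adjusted_rand_score(y_blobs, blob_pred))  # near 1.0 (perfect recovery)
print(adjusted_rand_score(y_moons, moon_pred))  # far below 1.0
```

Adjusted Rand score compares predicted clusters against the true labels; the gap between the two scores shows how much shape matters.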
Scikit-learn example
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
kmeans = Pipeline(
    steps=[
        ("scaler", StandardScaler()),
        ("model", KMeans(n_clusters=3, n_init=10, random_state=42)),
    ]
)
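A usage sketch, repeating the pipeline so the snippet runs on its own (the toy data below is made up for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

kmeans = Pipeline(
    steps=[
        ("scaler", StandardScaler()),
        ("model", KMeans(n_clusters=3, n_init=10, random_state=42)),
    ]
)

# Toy data: three well-separated 2-D blobs (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal((0, 0), 0.5, (50, 2)),
    rng.normal((5, 5), 0.5, (50, 2)),
    rng.normal((0, 5), 0.5, (50, 2)),
])

labels = kmeans.fit_predict(X)  # cluster index for each point
# Note: the fitted centroids live in *scaled* space, not the original units.
centers = kmeans.named_steps["model"].cluster_centers_
```

Scaling before K-means matters because the algorithm is distance-based: a feature with a large numeric range would otherwise dominate the centroid assignments.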
Mini-checkpoint
K-means always assigns every point to a cluster.
- If you have outliers, what happens?
(Outliers still get assigned and can distort centroids.)
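A sketch of that distortion on made-up 1-D numbers (values chosen purely for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two tight groups, around 0 and around 10.
X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
print(sorted(c[0] for c in km.cluster_centers_))  # roughly [0.1, 10.1]

# Add one extreme outlier: it still gets assigned to a cluster,
# and here it captures a centroid all by itself, merging the two real groups.
X_out = np.vstack([X, [[100.0]]])
km_out = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X_out)
print(sorted(c[0] for c in km_out.cluster_centers_))
```

Because every point must belong to some cluster, a single extreme value can change which structure K-means finds.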
🧪 Try It Yourself
Exercise 1 – Train-Test Split
Exercise 2 – Fit a Linear Model
Exercise 3 – Evaluate with MSE
