Introduction to Clustering

What clustering does

Clustering groups points such that points in the same cluster are more similar to each other than to points in other clusters.


  flowchart LR
  X[Data points] --> C[Clustering algorithm] --> G["Cluster labels (groups)"]

Most clustering methods rely on a notion of similarity, often distance.

Common distances:
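As a hedged illustration, three widely used choices are Euclidean, Manhattan, and cosine distance. These three are picked here as assumptions and may not match the list the author had in mind. A minimal pure-Python sketch, with hypothetical points `p` and `q`:

```python
import math

def euclidean(a, b):
    """Straight-line (L2) distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 minus cosine similarity: compares direction, ignores length.

    Undefined for all-zero vectors (division by zero).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical example points, chosen only for illustration.
p, q = (1.0, 0.0), (4.0, 4.0)

print(euclidean(p, q))
print(manhattan(p, q))
print(cosine_distance(p, q))
```

The same pair of points can look close under one distance and far under another, so the choice of distance is part of the modelling decision and not a detail to leave at the default.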

If your features are on different scales, the feature with the largest numeric range dominates the distance, so distance-based clustering can produce misleading groups.

Use scaling (StandardScaler/MinMaxScaler) when appropriate.
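To make the scale problem concrete, here is a small sketch that uses only the standard library. The `standardize` helper is a hypothetical stand-in that applies the same z-score idea as scikit-learn's StandardScaler (subtract the column mean, divide by the population standard deviation). The customer data is invented for illustration.

```python
import math
import statistics

# Hypothetical customers: (age in years, annual income in dollars).
customers = [(25, 50_000), (65, 51_000), (27, 54_000), (45, 90_000)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def standardize(rows):
    """Z-score each column: subtract its mean, divide by its population std."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds))
            for row in rows]

def nearest_to_first(rows):
    """Index of the row closest to rows[0] under Euclidean distance."""
    return min(range(1, len(rows)), key=lambda i: euclidean(rows[0], rows[i]))

raw_nearest = nearest_to_first(customers)
scaled_nearest = nearest_to_first(standardize(customers))

print("nearest to customer 0, raw features:   ", raw_nearest)
print("nearest to customer 0, scaled features:", scaled_nearest)
```

On the raw features, income differences swamp age, so the 25-year-old is matched with the 65-year-old whose income happens to be close. After standardizing, both features contribute comparably and the nearest neighbour becomes the 27-year-old.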

There is usually no “ground truth”.

You validate using:
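One commonly used internal check is the silhouette coefficient, which compares how close each point is to its own cluster versus the nearest other cluster. It is offered here as an assumption and may not be one of the checks the author listed. The sketch below is a plain-Python version for illustration; scikit-learn provides `sklearn.metrics.silhouette_score` for real use. The points and labelings are invented.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette(points, labels):
    """Mean silhouette coefficient, from -1 (poor) to 1 (well separated).

    Assumes at least two distinct cluster labels.
    """
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # a: mean distance to the other points in the same cluster.
        same = [euclidean(p, q)
                for j, (q, l) in enumerate(zip(points, labels))
                if l == label and j != i]
        if not same:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the points of any other cluster.
        b = min(
            sum(euclidean(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) - {label}
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical data: two obvious groups, labeled sensibly and badly.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette(points, [0, 0, 1, 1])
bad = silhouette(points, [0, 1, 0, 1])

print(good, bad)
```

A score near 1 suggests compact, well-separated clusters, while a score near or below 0 suggests points sit closer to another cluster than to their own. It is a heuristic, not a substitute for checking that the groups make sense for the problem.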

If you’re clustering customers:
