
DBSCAN - Density-Based Clustering

Why DBSCAN is useful

K-means assumes spherical clusters and forces every point into a cluster.

DBSCAN can:

  • find clusters of any shape
  • mark outliers as noise
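
The contrast with K-means can be seen on a tiny example. This is a minimal sketch on made-up data (two tight groups plus one far-away point); the parameter values are chosen for this toy data only.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

# Made-up data: two tight groups plus one far-away outlier (last row).
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
    [20.0, -20.0],
])

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)

print("K-means:", kmeans_labels)  # every point gets a cluster id, outlier included
print("DBSCAN: ", dbscan_labels)  # scikit-learn labels noise points as -1
```

K-means has no "no cluster" label, so the outlier must be assigned somewhere. DBSCAN leaves it out and marks it with -1.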

Key parameters

  • eps: neighborhood radius
  • min_samples: minimum number of points within eps for a point to count as a core point (in scikit-learn, the count includes the point itself)

Intuition

A point is:

  • core if it has at least min_samples points within eps
  • border if it is within eps of a core point but has too few neighbors to be core itself
  • noise if it is neither core nor within eps of any core point



  flowchart LR
  P[Point] --> Q{Enough neighbors in eps?}
  Q -->|yes| C[Core]
  Q -->|no| R{Reachable from core?}
  R -->|yes| B[Border]
  R -->|no| N[Noise]
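
The same three-way decision can be read off a fitted scikit-learn model. This is a minimal sketch on made-up data: `core_sample_indices_` and `labels_` are attributes of scikit-learn's `DBSCAN`, while the toy points and the `kinds` array are illustrative names introduced here.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Made-up 2-D data: a tight blob, one point on its edge, one isolated point.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],  # dense blob
    [0.35, 0.1],                                     # close to the blob, few neighbors
    [5.0, 5.0],                                      # far from everything
])

model = DBSCAN(eps=0.3, min_samples=4).fit(X)

# Core points are listed explicitly; noise points carry the label -1.
core_mask = np.zeros(len(X), dtype=bool)
core_mask[model.core_sample_indices_] = True
noise_mask = model.labels_ == -1

# Anything that is in a cluster but not core is a border point.
kinds = np.where(core_mask, "core", np.where(noise_mask, "noise", "border"))

for point, kind in zip(X, kinds):
    print(point, kind)
```

Here the edge point has only three points within eps (itself included), so it is not core, but it lies within eps of a core point and joins that cluster as a border point.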


Scikit-learn example

from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# eps is a distance, so standardize the features first to put them
# on a comparable scale before clustering.
db = Pipeline(
    steps=[
        ("scaler", StandardScaler()),
        ("model", DBSCAN(eps=0.5, min_samples=5)),
    ]
)
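
The pipeline above only defines the model. A minimal usage sketch follows, assuming the two-moons toy dataset from `sklearn.datasets.make_moons` as input (the dataset and its settings are choices made here, not part of the original example). Because `DBSCAN` provides `fit_predict`, the pipeline can call it directly.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumed toy input: two interleaved half-circles with a little noise.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

db = Pipeline(
    steps=[
        ("scaler", StandardScaler()),
        ("model", DBSCAN(eps=0.5, min_samples=5)),
    ]
)

# Scales X, then clusters it; returns one label per row, with -1 for noise.
labels = db.fit_predict(X)

n_clusters = len(set(labels.tolist()) - {-1})
n_noise = int(np.sum(labels == -1))
print("clusters:", n_clusters, "noise points:", n_noise)
```

DBSCAN has no `predict` method for unseen data, so the labels apply only to the rows passed to `fit_predict`.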

Pros and cons

Pros:

  • detects outliers naturally
  • good for non-spherical clusters

Cons:

  • choosing eps can be tricky
  • struggles when clusters have different densities
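
One common heuristic for picking eps, not covered above and offered here only as a starting point, is to look at how far each point is from its min_samples-th nearest point and choose eps near the place where those sorted distances start to rise sharply. The sketch below computes the sorted distances with scikit-learn's `NearestNeighbors`; the helper name `sorted_k_distances` and the random data are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def sorted_k_distances(X, min_samples):
    """Distance from each point to its min_samples-th nearest point, ascending.

    The point itself counts as the first neighbor, which matches how
    scikit-learn's DBSCAN counts min_samples.
    """
    nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
    distances, _ = nn.kneighbors(X)
    return np.sort(distances[:, -1])


# Made-up data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

k_dist = sorted_k_distances(X, min_samples=5)
# Plotting k_dist and looking for the "elbow" is the usual next step;
# a few percentiles give a rough feel without a plot.
print(np.percentile(k_dist, [50, 90, 99]))
```

When clusters have very different densities, this curve tends to show no single clear elbow, which is the same difficulty listed in the cons above.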

Mini-checkpoint

If you increase eps:

  • do you get more or fewer clusters?

(Usually fewer; clusters merge.)
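
The checkpoint can be tried directly. This is a minimal sketch on made-up one-dimensional data (two groups of five evenly spaced points); the helper `count_clusters` is an illustrative name, and the eps values are picked for this toy data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two groups of points spaced 0.1 apart, with a gap of 0.6 between the groups.
X = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 1.0, 1.1, 1.2, 1.3, 1.4]).reshape(-1, 1)


def count_clusters(X, eps, min_samples=3):
    """Number of clusters DBSCAN finds, ignoring the noise label -1."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    return len(set(labels.tolist()) - {-1})


counts = {eps: count_clusters(X, eps) for eps in (0.05, 0.15, 0.7)}
print(counts)
```

At eps=0.15 the two groups stay separate; at eps=0.7 the radius spans the gap and they merge into one. The "usually" matters: at eps=0.05 no point has enough neighbors, so everything is noise and there are no clusters at all.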
