DBSCAN - Density-Based Clustering
Why DBSCAN is useful
K-means assumes spherical clusters and forces every point into a cluster.
DBSCAN can:
- find clusters of any shape
- mark outliers as noise
Key parameters
- eps: neighborhood radius
- min_samples: minimum points to form a dense region
Intuition
A point is:
- core if it has at least min_samples neighbors within eps
- border if it's within eps of a core point but doesn't have enough neighbors itself
- noise if it's not reachable from any core point
flowchart LR
P[Point] --> Q{Enough neighbors in eps?}
Q -->|yes| C[Core]
Q -->|no| R{Reachable from core?}
R -->|yes| B[Border]
R -->|no| N[Noise]
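The decision flow above can be sketched in a few lines of NumPy. This is a minimal illustration, not scikit-learn's implementation: the helper name `classify_points` and the toy data are made up for this example, and it assumes a point counts as its own neighbor (the convention scikit-learn uses for min_samples).

```python
import numpy as np

def classify_points(X, eps, min_samples):
    """Label each row of X as 'core', 'border', or 'noise' (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances; O(n^2) memory, so only for small datasets.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    within_eps = dists <= eps
    # Assumption: the neighbor count includes the point itself.
    is_core = within_eps.sum(axis=1) >= min_samples
    # A point is "near a core point" if some core point lies within eps of it.
    near_core = (within_eps & is_core[None, :]).any(axis=1)
    kinds = np.full(len(X), "noise", dtype="<U6")
    kinds[near_core] = "border"
    kinds[is_core] = "core"  # core overrides border
    return kinds

# Toy data: a tight group of four, one point on its edge, one far away.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.5, 0.1], [3.0, 3.0]])
kinds = classify_points(X, eps=0.45, min_samples=4)
print(kinds.tolist())
```

Here the four tight points are core, the fifth sits within eps of two core points but has only three neighbors (itself included) so it is a border point, and the last point is noise.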
Scikit-learn example
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# Scale features first: eps is a distance, so it depends on feature units.
db = Pipeline(
    steps=[
        ("scaler", StandardScaler()),
        ("model", DBSCAN(eps=0.5, min_samples=5)),
    ]
)
# labels = db.fit_predict(X)  # X: (n_samples, n_features) array; label -1 marks noise
Pros and cons
Pros:
- detects outliers naturally
- good for non-spherical clusters
Cons:
- choosing eps can be tricky
- struggles when clusters have different densities
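One common heuristic for picking eps, not covered in the text above, is the k-distance plot: sort every point's distance to its k-th nearest neighbor and look for the "elbow" where the curve bends sharply upward. The sketch below is only a starting point under assumptions: the synthetic `make_blobs` data and the printed quantiles are illustrative choices, and reading the elbow remains a judgment call.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Illustrative synthetic data; replace with your own feature matrix.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

min_samples = 5
# Querying the training data returns each point as its own nearest neighbor
# (distance 0), so the last column is the distance to the
# (min_samples - 1)-th other point.
nn = NearestNeighbors(n_neighbors=min_samples).fit(X_scaled)
distances, _ = nn.kneighbors(X_scaled)
k_distances = np.sort(distances[:, -1])

# Plot k_distances to find the elbow; quantiles give a rough numeric view.
for q in (0.50, 0.90, 0.95, 0.99):
    print(f"{q:.0%} quantile of k-distance: {np.quantile(k_distances, q):.3f}")
```

An eps near the elbow tends to keep dense regions together while leaving sparse points as noise. When clusters have very different densities, no single elbow may exist, which is the second limitation listed above.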
Mini-checkpoint
If you increase eps:
- do you get more or fewer clusters?
(Usually fewer; clusters merge.)
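To check the answer empirically, the sketch below sweeps eps over a few values and counts clusters and noise points. The blob centers, the eps values, and the helper `count_clusters` are assumptions chosen for illustration; only the Pipeline setup mirrors the example earlier on this page.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Three well-separated synthetic blobs (illustrative data).
X, _ = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 0], [5, 5]],
    cluster_std=0.5,
    random_state=0,
)

def count_clusters(X, eps, min_samples=5):
    """Return (number of clusters, number of noise points) for a given eps."""
    db = Pipeline(
        steps=[
            ("scaler", StandardScaler()),
            ("model", DBSCAN(eps=eps, min_samples=min_samples)),
        ]
    )
    labels = db.fit_predict(X)
    # DBSCAN labels noise as -1, so exclude it from the cluster count.
    n_clusters = len(set(labels.tolist()) - {-1})
    n_noise = int(np.sum(labels == -1))
    return n_clusters, n_noise

results = {eps: count_clusters(X, eps) for eps in (0.3, 1.0, 2.0)}
for eps, (n_clusters, n_noise) in results.items():
    print(f"eps={eps}: {n_clusters} clusters, {n_noise} noise points")
```

The "usually" matters: a very small eps can also produce few or no clusters, because most points stop qualifying as core and are labeled noise instead.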
