Anomaly Detection with Isolation Forests
What anomaly detection is
Anomalies (outliers) are data points that are rare and differ markedly from the bulk of the data.
Examples:
- fraud transactions
- sensor failures
- sudden spikes in usage
Why Isolation Forest works
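Isolation Forest builds an ensemble of random trees. Each tree repeatedly picks a random feature and a random split value between that feature's minimum and maximum, until every point is isolated (or a height limit is reached).
- anomalies are few and far from the rest, so they are isolated after only a few splits (a short path from the root)
- normal points sit in dense regions and need many more splits (a long path)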
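To make the path-length intuition concrete, here is a from-scratch sketch in Python with NumPy. The helper names `path_length` and `avg_path_length` are hypothetical, not part of any library, and the sketch simplifies the real algorithm: it grows only the branch that contains the query point, has no height limit, and uses the full dataset instead of random subsamples.

```python
import numpy as np

def path_length(x, X, rng, depth=0):
    """Number of random splits needed to isolate x (a row of X) in one tree.

    Simplified sketch: only the branch containing x is grown, there is no
    height limit and no correction term for unsplit leaves.
    """
    if len(X) <= 1:
        return depth
    f = rng.integers(X.shape[1])            # pick a random feature
    lo, hi = X[:, f].min(), X[:, f].max()
    if lo == hi:                             # duplicates: cannot split further
        return depth
    t = rng.uniform(lo, hi)                  # random split value in [lo, hi)
    left = X[:, f] < t
    # keep only the side of the split that x falls into
    X_next = X[left] if x[f] < t else X[~left]
    return path_length(x, X_next, rng, depth + 1)

def avg_path_length(x, X, rng, n_trees=200):
    """Average isolation depth of x over n_trees independent random trees."""
    return float(np.mean([path_length(x, X, rng) for _ in range(n_trees)]))

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 2))   # dense cluster of normal points
outlier = np.array([[6.0, 6.0]])                # one far-away point
X = np.vstack([normal, outlier])

outlier_len = avg_path_length(X[-1], X, rng)
typical_len = avg_path_length(X[0], X, rng)
print(f"outlier: {outlier_len:.1f} splits, typical point: {typical_len:.1f} splits")
```

The far-away point should need noticeably fewer splits on average than a point inside the cluster, which is exactly the signal Isolation Forest turns into an anomaly score.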
Diagram (flowchart): Random split trees → Isolate points → short path length ⇒ anomaly / long path length ⇒ normal.
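The "short path ⇒ anomaly" rule can be made quantitative. In the original Isolation Forest paper (Liu, Ting and Zhou), the average path length E[h(x)] of a point x over all trees is normalized by c(n), the average path length of an unsuccessful search in a binary search tree built on n points, where n is the number of samples used to grow each tree:

```latex
$$
s(x, n) = 2^{-\frac{E[h(x)]}{c(n)}}, \qquad
c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \qquad
H(i) \approx \ln(i) + 0.5772156649
$$
```

Scores close to 1 indicate anomalies, and scores well below 0.5 indicate normal points. scikit-learn's `score_samples` returns the negative of this score, so there lower values mean more abnormal.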
Scikit-learn example
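```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Placeholder data: replace with your own matrix of shape (n_samples, n_features)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))

iso = IsolationForest(
    n_estimators=200,     # number of random trees
    contamination=0.01,   # expected fraction of anomalies; sets the decision threshold
    random_state=42,      # reproducible trees
)
labels = iso.fit_predict(X)
# labels: 1 (normal), -1 (anomaly)
```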
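`fit_predict` only returns hard labels. To see how those labels come from scores, this sketch (again on placeholder random data, an assumption) uses the estimator's `score_samples`, `decision_function` and fitted `offset_` attribute: a point is labeled -1 exactly when its decision value is negative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Placeholder data (assumption): 1,000 two-dimensional points
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))

iso = IsolationForest(n_estimators=200, contamination=0.01, random_state=42)
labels = iso.fit_predict(X)

scores = iso.score_samples(X)       # lower = more abnormal
margin = iso.decision_function(X)   # scores shifted by the fitted threshold

# decision_function is score_samples minus offset_; negative means anomaly
assert np.allclose(margin, scores - iso.offset_)
assert np.array_equal(labels, np.where(margin < 0, -1, 1))

# Rank points by score to inspect the most suspicious ones first
most_anomalous = np.argsort(scores)[:5]
print("flagged:", int((labels == -1).sum()), "of", len(X))
print("top-5 most anomalous indices:", most_anomalous)
```

Ranking by `score_samples` is often more useful than the hard labels when you only have the budget to review a handful of cases.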
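Tips
- contamination should be your best guess of the anomaly rate; it sets the score threshold, not the trees themselves
- standardizing or min-max scaling is usually unnecessary: each split uses one feature at a time, with a threshold drawn between that feature's minimum and maximum, so linear rescaling does not change how easily points are isolated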
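Because `contamination` only sets the threshold, two models with the same `random_state` but different guesses should produce identical scores and differ only in `offset_`. A minimal sketch on placeholder random data (an assumption):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))  # placeholder data (assumption)

# Same random_state => same trees; only the contamination guess differs
low = IsolationForest(contamination=0.01, random_state=42).fit(X)
high = IsolationForest(contamination=0.05, random_state=42).fit(X)

# Scores come from the trees alone, so they match
same_scores = np.allclose(low.score_samples(X), high.score_samples(X))

# offset_ is the contamination-th percentile of the training scores,
# so a larger guess moves the threshold up and flags more points
n_low = int((low.predict(X) == -1).sum())
n_high = int((high.predict(X) == -1).sum())
print(same_scores, low.offset_, high.offset_, n_low, n_high)
```

If you have no reasonable guess, the default `contamination='auto'` uses a fixed threshold instead of a rate.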
Mini-checkpoint
If contamination is too high, what happens?
- the threshold rises, so you'll label too many normal points as anomalies (more false positives).
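A small experiment illustrates this, assuming synthetic data with exactly 1% planted anomalies (the data and the `is_planted` mask are hypothetical): raising `contamination` past the true rate increases the number of flagged points, and the extra ones can only be normal points.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical data: 990 normal points plus 10 planted anomalies (1% true rate)
normal = rng.normal(0.0, 1.0, size=(990, 2))
planted = rng.uniform(6.0, 8.0, size=(10, 2))
X = np.vstack([normal, planted])
is_planted = np.r_[np.zeros(990, bool), np.ones(10, bool)]

results = {}
for c in (0.01, 0.05, 0.10):
    labels = IsolationForest(contamination=c, random_state=42).fit_predict(X)
    flagged = labels == -1
    false_pos = int((flagged & ~is_planted).sum())   # normal points flagged
    results[c] = (int(flagged.sum()), false_pos)
    print(f"contamination={c:.2f}: flagged {flagged.sum()}, false positives {false_pos}")
```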
