Handling Outliers

Outliers are not always errors

Outliers could be:

So the first step is always: investigate.
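One way to investigate is to flag suspicious rows instead of dropping them, so each one can be reviewed by hand. The sketch below is only an illustration: the sample data is made up, the `is_outlier` column is a hypothetical name, and the 1.5 × IQR multiplier is a common convention (Tukey's rule) that I am assuming here, not something fixed by the data.

```python
import pandas as pd

# Hypothetical sample data; only the column name "amount" comes from the article
df = pd.DataFrame({"amount": [12.0, 15.0, 14.0, 13.0, 16.0, 15.0, 14.0, 900.0]})

# IQR bounds: 1.5 * IQR beyond the quartiles (the 1.5 multiplier is an assumed convention)
q1 = df["amount"].quantile(0.25)
q3 = df["amount"].quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Flag rather than drop, so every suspicious row can be inspected first
df["is_outlier"] = ~df["amount"].between(lower, upper)
suspects = df[df["is_outlier"]]
print(suspects)
```

Looking at the flagged rows next to their other columns (customer, date, source system) is usually what tells you whether a value is a typo or a genuine large transaction.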

Remove via IQR bounds

# Keep only rows whose amount lies inside the IQR bounds [lower, upper]
clean = df[(df["amount"] >= lower) & (df["amount"] <= upper)].copy()
print(clean)

Removing rows is risky when you report business totals: every dropped amount also disappears from the sum.
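To see how much a total can shift, compare the sum before and after filtering. This is a minimal sketch with made-up data and hypothetical bounds (`lower = 11.5`, `upper = 17.5`); in practice you would use the bounds computed from your own data.

```python
import pandas as pd

# Hypothetical sample data and bounds, for illustration only
df = pd.DataFrame({"amount": [12.0, 15.0, 14.0, 13.0, 16.0, 15.0, 14.0, 900.0]})
lower, upper = 11.5, 17.5

# Same filter as the removal step: keep rows inside [lower, upper]
clean = df[df["amount"].between(lower, upper)].copy()

total_all = df["amount"].sum()
total_clean = clean["amount"].sum()

# Share of the reported total that vanished together with the removed rows
dropped_share = 1 - total_clean / total_all
print(f"total before: {total_all}, after: {total_clean}, dropped: {dropped_share:.1%}")
```

If that one large amount was a real sale, a report built on `clean` would understate revenue by roughly 90% in this toy example.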

Capping keeps all rows but limits extreme values.

Cap values

# Values below lower become lower, values above upper become upper; no rows are dropped
df["amount_capped"] = df["amount"].clip(lower, upper)
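Here is a small self-contained sketch of capping, so you can check that the row count stays the same and see which values were changed. The data and the bounds are hypothetical.

```python
import pandas as pd

# Hypothetical sample data and bounds, for illustration only
df = pd.DataFrame({"amount": [2.0, 12.0, 15.0, 14.0, 900.0]})
lower, upper = 11.5, 17.5

rows_before = len(df)

# clip() pulls values outside [lower, upper] back to the nearest bound
df["amount_capped"] = df["amount"].clip(lower, upper)

# Count how many values were actually changed by the cap
n_capped = int((df["amount"] != df["amount_capped"]).sum())
print(df)
print(f"rows: {rows_before} -> {len(df)}, capped values: {n_capped}")
```

Writing the result to a new column, as the article does, keeps the original `amount` available for auditing.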

A log transform is useful when values span many orders of magnitude.

Log transform

import numpy as np
 
df["amount_log"] = np.log1p(df["amount"])  # log(1+x) to handle 0
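The sketch below shows the effect on made-up amounts ranging from 0 to about a million, and how to get back to the original scale with `np.expm1`, the inverse of `np.log1p`. The sample values and the `restored` name are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical amounts spanning several orders of magnitude, including a zero
df = pd.DataFrame({"amount": [0.0, 9.0, 99.0, 9_999.0, 999_999.0]})

# log1p(x) = log(1 + x), so a zero amount maps to 0 instead of -inf
df["amount_log"] = np.log1p(df["amount"])

# expm1 undoes log1p, which helps when results must be reported in original units
restored = np.expm1(df["amount_log"])
print(df)
```

Note that `log1p` is only defined for values greater than -1, so negative amounts such as refunds need separate handling before this step.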
