Binning and Discretization
Why bin values?
Binning (discretization) converts numeric values into categories:
- Age → {0–18, 19–35, 36–60, 60+}
- Spend → {low, medium, high}
Use cases:
- Easier reporting and segmentation
- Capturing non-linear effects in otherwise linear models, at the cost of discarding variation within each bin
- Feature engineering
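To make these use cases concrete, here is a minimal sketch that bins a spend column into the {low, medium, high} categories listed above. It then uses the bins for a per-segment summary (reporting) and as one-hot features (feature engineering). The customer data and the thresholds 50 and 100 are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical customer data; values and thresholds are illustrative only.
df = pd.DataFrame({
    "customer": ["a", "b", "c", "d", "e", "f"],
    "spend": [20.0, 75.0, 150.0, 40.0, 300.0, 90.0],
})

# Map spend to {low, medium, high} using hypothetical edges 50 and 100.
df["spend_band"] = pd.cut(
    df["spend"],
    bins=[0, 50, 100, float("inf")],
    labels=["low", "medium", "high"],
)

# Reporting/segmentation: count and average spend per band.
# observed=False keeps bands that happen to be empty in the output.
summary = df.groupby("spend_band", observed=False)["spend"].agg(["count", "mean"])
print(summary)

# Feature engineering: one indicator column per band.
dummies = pd.get_dummies(df["spend_band"], prefix="spend")
print(dummies)
```

The same binned column serves both purposes, which is one reason binning is popular for dashboards and simple models alike.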
pd.cut (fixed bins)
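import pandas as pd
age = pd.Series([12, 19, 25, 38, 52, 67])
# With right=True and include_lowest=True these edges give [0, 18], (18, 35], (35, 60], (60, 200]
bins = [0, 18, 35, 60, 200]
# One label per interval; "60+" therefore means "over 60", since 60 itself falls in "36-60"
labels = ["0-18", "19-35", "36-60", "60+"]
age_group = pd.cut(age, bins=bins, labels=labels, right=True, include_lowest=True)
print(age_group)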
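The call above sets right=True and include_lowest=True. This small sketch shows what those two parameters do to values that sit exactly on a bin edge, using the same edges 0, 18, 35 and 60; it inspects the integer bin codes rather than labels so the effect is easy to read.

```python
import pandas as pd

# Values that sit exactly on the bin edges used in the age example.
edges = pd.Series([0, 18, 35, 60])
bins = [0, 18, 35, 60, 200]

# right=True: intervals are (a, b], so each edge belongs to the bin on its left.
# include_lowest=True widens the first interval so that 0 is not dropped.
closed_right = pd.cut(edges, bins=bins, right=True, include_lowest=True)
print(closed_right.cat.codes.tolist())  # [0, 0, 1, 2]

# Without include_lowest, 0 falls outside (0, 18] and becomes NaN.
no_lowest = pd.cut(edges, bins=bins, right=True)
print(no_lowest.isna().tolist())  # [True, False, False, False]

# right=False flips to [a, b): each edge now belongs to the bin on its right.
closed_left = pd.cut(edges, bins=bins, right=False)
print(closed_left.cat.codes.tolist())  # [0, 1, 2, 3]
```

Values outside all intervals (including values below the first edge or above the last) come back as NaN, so it is worth checking for missing values after pd.cut.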
pd.qcut (quantile bins)
Quantile binning takes the bin edges from the data's quantiles, so each bin receives roughly the same number of rows; ties and small samples can make the counts uneven.
import pandas as pd
values = pd.Series([10, 15, 20, 25, 30, 40, 60, 100])
# q=3 splits at the 1/3 and 2/3 quantiles; labels must supply one name per bin
bucket = pd.qcut(values, q=3, labels=["low", "mid", "high"])
print(bucket)
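To see where pd.qcut actually placed the edges, pass retbins=True; checking the counts also shows why the split is only "roughly" equal. The second part uses a made-up, heavily tied series to illustrate the duplicates="drop" option.

```python
import pandas as pd

values = pd.Series([10, 15, 20, 25, 30, 40, 60, 100])

# retbins=True also returns the quantile edges that qcut computed.
bucket, edges = pd.qcut(values, q=3, labels=["low", "mid", "high"], retbins=True)
print(edges.round(3))
print(bucket.value_counts(sort=False))

# Heavily tied (made-up) data can produce repeated edges. qcut raises ValueError
# in that case unless duplicates="drop" is passed, which yields fewer than q bins.
tied = pd.Series([0, 0, 0, 0, 0, 0, 1, 2, 3, 4])
tied_bins = pd.qcut(tied, q=4, duplicates="drop")
print(tied_bins.cat.categories)
```

For these eight values the computed edges are about 21.67 and 36.67, so the buckets hold 3, 2 and 3 rows. For the tied series, three of the five quartile edges are 0, and dropping the duplicates leaves two bins instead of four.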
Tips
- Choose bins based on domain meaning.
- Quantile bins are useful when distributions are skewed.
- Always check counts per bin after binning.
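The last two tips can be checked together. This sketch uses synthetic lognormal data (an assumption standing in for any right-skewed column, such as income) and compares equal-width pd.cut bins with quartile pd.qcut bins by looking at the counts per bin.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed data, used only for illustration.
rng = np.random.default_rng(0)
income = pd.Series(rng.lognormal(mean=10, sigma=1, size=1_000))

# Equal-width bins: on skewed data most rows land in the first bin.
equal_width = pd.cut(income, bins=4)
# Quantile bins: about the same number of rows in each bin regardless of skew.
equal_freq = pd.qcut(income, q=4)

# Check counts per bin; sort=False keeps bins in interval order.
print(equal_width.value_counts(sort=False))
print(equal_freq.value_counts(sort=False))
```

On data like this the equal-width bins leave most rows in the first bin and very few in the upper ones, while the quartile bins hold 250 rows each.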
