Binning and Discretization

Why bin values?

Binning (discretization) converts numeric values into categories:

- Age → {0–18, 19–35, 36–60, 60+}
- Spend → {low, medium, high}

- Age → {0–18, 19–35, 36–60, 60+}
- Spend → {low, medium, high}

Use cases:

Easier reporting and segmentation
Non-linear relationship capture (sometimes)
Feature engineering

`pd.cutpd.cut` (fixed bins)

cut

import pandas as pd
 
age = pd.Series([12, 19, 25, 38, 52, 67])
 
bins = [0, 18, 35, 60, 200]
labels = ["0-18", "19-35", "36-60", "60+"]
 
age_group = pd.cut(age, bins=bins, labels=labels, right=True, include_lowest=True)
print(age_group)

cut

import pandas as pd
 
age = pd.Series([12, 19, 25, 38, 52, 67])
 
bins = [0, 18, 35, 60, 200]
labels = ["0-18", "19-35", "36-60", "60+"]
 
age_group = pd.cut(age, bins=bins, labels=labels, right=True, include_lowest=True)
print(age_group)

`pd.qcutpd.qcut` (quantile bins)

Quantile binning tries to put roughly equal number of rows in each bin.

qcut

import pandas as pd
 
values = pd.Series([10, 15, 20, 25, 30, 40, 60, 100])
 
bucket = pd.qcut(values, q=3, labels=["low", "mid", "high"])
print(bucket)

qcut

import pandas as pd
 
values = pd.Series([10, 15, 20, 25, 30, 40, 60, 100])
 
bucket = pd.qcut(values, q=3, labels=["low", "mid", "high"])
print(bucket)

Tips

Choose bins based on domain meaning.
Quantile bins are useful when distributions are skewed.
Always check counts per bin after binning.

If this helped you, consider buying me a coffee ☕

Buy me a coffee

Binning and Discretization

Why bin values?

pd.cutpd.cut (fixed bins)

pd.qcutpd.qcut (quantile bins)

Tips

Was this page helpful?

`pd.cutpd.cut` (fixed bins)

`pd.qcutpd.qcut` (quantile bins)