Binning and Discretization

Why bin values?

Binning (discretization) converts numeric values into categories:

- Age → {0-18, 19-35, 36-60, 60+}
- Spend → {low, medium, high}

Use cases:

  • Easier reporting and segmentation
  • Capturing non-linear relationships (sometimes; each bin gets its own effect, at the cost of detail within a bin)
  • Feature engineering
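
As a rough sketch of the reporting use case, the snippet below groups a small table by age band and summarizes spend per band. The `customers` table and its `spend` values are made up for illustration; the bin edges and labels match the `pd.cut` example in the next section.

```python
import pandas as pd

# Hypothetical customer data, made up for illustration only.
customers = pd.DataFrame({
    "age": [12, 19, 25, 38, 52, 67],
    "spend": [20.0, 35.0, 50.0, 80.0, 65.0, 40.0],
})

# Assign each customer to an age band.
customers["age_group"] = pd.cut(
    customers["age"],
    bins=[0, 18, 35, 60, 200],
    labels=["0-18", "19-35", "36-60", "60+"],
)

# Report per segment: number of customers and average spend.
# observed=False keeps every band in the report, including empty ones.
report = customers.groupby("age_group", observed=False)["spend"].agg(["count", "mean"])
print(report)
```

Once the numeric column is a category, the usual `groupby` tools apply to it like any other label column.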

pd.cut (fixed bins)

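import pandas as pd

age = pd.Series([12, 19, 25, 38, 52, 67])

# Bin edges: with right=True each bin is (left, right], so 18 falls in "0-18".
bins = [0, 18, 35, 60, 200]
labels = ["0-18", "19-35", "36-60", "60+"]

# include_lowest=True closes the first bin on the left, so a value of 0 is kept.
age_group = pd.cut(age, bins=bins, labels=labels, right=True, include_lowest=True)
print(age_group)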
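
To see what `right` and `include_lowest` do at the bin edges, here is a small sketch. The ages are picked for illustration so that they sit exactly on an edge or outside the range; the bins and labels are the ones used above.

```python
import pandas as pd

# Sample ages chosen to sit on bin edges or outside the range (illustrative).
age = pd.Series([0, 18, 19, 60, 250])
bins = [0, 18, 35, 60, 200]
labels = ["0-18", "19-35", "36-60", "60+"]

# right=True: intervals are (left, right], so 18 -> "0-18" and 60 -> "36-60".
# include_lowest=True closes the first interval on the left, so 0 is kept.
closed_right = pd.cut(age, bins=bins, labels=labels, right=True, include_lowest=True)

# Without include_lowest, 0 falls outside (0, 18] and becomes NaN.
open_lowest = pd.cut(age, bins=bins, labels=labels, right=True)

print(closed_right.tolist())
print(open_lowest.tolist())
```

Values outside the outermost edges (250 here) come back as NaN in both cases, so it is worth counting missing values after binning.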

pd.qcut (quantile bins)

Quantile binning tries to put a roughly equal number of rows in each bin. The bin edges are computed from the quantiles of the data rather than supplied by you.

import pandas as pd

values = pd.Series([10, 15, 20, 25, 30, 40, 60, 100])

# q=3 splits the data at its 1/3 and 2/3 quantiles; labels name the bins in ascending order.
bucket = pd.qcut(values, q=3, labels=["low", "mid", "high"])
print(bucket)
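
Because `pd.qcut` chooses the edges itself, it can help to look at them. The sketch below uses the same `values` and the `retbins=True` option to return the computed edges alongside the buckets, then counts the rows per bucket.

```python
import pandas as pd

values = pd.Series([10, 15, 20, 25, 30, 40, 60, 100])

# retbins=True also returns the bin edges that qcut derived from the data.
bucket, edges = pd.qcut(values, q=3, labels=["low", "mid", "high"], retbins=True)

# Rows per bucket, listed in low/mid/high order.
counts = bucket.value_counts().sort_index()

print(edges)
print(counts)
```

With eight rows and three bins the split cannot be exactly even, so the buckets hold 3, 2 and 3 rows. If many values are tied, quantile edges can coincide and `pd.qcut` raises an error by default; passing `duplicates="drop"` merges the repeated edges, which also means fewer bins than requested.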

Tips

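  • Choose bin edges based on domain meaning (for example, thresholds your reports already use).
  • Quantile bins are useful when distributions are skewed, where equal-width bins would put most rows in one bin.
  • Always check counts per bin after binning, including how many values ended up as NaN.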
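
The last two tips can be checked together. As a minimal sketch, the code below bins the skewed sample from the `pd.qcut` section two ways, with three equal-width bins (`pd.cut` with `bins=3`) and with three quantile bins, and puts the counts side by side. The `counts` table is just a convenience for this comparison.

```python
import pandas as pd

# Skewed sample from the qcut section: most values are small, one is large.
values = pd.Series([10, 15, 20, 25, 30, 40, 60, 100])
labels = ["low", "mid", "high"]

# Three equal-width bins spanning the range of the data.
equal_width = pd.cut(values, bins=3, labels=labels)
# Three quantile bins over the same data.
quantile = pd.qcut(values, q=3, labels=labels)

# Side-by-side counts per bin, in low/mid/high order.
counts = pd.DataFrame(
    {
        "equal_width": equal_width.value_counts().sort_index().to_numpy(),
        "quantile": quantile.value_counts().sort_index().to_numpy(),
    },
    index=labels,
)
print(counts)
```

On this sample the equal-width bins put six of the eight values in "low", while the quantile bins split them 3, 2 and 3.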
