
Label Encoding

What is label encoding?

Label encoding converts categories into integers:

  • low → 0
  • medium → 1
  • high → 2

When it’s OK

  • For ordinal categories (where order matters):

    • low < medium < high
  • Sometimes for tree-based models, which split on thresholds rather than using the codes as magnitudes (but still be careful)

When to avoid it

For nominal categories (no order):

  • city, color, country

Label encoding may trick models into believing an order exists.
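To see why this matters, here is a minimal sketch in plain Python. The city names are made up for illustration, and the codes are assigned alphabetically, which is one common way a label encoder would do it.

```python
# Hypothetical nominal feature: these cities have no natural order.
cities = ["berlin", "lima", "tokyo"]

# Assign integer codes in alphabetical order.
codes = {city: i for i, city in enumerate(sorted(cities))}


def code_distance(a, b):
    """Absolute difference between the integer codes of two cities."""
    return abs(codes[a] - codes[b])


print(codes)
print(code_distance("berlin", "lima"))
print(code_distance("berlin", "tokyo"))
```

By the codes alone, berlin looks twice as far from tokyo as from lima. A model that treats the codes as numbers (for example a linear or distance-based model) can pick up on that spacing even though it means nothing.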

Example: ordinal encoding (manual)

Manual ordinal encoding
import pandas as pd
 
df = pd.DataFrame({"priority": ["low", "high", "medium", "low"]})
 
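# Explicit mapping that preserves the order low < medium < high
mapping = {"low": 0, "medium": 1, "high": 2}
# Values missing from the mapping become NaN
df["priority_code"] = df["priority"].map(mapping)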
 
print(df)
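As an alternative sketch, pandas can also hold the order itself through an ordered Categorical. This is not the only way to do it, and the variable names `order` and `priority` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({"priority": ["low", "high", "medium", "low"]})

# Declare the category order explicitly; values outside this list become missing.
order = ["low", "medium", "high"]
priority = pd.Categorical(df["priority"], categories=order, ordered=True)

# .codes is the position of each value in `order` (-1 marks a missing value).
df["priority_code"] = priority.codes

print(df)
```

Compared with a plain dict, this keeps the order attached to the data, so comparisons and sorting on the categorical follow low < medium < high.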

Using scikit-learn LabelEncoder

LabelEncoder is mainly designed for encoding target labels, not features.

LabelEncoder
from sklearn.preprocessing import LabelEncoder
 
le = LabelEncoder()
y = ["spam", "ham", "ham", "spam"]
# Classes are sorted alphabetically, so ham=0 and spam=1
print(le.fit_transform(y))
print(le.classes_)
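For feature columns, scikit-learn provides OrdinalEncoder instead. The sketch below assumes the same low/medium/high priority feature used earlier on this page and passes the order explicitly, since the default is to sort categories alphabetically.

```python
from sklearn.preprocessing import OrdinalEncoder

# Features must be 2D: one row per sample, one column per feature.
X = [["low"], ["high"], ["medium"], ["low"]]

# Pass the intended order for the single feature column.
encoder = OrdinalEncoder(categories=[["low", "medium", "high"]])
X_encoded = encoder.fit_transform(X)

print(X_encoded.ravel())
print(encoder.categories_)
```

The encoded values come back as floats, one column per input feature.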

Tip

Use:

  • One-hot encoding for nominal categories
  • Ordinal mapping for ordered categories
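A minimal one-hot sketch with pandas, assuming a made-up `city` column as the nominal feature:

```python
import pandas as pd

df = pd.DataFrame({"city": ["lima", "tokyo", "berlin", "lima"]})

# One indicator column per distinct city; dtype=int gives 0/1 instead of booleans.
one_hot = pd.get_dummies(df["city"], prefix="city", dtype=int)

print(one_hot)
```

Each row has exactly one 1, so no city ends up numerically "larger" than another.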
