Label Encoding
What is label encoding?
Label encoding converts categories into integers:
- low → 0
- medium → 1
- high → 2
When it’s OK
- For ordinal categories (where order matters):
  - low < medium < high
- Sometimes for tree models (but still be careful)
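One way to see why tree models can sometimes cope with integer codes: a decision tree splits on thresholds, so it can isolate a single code with two splits instead of treating the code as a magnitude. The sketch below is only an illustration; the data is made up, and it assumes scikit-learn's DecisionTreeClassifier with default settings.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: one label-encoded feature with codes 0, 1, 2.
# Only the middle code (1) belongs to the positive class, so no single
# threshold on the code separates the two classes.
X = [[0], [1], [2]] * 10
y = [0, 1, 0] * 10

# The tree can split at 0.5 and again at 1.5 to isolate code 1.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
accuracy = tree.score(X, y)
print(accuracy, tree.get_depth())
```

This only shows that the tree can fit the training data. With many categories the tree needs many splits to separate them, which is one reason to still be careful.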
When to avoid it
For nominal categories (no order):
- city, color, country
Label encoding may trick models into believing an order exists.
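A minimal sketch of the problem, using made-up city names and pandas factorize (an assumption; any label encoder shows the same effect). The integer codes create numeric gaps between cities that mean nothing.

```python
import pandas as pd

# Hypothetical nominal feature: the city names are made up for illustration.
cities = pd.Series(["Paris", "Tokyo", "Lima", "Paris"])

# factorize(sort=True) assigns integer codes in alphabetical order.
codes, uniques = pd.factorize(cities, sort=True)
code_of = {city: i for i, city in enumerate(uniques)}

# A model that treats the code as a number sees these "distances",
# although the cities have no natural order.
gap_lima_paris = abs(code_of["Paris"] - code_of["Lima"])
gap_lima_tokyo = abs(code_of["Tokyo"] - code_of["Lima"])

print(code_of)
print(gap_lima_paris, gap_lima_tokyo)
```

Here Tokyo ends up "twice as far" from Lima as Paris is, purely because of alphabetical order.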
Example: ordinal encoding (manual)
Manual ordinal encoding
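import pandas as pd
df = pd.DataFrame({"priority": ["low", "high", "medium", "low"]})
# Explicit mapping that preserves the order low < medium < high
mapping = {"low": 0, "medium": 1, "high": 2}
# Values that are missing from the mapping become NaN
df["priority_code"] = df["priority"].map(mapping)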
print(df)

Using scikit-learn LabelEncoder
LabelEncoder is mainly designed for encoding target labels, not features. For features, scikit-learn provides OrdinalEncoder and OneHotEncoder.
LabelEncoder
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = ["spam", "ham", "ham", "spam"]
# Classes are sorted alphabetically: ham -> 0, spam -> 1
print(le.fit_transform(y))  # [1 0 0 1]
print(le.classes_)

Tip
Use:
- One-hot encoding for nominal categories
- Ordinal mapping for ordered categories
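The two recommendations can be sketched together as follows. This is one possible approach, not the only one: the column names and values are hypothetical, one-hot encoding uses pandas get_dummies, and the ordered column uses scikit-learn's OrdinalEncoder with an explicit category order.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical data: "color" is nominal, "priority" is ordinal.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "priority": ["low", "high", "medium", "low"],
})

# Nominal: one indicator column per category, so no order is implied.
one_hot = pd.get_dummies(df["color"], prefix="color")

# Ordinal: pass the order explicitly so that low < medium < high.
encoder = OrdinalEncoder(categories=[["low", "medium", "high"]])
priority_code = encoder.fit_transform(df[["priority"]]).ravel()

encoded = one_hot.assign(priority_code=priority_code)
print(encoded)
```

Passing the categories explicitly matters: without it, OrdinalEncoder sorts the values alphabetically, which would put high before low.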
