Feature Engineering Basics

What is feature engineering?

Feature engineering is the process of converting raw data into inputs (features) that a model can learn from more easily.

Examples:

  • Date → day of week
  • Amount → log(amount)
  • Text → length, presence of keywords
  • Customer transactions → total spend, avg order value
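
The amount and text items in the list above can be sketched as follows. This is a minimal illustration, not a prescribed recipe: the column names `amount` and `review`, the sample values, and the keyword "refund" are assumptions made up for the example. It also uses `np.log1p` (log of 1 + x) in place of a plain log so that a zero amount does not produce negative infinity.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names and values are illustrative only.
df = pd.DataFrame({
    "amount": [0, 9, 99, 999],
    "review": ["Great value", "refund please", "", "Late delivery, want a refund"],
})

# log1p computes log(1 + x), so an amount of 0 maps to 0.0 instead of -inf.
df["log_amount"] = np.log1p(df["amount"])

# Simple text features: character length and a case-insensitive keyword flag.
df["review_len"] = df["review"].str.len()
df["mentions_refund"] = df["review"].str.contains("refund", case=False)

print(df)
```

If the amounts are strictly positive, `np.log` would match the list item more literally; which transform fits depends on the data.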

Example: derive features from datetime

Datetime features
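import pandas as pd

# Sample orders with a parsed datetime column
df = pd.DataFrame({
    "order_time": pd.to_datetime([
        "2025-01-01 10:15:00",
        "2025-01-02 18:30:00",
        "2025-01-03 09:10:00",
    ]),
    "amount": [250, 180, 90],
})

# Extract calendar parts through the .dt accessor
df["hour"] = df["order_time"].dt.hour
df["weekday"] = df["order_time"].dt.day_name()
# .dt.weekday runs from 0 (Monday) to 6 (Sunday), so 5 and 6 are the weekend
df["is_weekend"] = df["order_time"].dt.weekday >= 5

print(df)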

Example: ratios and flags

Ratios
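import pandas as pd

df = pd.DataFrame({"revenue": [1000, 500], "users": [100, 20]})

# Ratio feature: revenue normalized by the number of users
df["revenue_per_user"] = df["revenue"] / df["users"]
# Flag feature: the threshold of 20 is an arbitrary cutoff for this example
df["is_high_value"] = df["revenue_per_user"] > 20

print(df)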
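
Ratio features can misbehave when the denominator is zero. The sketch below shows one possible way to handle that, assuming a hypothetical third row with zero users; treating an undefined ratio as missing (NaN) is a design choice for this example, not the only option.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row has no users at all.
df = pd.DataFrame({"revenue": [1000, 500, 0], "users": [100, 20, 0]})

# Swap zero denominators for NaN so the ratio is missing rather than inf or 0/0.
df["revenue_per_user"] = df["revenue"] / df["users"].replace(0, np.nan)

# Comparisons against NaN evaluate to False, so undefined ratios are never flagged.
df["is_high_value"] = df["revenue_per_user"] > 20

print(df)
```

Whether a missing ratio should later be filled (for example with 0) depends on what the downstream model expects.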

Example: aggregated features (groupby)

Aggregations
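import pandas as pd

# One row per order; a customer can appear several times
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "amount": [100, 250, 90, 180, 300],
})

# Named aggregation: output_column=(input_column, function)
cust = orders.groupby("customer_id").agg(
    total_spend=("amount", "sum"),
    avg_order=("amount", "mean"),
    orders=("amount", "count"),  # "count" ignores missing amounts
).reset_index()  # turn customer_id back into a regular column

print(cust)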

Guiding principles

  • Don’t leak target or future information: a feature should use only data that is available at prediction time.
  • Prefer simple features first.
  • Validate with charts and summary stats.
  • Document each feature and its meaning.
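
To make the leakage principle concrete, here is a minimal sketch of per-customer features that look only at earlier orders. It assumes an `order_time` column, which the aggregation example does not have, and the `prior_*` column names are hypothetical. The idea is that each row sees the customer's history before that order, not the order itself or anything after it.

```python
import pandas as pd

# Hypothetical order log; order_time is an assumed column for this sketch.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "order_time": pd.to_datetime([
        "2025-01-01", "2025-01-05", "2025-01-02", "2025-01-03", "2025-01-09",
    ]),
    "amount": [100, 250, 90, 180, 300],
})

# Sort so that "earlier" is well defined within each customer.
orders = orders.sort_values(["customer_id", "order_time"])

# Number of orders the customer placed before this one.
orders["prior_orders"] = orders.groupby("customer_id").cumcount()

# Running total up to this row, minus the row itself, leaves only past spend.
orders["prior_spend"] = (
    orders.groupby("customer_id")["amount"].cumsum() - orders["amount"]
)

# Average of past orders; NaN for a customer's first order (no history yet).
has_history = orders["prior_orders"] > 0
orders["prior_avg_order"] = orders["prior_spend"] / orders["prior_orders"].where(has_history)

print(orders)
```

By contrast, a plain `groupby(...).agg(...)` over the full table mixes later orders into every row's features, which is fine for a static customer summary but can leak when the target depends on time.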
