Feature Engineering - Creating New Insights
What feature engineering is
Feature engineering is the process of creating better input variables so the model can learn more easily.
Good features:
- summarize raw information
- reduce noise
- expose relationships
- align with how the real world works
Common feature engineering patterns
1) Date/time features
From a timestamp you can create:
- hour of day
- day of week
- weekend flag
- month
Example: user activity often depends on time.
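For example, with pandas you can derive all four features from a single timestamp column (the column name `ts` here is made up for illustration):

```python
import pandas as pd

# Hypothetical event log with one timestamp column
events = pd.DataFrame({
    "ts": pd.to_datetime(["2024-03-04 09:30", "2024-03-09 22:15"]),
})

events["hour"] = events["ts"].dt.hour
events["day_of_week"] = events["ts"].dt.dayofweek  # Monday = 0
events["is_weekend"] = events["ts"].dt.dayofweek >= 5
events["month"] = events["ts"].dt.month
```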
2) Aggregations
- average order value per user
- number of purchases in last 30 days
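A minimal sketch of the first aggregation with pandas, using `groupby(...).transform` to broadcast the per-user mean back onto every row (the table and column names are made up):

```python
import pandas as pd

# Hypothetical order table
orders = pd.DataFrame({
    "user": ["a", "a", "b"],
    "amount": [10.0, 30.0, 5.0],
})

# Average order value per user, aligned back to the original rows
orders["avg_order_value"] = orders.groupby("user")["amount"].transform("mean")
```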
3) Ratios and differences
- profit = revenue - cost
- BMI = weight / height²
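Both formulas translate directly into vectorized column arithmetic in pandas (the column names and values below are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "revenue": [100.0, 250.0],
    "cost": [60.0, 200.0],
    "weight_kg": [70.0, 85.0],
    "height_m": [1.75, 1.80],
})

df["profit"] = df["revenue"] - df["cost"]
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2
```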
4) Binning
Convert a numeric variable into buckets:
- age buckets: 0–18, 19–30, 31–50, 50+
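With integer ages, `pd.cut` reproduces exactly these buckets (the upper bound 120 is an arbitrary cap for illustration; bins are right-closed by default, so 18 falls in "0-18" and 19 in "19-30"):

```python
import pandas as pd

ages = pd.Series([12, 25, 40, 70])

buckets = pd.cut(
    ages,
    bins=[0, 18, 30, 50, 120],
    labels=["0-18", "19-30", "31-50", "50+"],
)
```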
5) Text features
- length of message
- count of exclamation marks
- TF-IDF vectors (later phase)
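The first two are one-liners with the pandas string accessor (the example messages are made up):

```python
import pandas as pd

msgs = pd.DataFrame({"text": ["Hello!", "WOW!!!", "ok"]})

msgs["length"] = msgs["text"].str.len()
msgs["exclamations"] = msgs["text"].str.count("!")
```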
The golden rule: do it without leakage
Every feature must be computable at prediction time, using only information available at that moment.
Bad feature example:
- “number of refunds after purchase” while predicting fraud at purchase time
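As a contrast, here is a sketch of a leakage-safe version of the earlier "purchases in last 30 days" feature: for each order, it counts only the user's previous orders in the 30 days before it, so the value is computable at prediction time (the table and column names are made up):

```python
import pandas as pd

# Hypothetical order log
orders = pd.DataFrame({
    "user": ["a", "a", "a", "b"],
    "ts": pd.to_datetime(["2024-01-01", "2024-01-10", "2024-02-20", "2024-01-05"]),
    "amount": [10.0, 20.0, 30.0, 5.0],
}).sort_values(["user", "ts"]).reset_index(drop=True)

# Count each user's orders in the trailing 30-day window (including the
# current order), then subtract 1 to exclude the order itself: only past
# information is used, so there is no leakage.
counts = (
    orders.set_index("ts")
    .groupby("user")["amount"]
    .rolling("30D")
    .count()
    - 1
)
orders["prev_orders_30d"] = counts.to_numpy()
```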
Feature engineering in scikit-learn pipelines
You can use FunctionTransformer for simple transformations.
Simple custom feature function
import numpy as np
from sklearn.preprocessing import FunctionTransformer
def add_log1p(X):
    # Works on NumPy arrays; for pandas DataFrames, select the
    # relevant numeric columns first
    return np.log1p(X)
log_transformer = FunctionTransformer(add_log1p)
Mini-checkpoint
Pick a dataset you know and list:
- 3 raw features
- 3 engineered features you could derive
- confirm each engineered feature is available at prediction time
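Returning to the pipeline section above, here is a minimal sketch of a FunctionTransformer used inside a full Pipeline. The model choice and the toy data are made up for illustration; the point is that the log transform is applied automatically during both fit and predict:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

pipe = Pipeline([
    ("log", FunctionTransformer(np.log1p)),  # same idea as add_log1p
    ("model", LinearRegression()),
])

# Toy data: the target is exactly linear in log1p(X)
X = np.array([[1.0], [10.0], [100.0], [1000.0]])
y = np.log1p(X).ravel() * 2.0

pipe.fit(X, y)
pred = pipe.predict(np.array([[50.0]]))
```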
