Feature Engineering - Creating New Insights
What feature engineering is
Feature engineering is the process of creating better input variables so the model can learn more easily.
Good features:
- summarize raw information
- reduce noise
- expose relationships
- align with how the real world works
Common feature engineering patterns
1) Date/time features
From a timestamp you can create:
- hour of day
- day of week
- weekend flag
- month
Example: user activity often depends on time.
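For example, with pandas you can derive all four features from a single timestamp column (the column name `ts` here is made up for illustration):

```python
import pandas as pd

# Hypothetical event log with one timestamp column
events = pd.DataFrame({
    "ts": pd.to_datetime(["2024-03-04 09:30", "2024-03-09 22:15"]),
})

events["hour"] = events["ts"].dt.hour
events["day_of_week"] = events["ts"].dt.dayofweek  # Monday = 0
events["is_weekend"] = events["ts"].dt.dayofweek >= 5
events["month"] = events["ts"].dt.month
```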
2) Aggregations
- average order value per user
- number of purchases in last 30 days
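A minimal sketch of the first aggregation with pandas, using `groupby(...).transform` to broadcast the per-user mean back onto every row (the table and column names are made up):

```python
import pandas as pd

# Hypothetical order table
orders = pd.DataFrame({
    "user": ["a", "a", "b"],
    "amount": [10.0, 30.0, 5.0],
})

# Average order value per user, aligned back to the original rows
orders["avg_order_value"] = orders.groupby("user")["amount"].transform("mean")
```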
3) Ratios and differences
- profit = revenue - cost
- BMI = weight / height²
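Both formulas translate directly into vectorized column arithmetic in pandas (the column names and values below are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "revenue": [100.0, 250.0],
    "cost": [60.0, 200.0],
    "weight_kg": [70.0, 85.0],
    "height_m": [1.75, 1.80],
})

df["profit"] = df["revenue"] - df["cost"]
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2
```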
4) Binning
Convert a numeric variable into buckets:
- age buckets: 0–18, 19–30, 31–50, 50+
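With integer ages, `pd.cut` reproduces exactly these buckets (the upper bound 120 is an arbitrary cap for illustration; bins are right-closed by default, so 18 falls in "0-18" and 19 in "19-30"):

```python
import pandas as pd

ages = pd.Series([12, 25, 40, 70])

buckets = pd.cut(
    ages,
    bins=[0, 18, 30, 50, 120],
    labels=["0-18", "19-30", "31-50", "50+"],
)
```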
5) Text features
- length of message
- count of exclamation marks
- TF-IDF vectors (later phase)
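The first two are one-liners with the pandas string accessor (the example messages are made up):

```python
import pandas as pd

msgs = pd.DataFrame({"text": ["Hello!", "WOW!!!", "ok"]})

msgs["length"] = msgs["text"].str.len()
msgs["exclamations"] = msgs["text"].str.count("!")
```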
The golden rule: do it without leakage
Every feature must be computable at prediction time, using only information available at that moment.
Bad feature example:
- “number of refunds after purchase” while predicting fraud at purchase time
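As a contrast, here is a sketch of a leakage-safe version of the earlier "purchases in last 30 days" feature: for each order, it counts only the user's previous orders in the 30 days before it, so the value is computable at prediction time (the table and column names are made up):

```python
import pandas as pd

# Hypothetical order log
orders = pd.DataFrame({
    "user": ["a", "a", "a", "b"],
    "ts": pd.to_datetime(["2024-01-01", "2024-01-10", "2024-02-20", "2024-01-05"]),
    "amount": [10.0, 20.0, 30.0, 5.0],
}).sort_values(["user", "ts"]).reset_index(drop=True)

# Count each user's orders in the trailing 30-day window (including the
# current order), then subtract 1 to exclude the order itself: only past
# information is used, so there is no leakage.
counts = (
    orders.set_index("ts")
    .groupby("user")["amount"]
    .rolling("30D")
    .count()
    - 1
)
orders["prev_orders_30d"] = counts.to_numpy()
```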
Feature engineering in scikit-learn pipelines
You can use FunctionTransformer for simple transformations.
Simple custom feature function
import numpy as np
from sklearn.preprocessing import FunctionTransformer
def add_log1p(X):
    # Works on NumPy arrays; for pandas DataFrames, select the
    # relevant numeric columns first
    return np.log1p(X)
log_transformer = FunctionTransformer(add_log1p)
Mini-checkpoint
Pick a dataset you know and list:
- 3 raw features
- 3 engineered features you could derive
- confirm each engineered feature is available at prediction time
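Returning to the pipeline section above, here is a minimal sketch of a FunctionTransformer used inside a full Pipeline. The model choice and the toy data are made up for illustration; the point is that the log transform is applied automatically during both fit and predict:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

pipe = Pipeline([
    ("log", FunctionTransformer(np.log1p)),  # same idea as add_log1p
    ("model", LinearRegression()),
])

# Toy data: the target is exactly linear in log1p(X)
X = np.array([[1.0], [10.0], [100.0], [1000.0]])
y = np.log1p(X).ravel() * 2.0

pipe.fit(X, y)
pred = pipe.predict(np.array([[50.0]]))
```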
