The ML Lifecycle - From Data to Deployment
Why "lifecycle" matters
ML isn't just training a model.
A model that performs well in a notebook can fail in production because:
- the data distribution changes (drift)
- data quality issues appear
- latency constraints exist
- labels arrive late or are noisy
The lifecycle stages
```mermaid
flowchart TD
    A[1. Problem Definition] --> B[2. Data Collection]
    B --> C[3. Data Cleaning & Preprocessing]
    C --> D[4. Feature Engineering]
    D --> E[5. Train Model]
    E --> F[6. Evaluate & Validate]
    F --> G[7. Deploy]
    G --> H[8. Monitor & Improve]
    H --> C
```
1) Problem definition
Decide:
- what is the target?
- what does success mean (metric + threshold)?
- what constraints exist (latency, cost, explainability)?
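These decisions are easiest to enforce when written down explicitly. A minimal sketch, using a hypothetical churn problem (the target name, metric, and thresholds are illustrative, not from a real project):

```python
# Hypothetical example: capturing a problem definition as data,
# so success criteria can be checked automatically later.
problem = {
    "target": "customer_churn_within_30d",  # what we predict
    "metric": "recall",                     # how we measure success
    "threshold": 0.80,                      # minimum acceptable score
    "constraints": {
        "max_latency_ms": 100,              # real-time scoring budget
        "explainable": True,                # stakeholders need reasons
    },
}

def meets_goal(score: float) -> bool:
    """Check a measured metric against the agreed threshold."""
    return score >= problem["threshold"]

print(meets_goal(0.85))  # a model at 0.85 recall would pass
```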
2) Data collection
Good data beats fancy models.
Typical sources:
- databases (SQL)
- CSV exports
- logs
- APIs
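A minimal sketch of loading from two of these sources with pandas. The CSV content is inlined here so the example is self-contained; in practice you would read a real export or query a real database:

```python
# Loading data from common sources with pandas.
import io
import pandas as pd

# CSV export (inlined for a self-contained example).
csv_export = io.StringIO("user_id,age,plan\n1,34,pro\n2,29,free\n")
df = pd.read_csv(csv_export)

# From a SQL database you would pass a query and a connection instead:
# df = pd.read_sql("SELECT user_id, age, plan FROM users", conn)

print(df.shape)  # (2, 3)
```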
3) Cleaning & preprocessing
Common issues to handle:
- missing values
- outliers
- inconsistent categories
- duplicates
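A small sketch of handling three of these issues on a toy DataFrame (the column names are made up for illustration):

```python
# Common cleaning steps with pandas.
import pandas as pd

df = pd.DataFrame({
    "city": ["NYC", "nyc", "Boston", "Boston", None],
    "price": [100.0, 100.0, 250.0, 250.0, 80.0],
})

df["city"] = df["city"].str.upper()   # inconsistent categories -> one form
df = df.dropna(subset=["city"])       # drop rows with a missing city
df = df.drop_duplicates()             # remove exact duplicate rows

print(len(df))  # 2 rows remain
```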
4) Feature engineering
Transform raw data into useful signals.
Example: from timestamps create:
- day-of-week
- hour-of-day
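With pandas, both features fall out of the datetime accessor (toy timestamps for illustration):

```python
# Deriving day-of-week and hour-of-day from raw timestamps.
import pandas as pd

df = pd.DataFrame({"ts": pd.to_datetime([
    "2024-01-01 09:30:00",  # a Monday morning
    "2024-01-06 22:15:00",  # a Saturday night
])})

df["day_of_week"] = df["ts"].dt.dayofweek  # Monday=0 ... Sunday=6
df["hour_of_day"] = df["ts"].dt.hour

print(df[["day_of_week", "hour_of_day"]].to_dict("list"))
```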
5) Training
Fit the parameters of your chosen algorithm to the training data.
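For example, with scikit-learn the fit step is a single call; here on a tiny synthetic dataset where y = 2x:

```python
# Fitting a linear model's parameters on training data.
import numpy as np
from sklearn.linear_model import LinearRegression

X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([2.0, 4.0, 6.0, 8.0])

model = LinearRegression()
model.fit(X_train, y_train)      # learns slope and intercept

print(round(model.coef_[0], 2))  # recovers the slope, ~2.0
```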
6) Evaluation & validation
Use:
- validation sets and cross-validation
- metrics aligned with the business goal
Watch out for:
- data leakage
- overfitting
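A cross-validation sketch: score the model on held-out folds rather than the data it was trained on, which helps catch overfitting (synthetic data for illustration):

```python
# 5-fold cross-validation with scikit-learn.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Each fold is held out in turn; the model never sees its test fold.
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores.mean())  # near 1.0 on this easy synthetic task
```

To avoid leakage, any preprocessing (scaling, imputation) should be fit only on the training folds, e.g. inside a scikit-learn `Pipeline`.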
7) Deployment
Common forms:
- batch predictions (daily scoring job)
- real-time API
- embedded model (mobile/edge)
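A minimal batch-prediction sketch: persist the trained model as an artifact, then reload it in a (simulated) daily job to score new rows. The file path and toy model are illustrative:

```python
# Batch scoring: save a model artifact, reload it, and predict.
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# "Training" side: fit and save the model artifact.
model = LinearRegression().fit(np.array([[1.0], [2.0]]), np.array([2.0, 4.0]))
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, path)

# "Batch job" side: reload the artifact and score today's data.
scorer = joblib.load(path)
preds = scorer.predict(np.array([[3.0]]))
print(preds[0])  # ~6.0
```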
8) Monitoring
Monitor:
- input drift (feature distribution changes)
- prediction drift
- performance decay
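One simple way to flag input drift is a two-sample Kolmogorov-Smirnov test comparing a feature's live distribution to its training distribution; the 0.05 alert threshold below is a common but illustrative choice, and the data is synthetic:

```python
# Input-drift check with a two-sample KS test (scipy).
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
train_feature = rng.normal(loc=0.0, size=1000)  # seen at training time
live_feature = rng.normal(loc=0.5, size=1000)   # shifted in production

stat, p_value = ks_2samp(train_feature, live_feature)
drift_detected = p_value < 0.05  # illustrative alert threshold
print(drift_detected)  # True: the distribution has shifted
```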
Key takeaway
ML work is iterative.
You'll move back and forth between data, features, and training until the model is good enough, and then keep iterating after deployment.
🧪 Try It Yourself
Exercise 1: Train-Test Split
Exercise 2: Fit a Linear Model
Exercise 3: Evaluate with MSE
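One possible solution sketch covering all three exercises, using scikit-learn on a small synthetic dataset (y = 3x + noise); try it on your own data first:

```python
# Exercises 1-3 worked end to end on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=200)

# Exercise 1: hold out 25% of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Exercise 2: fit a linear model on the training split only.
model = LinearRegression().fit(X_train, y_train)

# Exercise 3: evaluate with mean squared error on the test split.
mse = mean_squared_error(y_test, model.predict(X_test))
print(mse)  # small, since the data is nearly linear
```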
