Saving and Loading Models (Pickle, Joblib)
Why we save models
Training can be expensive.
Instead of retraining every time we need a prediction, we:
- train once
- save model artifact
- load it for predictions
joblib vs pickle
- pickle: general-purpose Python serialization
- joblib: often more efficient for objects holding large NumPy arrays (common in sklearn models)
For scikit-learn, joblib is a common default.
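To make the comparison concrete, here is a minimal sketch that saves the same fitted model with both libraries and checks that the reloaded copies predict identically. The synthetic data, the file names, and the `compress=3` setting are assumptions chosen for illustration, not recommendations from this page.

```python
import pickle
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny synthetic dataset, invented only for this illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y)

out_dir = Path(tempfile.mkdtemp())

# pickle: standard-library serialization, works on an open binary file
pickle_path = out_dir / "model.pkl"
with open(pickle_path, "wb") as f:
    pickle.dump(clf, f)
with open(pickle_path, "rb") as f:
    clf_pickle = pickle.load(f)

# joblib: takes a file path directly and supports optional compression
joblib_path = out_dir / "model.joblib"
joblib.dump(clf, joblib_path, compress=3)
clf_joblib = joblib.load(joblib_path)

# Both reloaded models should predict exactly like the original
same_predictions = (
    np.array_equal(clf.predict(X), clf_pickle.predict(X))
    and np.array_equal(clf.predict(X), clf_joblib.predict(X))
)
print(same_predictions)
```

For a model this small the two approaches behave almost identically; any difference tends to show up with estimators that hold large arrays.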
Best practice: save the pipeline
Don't save only the model.
Save the entire pipeline (preprocessing + model), so inference matches training.
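As a rough end-to-end sketch of this practice, the example below builds a small pipeline, saves it, reloads it, and predicts on a new row. The training data, the target, the choice of `StandardScaler`, `OneHotEncoder`, and `LogisticRegression`, and the column names are all assumptions made up for illustration; the actual pipeline behind this page is not shown.

```python
import tempfile
from pathlib import Path

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical training data, invented for illustration
train = pd.DataFrame({
    "age": [25, 35, 45, 52, 23, 40],
    "income": [30000, 50000, 80000, 90000, 20000, 60000],
    "city": ["Pune", "Mumbai", "Pune", "Delhi", "Delhi", "Mumbai"],
    "plan": ["Free", "Pro", "Pro", "Pro", "Free", "Free"],
})
target = [0, 1, 1, 1, 0, 0]

# Preprocessing and estimator live in one object
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city", "plan"]),
])
model = Pipeline([("preprocess", preprocess), ("clf", LogisticRegression())])
model.fit(train, target)

# Saving the pipeline saves the fitted scaler and encoder too
path = Path(tempfile.mkdtemp()) / "model.joblib"
joblib.dump(model, path)

# At inference time, raw rows go straight into the loaded pipeline
loaded = joblib.load(path)
new_row = pd.DataFrame([{"age": 35, "income": 50000, "city": "Pune", "plan": "Pro"}])
pred = loaded.predict(new_row)
print(pred)
```

Because the scaler and encoder are stored inside the saved object, the inference code never has to repeat the preprocessing by hand.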
Save a sklearn pipeline
import joblib
# `model` is the trained pipeline (preprocessing + estimator)
joblib.dump(model, "model.joblib")
Load and predict
import joblib
model = joblib.load("model.joblib")
# The input must use the same format and columns the pipeline was trained on
pred = model.predict([{"age": 35, "income": 50000, "city": "Pune", "plan": "Pro"}])
print(pred)
Common pitfalls
- version mismatches between the training and loading environments (sklearn/numpy versions)
- saving the raw model but forgetting the preprocessing steps
- loading untrusted pickle files
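One possible way to guard against the version pitfall is to record the library versions next to the artifact and compare them before loading. The sketch below assumes a JSON sidecar file; the helper names `save_with_versions` and `load_with_version_check` and the `.versions.json` suffix are hypothetical, not part of joblib or scikit-learn.

```python
import json
import tempfile
import warnings
from pathlib import Path

import joblib
import numpy as np
import sklearn
from sklearn.linear_model import LogisticRegression


def current_versions():
    """Return the versions of the libraries the model depends on."""
    return {"sklearn": sklearn.__version__, "numpy": np.__version__}


def save_with_versions(model, path):
    """Save the model plus a JSON sidecar recording library versions."""
    joblib.dump(model, path)
    Path(f"{path}.versions.json").write_text(json.dumps(current_versions()))


def load_with_version_check(path):
    """Warn about version mismatches, then load the model."""
    saved = json.loads(Path(f"{path}.versions.json").read_text())
    installed = current_versions()
    for name, version in saved.items():
        if installed.get(name) != version:
            warnings.warn(f"{name}: saved with {version}, installed {installed.get(name)}")
    return joblib.load(path)


# Toy model, invented only to exercise the helpers
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y)

path = Path(tempfile.mkdtemp()) / "model.joblib"
save_with_versions(clf, path)
restored = load_with_version_check(path)
print(restored.predict(X))
```

Keeping the versions in a separate file means the check can run before the artifact is unpickled, which is the step that may fail or misbehave when versions differ.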
Security note
Never unpickle files from untrusted sources.
Pickle can execute arbitrary code while loading a file. joblib.load is built on pickle, so the same warning applies to .joblib files.
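The following harmless sketch shows why this matters: an object can tell pickle to call any function when it is loaded. The `Payload` class is a made-up example, and `print` stands in for what could be a destructive call in a malicious file.

```python
import pickle


class Payload:
    """On unpickling, pickle calls whatever callable __reduce__ returned."""

    def __reduce__(self):
        # Harmless stand-in: a malicious file could name os.system
        # and a shell command here instead of print and a message.
        return (print, ("this ran during pickle.loads",))


data = pickle.dumps(Payload())

# Loading runs the callable; no Payload object is rebuilt.
# `result` is simply whatever the callable returned (None for print).
result = pickle.loads(data)
print(type(result))
```

Nothing in the bytes needs to look suspicious to the person loading them, which is why the only reliable defense is to load artifacts that come from a source you trust.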
