Saving and Loading Models (Pickle, Joblib)

Why we save models

Training can be expensive.

Instead of training every time, we:

  • train once
  • save the model artifact
  • load it for predictions
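
The train-once, save, load workflow above can be sketched as a small "load or train" helper. This is a minimal illustration, not a prescribed pattern: the function name `load_or_train`, the synthetic dataset, and the choice of `LogisticRegression` are all assumptions made for the example.

```python
from pathlib import Path
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def load_or_train(path: Path):
    """Return a fitted model, training only if no saved artifact exists."""
    if path.exists():
        return joblib.load(path), "loaded"

    # Hypothetical training data; replace with your own dataset.
    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X, y)
    joblib.dump(model, path)
    return model, "trained"


# A temporary directory keeps the demo from touching real files.
artifact = Path(tempfile.mkdtemp()) / "model.joblib"
first_model, first_source = load_or_train(artifact)    # trains and saves
second_model, second_source = load_or_train(artifact)  # loads from disk
print(first_source, second_source)
```

The second call skips training entirely, which is the point of saving the artifact.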

joblib vs pickle

  • pickle: general-purpose Python serialization
  • joblib: often better for large NumPy arrays (common in sklearn models)

For scikit-learn, joblib is a common default.
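
To make the comparison concrete, here is a small sketch that writes the same object with both libraries. The dict holding a NumPy array is a hypothetical stand-in for a fitted model, and the file names are made up for the example.

```python
import pickle
import tempfile
from pathlib import Path

import joblib
import numpy as np

# Hypothetical stand-in for a fitted model: a dict holding a large array.
artifact = {"weights": np.zeros((1000, 1000)), "label": "demo"}

out_dir = Path(tempfile.mkdtemp())
pickle_path = out_dir / "artifact.pkl"
joblib_path = out_dir / "artifact.joblib"

# pickle reads and writes through an open binary file object.
with open(pickle_path, "wb") as f:
    pickle.dump(artifact, f)
with open(pickle_path, "rb") as f:
    from_pickle = pickle.load(f)

# joblib takes a path directly and can compress (0 = none, 9 = smallest).
joblib.dump(artifact, joblib_path, compress=3)
from_joblib = joblib.load(joblib_path)

# Compare the on-disk sizes in bytes.
print(pickle_path.stat().st_size, joblib_path.stat().st_size)
```

Both round-trips restore the same data. The size gap here is exaggerated because an all-zeros array compresses extremely well; real model weights will shrink less.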

Best practice: save the pipeline

Don’t save only the model.

Save the entire pipeline (preprocessing + model), so inference matches training.
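
As a rough end-to-end sketch of this advice, the example below builds a pipeline, saves it as one artifact, and reloads it for prediction. The training rows, labels, step names, and choice of transformers are all invented for illustration; only the column names are borrowed from the prediction example on this page.

```python
import tempfile
from pathlib import Path

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical training data; the values and labels are made up.
train = pd.DataFrame({
    "age": [25, 35, 45, 52, 23, 40],
    "income": [30000, 50000, 80000, 95000, 28000, 62000],
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi", "Mumbai"],
    "plan": ["Free", "Pro", "Pro", "Pro", "Free", "Free"],
})
target = [0, 1, 1, 1, 0, 0]

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    # handle_unknown="ignore" keeps unseen categories from raising at inference.
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city", "plan"]),
])
model = Pipeline([("preprocess", preprocess), ("clf", LogisticRegression())])
model.fit(train, target)

path = Path(tempfile.mkdtemp()) / "model.joblib"
joblib.dump(model, path)  # one artifact: preprocessing + estimator

restored = joblib.load(path)
row = pd.DataFrame([{"age": 35, "income": 50000, "city": "Pune", "plan": "Pro"}])
print(restored.predict(row))
```

Because the scaler and encoder travel inside the saved object, the raw row is transformed at inference exactly as the training data was.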

Save a sklearn pipeline
import joblib

# model is the fitted pipeline (preprocessing + estimator)
joblib.dump(model, "model.joblib")
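Load and predict
import joblib
import pandas as pd

model = joblib.load("model.joblib")
# Assumes the pipeline was fit on a DataFrame with these column names
row = pd.DataFrame([{"age": 35, "income": 50000, "city": "Pune", "plan": "Pro"}])
pred = model.predict(row)
print(pred)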

Common pitfalls

  • version mismatches (sklearn/numpy versions)
  • saving raw model but forgetting preprocessing
  • loading untrusted pickle files
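
One way to catch the version-mismatch pitfall early is to record library versions next to the artifact and compare them at load time. The sketch below uses only the standard library; the helper names and the `model.versions.json` file name are hypothetical, not part of joblib or scikit-learn.

```python
import json
import tempfile
from importlib import metadata
from pathlib import Path


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def write_version_file(path, packages):
    """Record the training environment next to the model artifact."""
    Path(path).write_text(json.dumps(installed_versions(packages), indent=2))


def version_mismatches(path):
    """Return {package: (saved, current)} for every version that differs."""
    saved = json.loads(Path(path).read_text())
    current = installed_versions(saved)
    return {
        name: (saved[name], current[name])
        for name in saved
        if saved[name] != current[name]
    }


# Hypothetical sidecar file; in practice, store it beside model.joblib.
meta_path = Path(tempfile.mkdtemp()) / "model.versions.json"
write_version_file(meta_path, ["scikit-learn", "numpy", "joblib"])
print(version_mismatches(meta_path))  # empty when the environment is unchanged
```

A non-empty result is a signal to rebuild the environment or retrain before trusting the loaded model.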

Security note

Never unpickle files from untrusted sources.

Pickle can execute arbitrary code while loading a file. joblib.load is built on pickle, so the same warning applies to .joblib files.
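
The sketch below shows why, using a deliberately harmless payload, and then one possible guard: checking a SHA-256 digest before loading. The `Payload` class and the helpers `sha256_of` and `load_if_trusted` are hypothetical names for this example. A digest check only detects tampering with a file you created yourself; it does not make a file from an unknown source safe.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


class Payload:
    """Harmless stand-in for a malicious object."""

    def __reduce__(self):
        # pickle stores "call eval('40 + 2')" and runs it at load time.
        return (eval, ("40 + 2",))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # no Payload comes back; eval() ran instead
print(result)


def sha256_of(path):
    """Hex SHA-256 digest of the file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def load_if_trusted(path, expected_sha256):
    """Unpickle only if the file matches a digest recorded at save time."""
    if sha256_of(path) != expected_sha256:
        raise ValueError("artifact digest mismatch; refusing to load")
    with open(path, "rb") as f:
        return pickle.load(f)


path = Path(tempfile.mkdtemp()) / "artifact.pkl"
path.write_bytes(pickle.dumps({"weights": [0.1, 0.2]}))
trusted_digest = sha256_of(path)  # record this through a channel you trust
print(load_if_trusted(path, trusted_digest))
```

A real attacker would replace the `eval` expression with something far worse, which is why the file's origin matters more than its contents.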
