
Naïve Bayes Classifier

The big idea

Naïve Bayes predicts the most likely class using Bayes’ theorem.

It assumes features are conditionally independent given the class.

That assumption is often false, but it still works surprisingly well.
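The independence assumption can be made concrete with a small hand computation. This is a minimal sketch with made-up probability values: each class-conditional likelihood factorizes into a product of per-feature probabilities, which is exactly what the "naïve" assumption buys you.

```python
# Sketch: under conditional independence, the class-conditional likelihood
# factorizes into a product of per-feature probabilities.
# All numbers below are made-up illustrative values, not learned from data.

# P(feature_i = 1 | class) for two classes and three binary features
likelihoods = {
    "spam": [0.8, 0.6, 0.1],
    "ham":  [0.2, 0.3, 0.7],
}
priors = {"spam": 0.4, "ham": 0.6}

def unnormalized_posterior(cls, features):
    # P(class) * product over i of P(feature_i | class)
    score = priors[cls]
    for p, x in zip(likelihoods[cls], features):
        score *= p if x == 1 else (1 - p)
    return score

features = [1, 1, 0]  # observed binary feature vector
scores = {c: unnormalized_posterior(c, features) for c in priors}
prediction = max(scores, key=scores.get)  # most probable class
```

Normalizing the scores is unnecessary for classification: the argmax is the same either way, which is why the proportionality form of Bayes' theorem suffices.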

Bayes' theorem

P(class | features) ∝ P(features | class) * P(class)

Text features (word counts) produce high-dimensional sparse vectors.

Naïve Bayes handles this efficiently.

Common variants

  • GaussianNB (continuous features)
  • MultinomialNB (counts, text)
  • BernoulliNB (binary features)
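The variant choice comes down to matching the likelihood model to the feature type. A minimal sketch on tiny made-up data (the arrays below are illustrative, not a real dataset):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

# Tiny made-up dataset: 4 samples, 3 features, 2 classes
y = np.array([0, 0, 1, 1])

# GaussianNB: continuous measurements (models each feature as a Gaussian)
X_cont = np.array([[1.0, 2.1, 0.3], [0.9, 1.8, 0.2],
                   [3.2, 0.5, 2.9], [3.0, 0.4, 3.1]])
gauss = GaussianNB().fit(X_cont, y)

# MultinomialNB: non-negative counts, e.g. word frequencies in text
X_counts = np.array([[3, 0, 1], [2, 1, 0], [0, 4, 2], [1, 3, 3]])
multi = MultinomialNB().fit(X_counts, y)

# BernoulliNB: binary presence/absence features
X_bin = (X_counts > 0).astype(int)
bern = BernoulliNB().fit(X_bin, y)
```

All three share the same fit/predict interface; only the per-feature probability model differs.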

Scikit-learn example

from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()
nb.fit(X_train, y_train)      # learn per-class means and variances
y_pred = nb.predict(X_test)   # predict the most probable class

Pros and cons

Pros:

  • very fast
  • strong baseline
  • works well for text

Cons:

  • independence assumption can limit accuracy

Mini-checkpoint

If you have TF-IDF features for spam detection, which NB variant is common?

(Usually MultinomialNB.)
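The checkpoint answer can be sketched end to end. This is a minimal illustration with a made-up four-document corpus, chaining TfidfVectorizer with MultinomialNB in a pipeline:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus for illustration only
texts = ["win money now", "cheap pills win",
         "meeting at noon", "lunch at noon"]
labels = ["spam", "spam", "ham", "ham"]

# TF-IDF features feeding a multinomial Naive Bayes classifier
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)

pred = model.predict(["win cheap money"])
```

MultinomialNB is formally derived for integer counts, but it accepts fractional TF-IDF weights and works well on them in practice, which is why this pairing is so common for spam filtering.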

If this helped you, consider buying me a coffee ☕
