Naive Bayes Disadvantages
Naive Bayes is a powerful algorithm based on Bayes' Theorem, and it is widely used to build practical machine learning solutions. However, it has a few characteristics that can make it less useful or less accurate if they are not accounted for.
It relies on an assumption of independent features that is often incorrect. In real life you will rarely find truly independent features. For example, a loan eligibility analysis would depend on the applicant's income, age, previous loans, location, and transaction history, all of which are likely to be interdependent.
It is not ideal for data sets with a very large number of attributes. As the number of features grows, reliable probability estimation becomes harder and the model can suffer from the curse of dimensionality.
If a category appears in the test set but was never observed in the training set, the model assigns it a probability of 0 (zero), which corrupts the entire calculation. This phenomenon is referred to as 'zero frequency', and to overcome it you will have to use smoothing techniques.
1- Independence Assumption
The Naive Bayes algorithm is called “naive” because it assumes that all the features are independent.
More often than not, this is not the case in the real world.
Small correlations are tolerated fairly well, but once feature dependence exceeds a certain threshold Naive Bayes can lose its stability and produce quite inaccurate results.
Therefore, the naivete of the Naive Bayes algorithm should always be kept in mind, and the independence of the features in a dataset should be checked before relying on its predictions.
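A quick way to see this over-counting effect is to duplicate a feature, which makes the copy perfectly dependent on the original. The sketch below uses Scikit-Learn's GaussianNB on a small synthetic dataset (the numbers are invented purely for illustration):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# one informative feature: class 0 centered at 0, class 1 centered at 3
X = np.vstack([rng.normal(0, 1, (50, 1)), rng.normal(3, 1, (50, 1))])
y = np.array([0] * 50 + [1] * 50)

nb = GaussianNB().fit(X, y)

# duplicate the feature: the two columns are now perfectly dependent
X_dup = np.hstack([X, X])
nb_dup = GaussianNB().fit(X_dup, y)

p = nb.predict_proba([[2.0]])[0, 1]
p_dup = nb_dup.predict_proba([[2.0, 2.0]])[0, 1]

# the dependent copy double-counts the same evidence, so the posterior
# for class 1 is pushed further toward certainty than it should be
print(p, p_dup)
```

Nothing about the data changed between the two models, yet the duplicated-feature model is noticeably more (over)confident, which is exactly the instability that dependent features introduce.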
2- No Regression
Naive Bayes can only handle classification problems.
This can be a drawback for some analysts and data scientists, since many other machine learning algorithms, such as Random Forest, Support Vector Machines, Decision Trees, and even kNN, have both classification and regression variants.
Another example of a classification-only machine learning algorithm is Logistic Regression.
3- Naive Bayes Variants
Once you have decided to implement the Naive Bayes machine learning algorithm, you will also need to make sure you select the right Naive Bayes variant.
For example, GaussianNB is commonly used with continuous values, BernoulliNB with binary features, and MultinomialNB with discrete count features (such as word counts). Although this flexibility is actually an advantage, the selection adds an extra step of important consideration to the project. You can see a detailed explanation of the different Naive Bayes models in this tutorial:
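As a rough sketch of how the choice of variant maps to the type of data in Scikit-Learn (the tiny datasets below are made up for illustration):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])

# continuous measurements -> GaussianNB
X_cont = np.array([[1.2, 0.3], [0.9, 0.5], [3.1, 2.8], [2.9, 3.2]])
pred_cont = GaussianNB().fit(X_cont, y).predict([[3.0, 3.0]])

# binary presence/absence features -> BernoulliNB
X_bin = np.array([[1, 1], [1, 0], [0, 0], [0, 1]])
pred_bin = BernoulliNB().fit(X_bin, y).predict([[0, 0]])

# discrete counts (e.g. word counts) -> MultinomialNB
X_cnt = np.array([[5, 0], [4, 1], [0, 6], [1, 4]])
pred_cnt = MultinomialNB().fit(X_cnt, y).predict([[0, 5]])

print(pred_cont, pred_bin, pred_cnt)
```

All three models share the same `fit`/`predict` interface, so switching variants is cheap; the important part is matching the variant's likelihood model to how your features are actually distributed.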
4- Insufficient Training Data
Another drawback of Naive Bayes occurs when there isn't ample data for training or when the data doesn't cover all possible feature values. For example, if a feature value occurs with only one class, Naive Bayes may predict that outcome with a probability of 1 (100%), and it assigns a previously unobserved value a probability of 0 (0%). Both are absurd predictions that shouldn't occur in real-life scenarios.
Luckily, there is Additive Smoothing, also known as Laplace Smoothing. Laplace was among the first to recognize this shortcoming and addressed it with a smoothing method. In Scikit-Learn this intervention is handled via the alpha hyperparameter of MultinomialNB and BernoulliNB, or the var_smoothing hyperparameter of GaussianNB. You can see a tutorial about Additive Smoothing for the Naive Bayes algorithm below:
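A minimal sketch of the zero-frequency problem and its fix, using MultinomialNB's alpha parameter (the toy word counts below are invented for illustration):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# word counts for two classes; word #2 never appears in class 0
X = np.array([[3, 0, 0],
              [2, 1, 0],
              [0, 2, 3],
              [1, 0, 4]])
y = np.array([0, 0, 1, 1])

# mostly class-0 words, plus one occurrence of the word unseen in class 0
test = np.array([[3, 0, 1]])

unsmoothed = MultinomialNB(alpha=1e-10).fit(X, y)  # effectively no smoothing
smoothed = MultinomialNB(alpha=1.0).fit(X, y)      # Laplace (add-one) smoothing

p0_unsmoothed = unsmoothed.predict_proba(test)[0, 0]
p0_smoothed = smoothed.predict_proba(test)[0, 0]

# without smoothing, the single unseen word zeroes out class 0 entirely,
# despite the otherwise overwhelming class-0 evidence; with add-one
# smoothing, class 0 keeps a sensible probability
print(p0_unsmoothed, p0_smoothed)
```

Add-one smoothing simply pretends every word was seen once more than it actually was, so no count is ever exactly zero and a single unseen value can no longer veto an entire class.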
Like all machine learning algorithms, Naive Bayes has a few shortcomings. The most potentially troubling disadvantage of Naive Bayes models, however, is the assumption of independence between features and the decreased accuracy that results when the predictors (features) are in fact dependent.
Naive Bayes also has many significant advantages and you can see those in the tutorial below:
When features have dependencies, feature engineering or feature extraction can be used to address the issue. If all else fails, another algorithm such as Random Forest or Logistic Regression can be a good alternative. You can see our tutorials regarding alternative ML algorithms below:
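One common feature-extraction route is to decorrelate the features first, for instance with PCA, and feed the uncorrelated components to Naive Bayes. A minimal sketch on synthetic data (invented for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 200
base = rng.normal(size=n)
# two strongly dependent features: the second is a noisy copy of the first
X = np.column_stack([base, base + rng.normal(scale=0.1, size=n)])
y = (base > 0).astype(int)

# PCA outputs uncorrelated components, so GaussianNB's independence
# assumption holds much better on the transformed features
model = make_pipeline(PCA(n_components=2), GaussianNB()).fit(X, y)
print(model.score(X, y))
```

One caveat on this design choice: PCA removes only linear, global correlation, so it helps most when the dependence between features is roughly linear; for more complex dependencies, switching algorithms may be the better option.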