Naive Bayes is a supervised learning method: its classifiers are trained on labeled data (i.e., data that has already been assigned to one of the available classes).
This is in contrast to unsupervised learning, where no labeled data are provided. Unsupervised learning aims to discover natural structures that may exist in a data collection without any prior knowledge of how to categorize the data.
A basic naive Bayes classifier is trained by learning the probabilities that underlie its classification task. These probabilities can be estimated from frequency counts over the training data, provided the data has already been labeled with the appropriate classes.
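As a rough illustration of estimating probabilities from frequency counts, the sketch below computes class priors and per-class word probabilities as relative frequencies. The toy dataset and the names `training_data` and `estimate_from_counts` are hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict

# Hypothetical toy training set: (document tokens, label) pairs.
training_data = [
    (["win", "money", "now"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["win", "a", "prize"], "spam"),
    (["lunch", "at", "noon"], "ham"),
]

def estimate_from_counts(data):
    """Estimate P(class) and P(word | class) as relative frequencies."""
    class_counts = Counter(label for _, label in data)
    word_counts = defaultdict(Counter)
    for tokens, label in data:
        word_counts[label].update(tokens)

    # Prior: fraction of training documents that carry each label.
    total_docs = sum(class_counts.values())
    priors = {c: n / total_docs for c, n in class_counts.items()}

    # Likelihood: fraction of a class's word occurrences that are a given word.
    likelihoods = {}
    for c, counts in word_counts.items():
        total_words = sum(counts.values())
        likelihoods[c] = {w: n / total_words for w, n in counts.items()}
    return priors, likelihoods

priors, likelihoods = estimate_from_counts(training_data)
print(priors)
print(likelihoods["spam"])
```

Note that raw relative frequencies assign zero probability to any word never seen in a class, so practical implementations usually smooth these counts.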
More complex Bayes classification methods assume a generative distribution for the data. During training, they learn the parameters of the assumed distributions rather than just the raw frequencies.
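To make the idea of learning distribution parameters concrete, here is a minimal sketch that assumes each numeric feature follows a Gaussian distribution within each class. That choice of distribution is an assumption made for illustration, not something the text prescribes. Training then reduces to estimating a mean and a variance per class and feature. The data and the name `fit_gaussian_parameters` are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pvariance

# Hypothetical toy data: (numeric feature vector, label) pairs.
samples = [
    ([1.0, 2.0], "a"),
    ([3.0, 4.0], "a"),
    ([10.0, 0.0], "b"),
    ([14.0, 2.0], "b"),
]

def fit_gaussian_parameters(data):
    """Return {label: [(mean, variance), ...]} with one pair per feature."""
    by_class = defaultdict(list)
    for features, label in data:
        by_class[label].append(features)

    params = {}
    for label, rows in by_class.items():
        # zip(*rows) turns the list of rows into a sequence of feature columns.
        columns = list(zip(*rows))
        # Population variance is the maximum-likelihood estimate for a Gaussian.
        params[label] = [(fmean(col), pvariance(col)) for col in columns]
    return params

params = fit_gaussian_parameters(samples)
print(params)
```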
Classifiers based on the naive Bayes algorithm are often used for text categorization. A common application is sorting emails into spam and ham (legitimate mail).
Naive Bayes classifiers assume that the features (or input variables) of the data are conditionally independent given the class. This simplifies the calculation. For straightforward text categorization, it means assuming that, within a class, each word in a document occurs independently of the other words.
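The simplification can be written out explicitly. The notation here is introduced only for this illustration: $c$ denotes a class and $x_1, \dots, x_n$ the features (for text, the words of a document). Under the independence assumption, the joint likelihood factorizes into a product of per-feature terms:

```latex
P(c \mid x_1, \dots, x_n) \;\propto\; P(c) \, P(x_1, \dots, x_n \mid c)
  \;=\; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

Each factor $P(x_i \mid c)$ can be estimated separately, so the classifier never has to model how the features interact with one another.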
In practice, the assumption of conditional independence seldom holds. Nevertheless, multinomial naive Bayes classifiers, which are linear classifiers, perform remarkably well.
- Naive Bayes was adopted early on for text categorization problems and remains a common benchmark today.
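
A multinomial naive Bayes text classifier along these lines might be sketched as follows. This is a minimal illustration, not a reference implementation. It assumes add-one (Laplace) smoothing, which the text does not mention, and works in log space to avoid numerical underflow. The toy data and the names `train_multinomial_nb` and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training set: (document tokens, label) pairs.
train = [
    (["win", "money", "now"], "spam"),
    (["win", "a", "prize"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["lunch", "at", "noon"], "ham"),
]

def train_multinomial_nb(data, alpha=1.0):
    """Return log priors and smoothed log likelihoods (alpha is an assumed smoothing constant)."""
    class_counts = Counter(label for _, label in data)
    word_counts = defaultdict(Counter)
    for tokens, label in data:
        word_counts[label].update(tokens)
    vocab = {w for tokens, _ in data for w in tokens}

    log_prior = {c: math.log(n / len(data)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        # Smoothed denominator: class word total plus alpha per vocabulary word.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood

def predict(tokens, log_prior, log_likelihood):
    """Pick the class with the highest log posterior; words outside the vocabulary are skipped."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in tokens if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)

log_prior, log_likelihood = train_multinomial_nb(train)
print(predict(["win", "a", "prize", "now"], log_prior, log_likelihood))
print(predict(["lunch", "at", "noon"], log_prior, log_likelihood))
```

Because each class score is a log prior plus a sum of per-word log probabilities, it is a linear function of the document's word counts, which is why the multinomial model counts as a linear classifier.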
Over the years, several alternative methods, including support vector machines (SVMs) and k-nearest neighbors (KNN), have been developed to solve classification problems with more flexibility. Given sufficient pre-processed data, the naive Bayes classifier can still be competitive and has shown strong results in applications where categorization is critical, such as diagnosis.