Questions tagged [naive-bayes]

A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong independence assumptions. A more descriptive term for the underlying probability model would be "independent feature model".

Questions about using, optimizing, or interpreting a naive Bayes classifier should use this tag.

Wikipedia's introduction:

In simple terms, a naive Bayes classifier assumes that the presence or absence of a particular feature is unrelated to the presence or absence of any other feature, given the class variable. For example, a fruit may be considered to be an apple if it is red, round, and about 3" in diameter. A naive Bayes classifier considers each of these features to contribute independently to the probability that this fruit is an apple, regardless of the presence or absence of the other features.
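The arithmetic behind the apple example can be sketched in a few lines. Every probability below is invented purely for illustration; the point is only that the class score is a prior times a product of per-feature likelihoods:

```python
# Naive Bayes treats each feature as independent given the class, so the
# class score is the prior times a product of per-feature likelihoods.
# All probabilities here are made-up illustrative numbers.
priors = {"apple": 0.5, "other": 0.5}
likelihoods = {
    # P(feature | class), assumed independent given the class
    "apple": {"red": 0.7, "round": 0.9, "approx_3in": 0.6},
    "other": {"red": 0.2, "round": 0.4, "approx_3in": 0.3},
}

def score(cls, features):
    p = priors[cls]
    for f in features:
        p *= likelihoods[cls][f]
    return p

features = ["red", "round", "approx_3in"]
scores = {c: score(c, features) for c in priors}
total = sum(scores.values())                      # normalizing constant
posteriors = {c: s / total for c, s in scores.items()}
print(posteriors)
```

With these numbers the "apple" score is 0.5 · 0.7 · 0.9 · 0.6, which after normalization dominates the "other" class.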

For some types of probability models, naive Bayes classifiers can be trained very efficiently in a supervised learning setting. In many practical applications, parameter estimation for naive Bayes models uses the method of maximum likelihood; in other words, one can work with the naive Bayes model without accepting Bayesian probability or using any Bayesian methods.

Despite their naive design and apparently oversimplified assumptions, naive Bayes classifiers have worked quite well in many complex real-world situations. In 2004, an analysis of the Bayesian classification problem showed that there are sound theoretical reasons for the apparently implausible efficacy of naive Bayes classifiers (Zhang, 2004). Still, a comprehensive comparison with other classification algorithms in 2006 showed that naive Bayes classification is outperformed by other approaches, such as boosted trees and random forests (Caruana & Niculescu-Mizil, 2006).

An advantage of naive Bayes is that it requires only a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because the variables are assumed independent, only the variances of the variables for each class need to be determined, not the entire covariance matrix.
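A minimal sketch of that estimation step, assuming Gaussian class-conditional densities and using maximum likelihood; the training data are invented. Note that only a per-feature mean and variance is fitted for each class, never a full covariance matrix:

```python
# Tiny invented training set: (feature vector, class label).
data = [
    ([1.0, 2.0], "a"), ([1.2, 1.8], "a"), ([0.8, 2.2], "a"),
    ([3.0, 0.5], "b"), ([3.2, 0.7], "b"), ([2.8, 0.3], "b"),
]

def fit(data):
    by_class = {}
    for x, y in data:
        by_class.setdefault(y, []).append(x)
    out = {}
    for cls, rows in by_class.items():
        n = len(rows)
        # ML estimates: per-feature mean and variance within each class
        means = [sum(col) / n for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / n
                     for col, m in zip(zip(*rows), means)]
        out[cls] = (means, variances)
    return out

params = fit(data)
print(params)
```

For d features and K classes this fits only 2·d·K numbers, which is why so little training data is needed.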

We can visualize a naive Bayes model graphically as follows:

[Figure: a generic naive Bayes model, with the class node $C$ pointing to each predictive attribute $X_i$]

In this Bayesian network, predictive attributes $X_i$ are conditionally independent given the class $C$.
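Concretely, that conditional independence is what lets the posterior factorize into per-feature terms:

$$P(C \mid X_1, \dots, X_n) \;\propto\; P(C) \prod_{i=1}^{n} P(X_i \mid C)$$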

References:

  • Caruana, R., & Niculescu-Mizil, A. (2006). An empirical comparison of supervised learning algorithms. Proceedings of the 23rd International Conference on Machine Learning, 161–168. Available online, URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.122.5901.

  • Domingos, P. & Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29, 103–137.

  • Metsis, V., Androutsopoulos, I., & Paliouras, G. (2006). Spam filtering with naive Bayes—which naive Bayes? Third Conference on Email and Anti-Spam (CEAS), 17.

  • Rennie, J., Shih, L., Teevan, J., & Karger, D. (2003). Tackling the poor assumptions of naive Bayes classifiers. Proceedings of the Twentieth International Conference on Machine Learning. Available online, URL: http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf.

  • Rish, I. (2001). An empirical study of the naive Bayes classifier. IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence. Available online, URL: http://www.research.ibm.com/people/r/rish/papers/RC22230.pdf.

  • Zhang, H. (2004). The optimality of naive Bayes. FLAIRS 2004 Conference. American Association for Artificial Intelligence. Available online, URL: http://www.cs.unb.ca/profs/hzhang/publications/FLAIRS04ZhangH.pdf.

599 questions
12 votes, 1 answer

How to use Naive Bayes for multi-class problems?

I know how Naive Bayes works for classifying binary problems. I just need to know the standard way to apply NB to multi-class classification problems. Any ideas?
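One standard answer: the naive Bayes decision rule is already multi-class, since the argmax simply runs over all K classes rather than two. A minimal sketch with three invented classes and word-count features (the counts are made up for illustration):

```python
import math

# Invented word counts per class for a tiny 3-class text problem.
counts = {
    "sports":   {"ball": 8, "game": 6, "vote": 1},
    "politics": {"ball": 1, "game": 2, "vote": 9},
    "tech":     {"ball": 1, "game": 5, "vote": 2},
}
priors = {"sports": 1 / 3, "politics": 1 / 3, "tech": 1 / 3}
vocab = {"ball", "game", "vote"}

def log_posterior(cls, words):
    total = sum(counts[cls].values())
    lp = math.log(priors[cls])
    for w in words:
        # Laplace smoothing so unseen words do not zero out the product
        lp += math.log((counts[cls].get(w, 0) + 1) / (total + len(vocab)))
    return lp

def predict(words):
    # same argmax rule as in the binary case, taken over all K classes
    return max(priors, key=lambda c: log_posterior(c, words))

print(predict(["vote", "vote", "game"]))
```

Nothing binary-specific appears anywhere; adding a fourth class is just another entry in the tables.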
11 votes, 3 answers

Naive Bayes: Continuous and Categorical Predictors

It's my understanding that most types of common classifiers (Support Vector Machine, for example) can take a mixture of categorical and continuous predictors. However, this doesn't seem to be true for Naive Bayes, since I need to specify the…
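A common workaround for this question is to model each feature with its own distribution and add the log-likelihoods, which the independence assumption makes legitimate. A sketch with one Gaussian and one categorical feature; all parameters are invented:

```python
import math

# Per class: Gaussian parameters for the continuous feature and a
# probability table for the categorical feature. Numbers are invented.
params = {
    "yes": {"prior": 0.5, "mu": 5.0, "var": 1.0,
            "color": {"red": 0.7, "blue": 0.3}},
    "no":  {"prior": 0.5, "mu": 2.0, "var": 1.0,
            "color": {"red": 0.2, "blue": 0.8}},
}

def log_gauss(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def predict(x_cont, x_cat):
    def lp(cls):
        p = params[cls]
        # log P(C) + log N(x_cont; mu, var) + log P(x_cat | C)
        return (math.log(p["prior"])
                + log_gauss(x_cont, p["mu"], p["var"])
                + math.log(p["color"][x_cat]))
    return max(params, key=lp)

print(predict(4.5, "red"))
```

Because the features are conditionally independent, a density value and a probability mass value can be mixed in the same sum of logs without any special handling.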
11 votes, 3 answers

The difference between the Bayes classifier and the naive Bayes classifier?

I'm trying to find the connection between the two classifiers. In NBC we assume that all the features are independent of each other, so we can calculate the posterior probability more easily. I assume the Bayes classifier is more complex, but how is the process…
9 votes, 1 answer

Naive Bayes classifier gives a probability greater than 1

I'm trying to understand an example regarding how to use a Naive Bayes classifier in spam filtering based on this link. The article picks two "bad" words that it figures appear often in spam and then calculates the probability that a message is spam…
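Probabilities greater than 1 in such examples almost always come from dropping or mis-grouping the normalizing denominator. A sketch of the correctly normalized spam posterior, with invented priors and word likelihoods:

```python
# P(S|W) = P(W|S)P(S) / (P(W|S)P(S) + P(W|H)P(H)); numbers are invented.
p_s, p_h = 0.4, 0.6          # priors for spam / ham
p_w_s, p_w_h = 0.8, 0.1      # P(word | spam), P(word | ham)

numerator = p_w_s * p_s
evidence = p_w_s * p_s + p_w_h * p_h   # P(W), by total probability
p_s_w = numerator / evidence
print(p_s_w)
```

The denominator is the full P(W), so the result is guaranteed to lie in [0, 1]; computing only the numerator (or misplacing a parenthesis in the denominator) is what produces values above 1.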
5 votes, 2 answers

Proving that Gaussian Naive Bayes Decision Boundary is Linear

I need to come up with a proof that Gaussian Naive Bayes has a linear decision boundary (in this case for Y={0,1}). I tried to work it out, but I am not able to pull out the $x_i$ term, as it is stuck inside the squared term.
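The crux of the proof, assuming each feature's class-conditional Gaussians share a variance $\sigma_i^2$ (this shared-variance assumption is what makes the boundary linear), is that the quadratic term in $x_i$ cancels when the two log-posteriors are subtracted:

$$\log\frac{P(Y=1\mid x)}{P(Y=0\mid x)} = \log\frac{P(Y=1)}{P(Y=0)} + \sum_i \frac{(x_i-\mu_{i0})^2-(x_i-\mu_{i1})^2}{2\sigma_i^2} = \log\frac{P(Y=1)}{P(Y=0)} + \sum_i\left(\frac{\mu_{i1}-\mu_{i0}}{\sigma_i^2}\,x_i + \frac{\mu_{i0}^2-\mu_{i1}^2}{2\sigma_i^2}\right)$$

The $x_i^2$ terms from the two expansions are identical and cancel, leaving an expression affine in $x$, so the decision boundary (log-odds equal to zero) is a hyperplane.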
5 votes, 1 answer

The purpose of the threshold in the naive Bayes algorithm

Suppose we have 2 classes. By using a naive Bayes classifier we get posterior probabilities P(C1|D) and P(C2|D) for a query sample. Suppose P(C1|D) is higher than P(C2|D); then we assign the tag of class 1 to the query sample. My question is: where…
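One common role for a threshold here is a reject option: accept the argmax class only when its posterior is confident enough, and abstain otherwise. A sketch (the 0.7 cutoff is an arbitrary illustration, not a recommended value):

```python
def classify(posteriors, threshold=0.7):
    """posteriors: dict mapping class -> P(class | D), assumed normalized.
    Return the argmax class, or None (reject) if the winner's posterior
    does not clear the threshold."""
    cls = max(posteriors, key=posteriors.get)
    return cls if posteriors[cls] >= threshold else None

print(classify({"C1": 0.9, "C2": 0.1}))    # confident winner
print(classify({"C1": 0.55, "C2": 0.45}))  # too close, so reject
```

Rejected samples can then be routed to a human or a more expensive model instead of being tagged on a near-coin-flip posterior.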
5 votes, 1 answer

How to deal with mixture of continuous and discrete features when using Naive Bayes classifier

My task is to use Naive Bayes classifier for prediction, where I have both continuous and discrete variables as predictor variables. In literature the classifier is written as: $$\hat{y}= \underset{k\;\in\;\{1,..K\}} {\mathrm{argmax}}…
4 votes, 1 answer

Naive question about Naive Bayes modeling

In naive Bayes classifiers, one calculates a frequency table to determine a prediction. As a classic example, one calculates the frequency table of words given the context of spam or ham, e.g. P( viagra | spam ), which is the probability that given a…
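Such a frequency table is just smoothed counts. A sketch over an invented four-message corpus, using Laplace (add-one) smoothing so that words unseen in a class do not get probability zero:

```python
from collections import Counter

# Invented toy corpus of labeled messages.
corpus = [
    ("spam", "cheap viagra now"),
    ("spam", "viagra deal now"),
    ("ham",  "meeting at noon"),
    ("ham",  "lunch deal today"),
]

counts = {"spam": Counter(), "ham": Counter()}
for label, text in corpus:
    counts[label].update(text.split())

vocab = set(counts["spam"]) | set(counts["ham"])

def p_word_given(label, word):
    # Laplace-smoothed estimate of P(word | label)
    total = sum(counts[label].values())
    return (counts[label][word] + 1) / (total + len(vocab))

print(p_word_given("spam", "viagra"))
```

Here "viagra" occurs twice among the six spam tokens, and the 9-word vocabulary in the smoothing denominator gives P(viagra | spam) = 3/15.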
3 votes, 2 answers

How would you deal with categorical data in a naive Bayesian classifier?

I've built a little naive Bayesian classifier that works with Boolean and real values. Boolean distributions are dealt with via Bernoulli distributions, while real-valued data are dealt with via kernel mixture estimators. I'm currently in the process of…
2 votes, 0 answers

Naive Bayes Classifier - measure accuracy after training

I have built a prototype Naïve Bayes classifier in an Excel spreadsheet. My data is a transaction (an order) with 13 parameters. This translates directly to a feature vector (feature_1, feature_2, …, feature_13). Each parameter of an order, if it…
2 votes, 0 answers

Naïve Bayes with different distributions for each feature

I am looking at how naive Bayes works and I see that it goes over all the classes and finds the class that maximizes: $\log(\operatorname{Pr}[Y=y]) + \sum_{i=1}^d \log(\operatorname{Pr}[X_i=x_i|Y=y])$. So it looks like the probability for each…
2 votes, 2 answers

Naive Bayes with duplicated data

In the training set for naive Bayes, there are some duplicate samples. Should we train naive Bayes with the duplicate samples, or should we eliminate all the duplicates and then train? I have points both for and against…
1 vote, 1 answer

How can I convert variable distribution parameters into training data for Naive Bayes classifier?

I am trying to build a Naive Bayes classifier from data pulled from scientific papers. I want to use the reported variable distribution parameters to approximate a dataset which I can use to train the Naive Bayes classifier (since usually there is no…
1 vote, 1 answer

Can a Naive Bayes classifier "learn" variables which are not in the training set?

Theory: I have three data sets, let's call them A, B and C. These data sets contain the following variables: A = { x, A1, A2, ..., An }, B = { B1, B2, ..., Bm }, C = { A1, A2, B1, B2 }. x is the dependent variable which I'd like to predict. As you can…
1 vote, 1 answer

In Naïve Bayes, why do we estimate Pr(W|H)*Pr(H) instead of Pr(W)

This Wikipedia article describes spam filtering using Naïve Bayes: https://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering It says P(S|W) is given as Pr(W|S)*Pr(S) / (Pr(W|S)*Pr(S) + Pr(W|H)*Pr(H)). However, one could also get P(S|W) by estimating…
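For reference, the denominator in that expression is exactly $\Pr(W)$ expanded by the law of total probability:

$$\Pr(W) = \Pr(W \mid S)\Pr(S) + \Pr(W \mid H)\Pr(H)$$

So estimating $\Pr(W\mid S)\Pr(S)$ and $\Pr(W\mid H)\Pr(H)$ separately is mathematically equivalent to estimating $\Pr(W)$ directly, but it reuses the class-conditional quantities the model has already fitted per class.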