
How should I choose the prior probability of each feature?

[Image: training data table]

Should we use the same prior for every feature, or should different features use different priors?

For example, $P(\text{Color} = \text{Blue} \mid \text{Stolen} = \text{Yes}) = (0 + 1 \cdot (1/3)) / (5 + 1)$, where I used the smoothing parameter $m = 1$ and $p = 1/3$, since Color can take 3 values: Red, Blue, and Yellow. I am a little confused here: should I count Blue among the possible values, or only Red and Yellow?
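Written out (with $n_{\text{Blue}}$ for the count of blue stolen cars and $n_{\text{Yes}}$ for the total number of stolen cars, names introduced here only for clarity), the computation is

$$P(\text{Color} = \text{Blue} \mid \text{Stolen} = \text{Yes}) = \frac{n_{\text{Blue}} + m\,p}{n_{\text{Yes}} + m} = \frac{0 + 1 \cdot \frac{1}{3}}{5 + 1} = \frac{1}{18} \approx 0.056.$$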

Also, for the other features, should I use the same $p = 1/3$, or should it be $p = 1/2$ since they only take two values?


1 Answer


The more standard definition of Laplace smoothing is to add a fixed quantity $\alpha$ to each count:

$$ \hat p_i = \frac{n_i+\alpha}{\sum_{j=1}^K (n_j+\alpha)} = \frac{n_i+\alpha}{\left(\sum_{j=1}^K n_j\right) + K\alpha} $$

where the $\alpha$'s can be thought of as "pseudocounts": with $\alpha=1$ you assume that each category was observed one extra time, with higher values leading to more smoothing and lower values to less.
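As a quick illustration, here is a minimal Python sketch of this formula (the counts `[3, 2, 0]` for Red, Yellow, Blue are hypothetical, chosen only to match the question's five stolen cars with no blue one):

```python
import numpy as np

def laplace_smooth(counts, alpha=1.0):
    """(n_i + alpha) / (sum_j n_j + K * alpha) for each category i."""
    counts = np.asarray(counts, dtype=float)
    k = counts.size                       # K, the number of categories
    return (counts + alpha) / (counts.sum() + k * alpha)

# Hypothetical color counts among 5 stolen cars: Red=3, Yellow=2, Blue=0
print(laplace_smooth([3, 2, 0], alpha=1.0))   # [0.5   0.375 0.125]
```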

Splitting $\alpha$ into two parameters $m$ and $P$ so that $mP=\alpha$ seems to me unnecessarily complicated. Still, if you remember that $mP$ is a "pseudocount" added to each observed count, it may be easier to build an intuition about what reasonable values could be.
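To make the correspondence explicit (assuming the $m$-estimate form $\hat p_i = (n_i + m P_i)/(n + m)$ that the question uses): with a uniform prior $P_i = 1/K$ and $m = K\alpha$,

$$ \hat p_i = \frac{n_i + K\alpha \cdot \frac{1}{K}}{n + K\alpha} = \frac{n_i + \alpha}{n + K\alpha}, $$

which is exactly the Laplace-smoothed estimate above.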

Tim
  • So $K$ here is the number of values of that feature? Assume the training set contains 10 positive examples, and the feature "Size" has 3 values, but one value (size = medium) is not observed: small: 4, medium: 0, large: 6.

    So $K$ here will be 3?

    $P(\text{small} \mid \text{positive}) = (4 + 1)/(10 + 3) \approx 0.385$, $P(\text{medium} \mid \text{positive}) = (0 + 1)/(10 + 3) \approx 0.077$ (see the quick check after these comments).

    – Skywalker Jun 16 '21 at 07:58
  • @Skywalker correct, $K$ is the number of categories, so as in your example, if there are three possible sizes, $K=3$. – Tim Jun 16 '21 at 08:18
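A quick numeric check of the example from the comments (a minimal sketch; the counts are taken from the comment above):

```python
# Size counts among 10 positive examples, as in the comment
counts = {"small": 4, "medium": 0, "large": 6}
n, k, alpha = sum(counts.values()), len(counts), 1   # n=10, K=3
probs = {v: (c + alpha) / (n + k * alpha) for v, c in counts.items()}
print(probs)  # {'small': 0.3846..., 'medium': 0.0769..., 'large': 0.5384...}
```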