
How should I choose the prior probability of each feature?

[Image: training data table]

Should we use the same prior for every feature, or should different features use different priors?

For example, $P(\text{Color} = \text{Blue} \mid \text{Stolen} = \text{Yes}) = (0 + 1 \cdot (1/3)) / (5 + 1)$, where I used the smoothing parameter $m = 1$ and $p = 1/3$, since Color can take 3 values: Red, Blue, and Yellow. I am a little confused here: should I count Blue among the possible values, or only Red and Yellow?
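Written out (with $n_{\text{Blue}}$ for the count of blue stolen cars and $n_{\text{Yes}}$ for the total number of stolen cars, names introduced here only for clarity), the computation is

$$P(\text{Color} = \text{Blue} \mid \text{Stolen} = \text{Yes}) = \frac{n_{\text{Blue}} + m\,p}{n_{\text{Yes}} + m} = \frac{0 + 1 \cdot \frac{1}{3}}{5 + 1} = \frac{1}{18} \approx 0.056.$$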

Also, for the other features, should I use the same $p = 1/3$, or should it be $p = 1/2$ since they only take two values?


1 Answer


The more standard definition of Laplace smoothing is to add a fixed quantity $\alpha$ to each count:

$$ \hat p_i = \frac{n_i+\alpha}{\sum_{j=1}^K (n_j+\alpha)} = \frac{n_i+\alpha}{\left(\sum_{j=1}^K n_j\right) + K\alpha} $$

where the $\alpha$'s can be thought of as "pseudocounts": with $\alpha=1$ you assume that each category was observed one extra time, with higher values leading to more smoothing and lower values to less.
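As a quick illustration, here is a minimal Python sketch of this formula (the counts `[3, 2, 0]` for Red, Yellow, Blue are hypothetical, chosen only to match the question's five stolen cars with no blue one):

```python
import numpy as np

def laplace_smooth(counts, alpha=1.0):
    """(n_i + alpha) / (sum_j n_j + K * alpha) for each category i."""
    counts = np.asarray(counts, dtype=float)
    k = counts.size                       # K, the number of categories
    return (counts + alpha) / (counts.sum() + k * alpha)

# Hypothetical color counts among 5 stolen cars: Red=3, Yellow=2, Blue=0
print(laplace_smooth([3, 2, 0], alpha=1.0))   # [0.5   0.375 0.125]
```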

Splitting $\alpha$ into two parameters $m$ and $P$ so that $mP=\alpha$ seems to me unnecessarily complicated. Still, if you remember that $mP$ is a "pseudocount" added to each observed count, it may be easier to build an intuition about what reasonable values could be.
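To make the correspondence explicit (assuming the $m$-estimate form $\hat p_i = (n_i + m P_i)/(n + m)$ that the question uses): with a uniform prior $P_i = 1/K$ and $m = K\alpha$,

$$ \hat p_i = \frac{n_i + K\alpha \cdot \frac{1}{K}}{n + K\alpha} = \frac{n_i + \alpha}{n + K\alpha}, $$

which is exactly the Laplace-smoothed estimate above.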

Tim
  • So $K$ here is the number of values of that feature? Assume the training set contains 10 positive examples, and the feature "Size" has 3 values, but one value (size = medium) is not observed: small: 4, medium: 0, large: 6.

    So $K$ here will be 3?

    $P(\text{small} \mid \text{positive}) = (4 + 1)/(10 + 3) \approx 0.385$, $P(\text{medium} \mid \text{positive}) = (0 + 1)/(10 + 3) \approx 0.077$ (see the quick check after these comments).

    – Skywalker Jun 16 '21 at 07:58
  • @Skywalker correct, $K$ is the number of categories, so as in your example, if there are three possible sizes, $K=3$. – Tim Jun 16 '21 at 08:18
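A quick numeric check of the example from the comments (a minimal sketch; the counts are taken from the comment above):

```python
# Size counts among 10 positive examples, as in the comment
counts = {"small": 4, "medium": 0, "large": 6}
n, k, alpha = sum(counts.values()), len(counts), 1   # n=10, K=3
probs = {v: (c + alpha) / (n + k * alpha) for v, c in counts.items()}
print(probs)  # {'small': 0.3846..., 'medium': 0.0769..., 'large': 0.5384...}
```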