(I'm a newbie at stats. I'm a mathematician and a programmer and I'm trying to build something like a naive Bayesian spam filter.)
I've noticed in many places that people tend to break down the denominator in the equation from Bayes' Theorem. So instead of this:
$P(B|A)=\frac{P(A|B)\cdot P(B)}{P(A)}$
We are presented with this:
$P(B|A)=\frac{P(A|B)\cdot P(B)}{P(A|B)\cdot P(B)+P(A|\neg B)\cdot P(\neg B)}$
You can see that this convention is used in this Wikipedia article and in this insightful post by Tim Peters.
I am baffled by this. Why is the denominator broken down like this? How does that help things at all? What's so complicated about calculating $P(A)$ directly, which in the case of spam filters would be the probability that the word "cheese" appears in an email, regardless of whether it's spam or not?
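To make the question concrete, here is a minimal Python sketch that computes the result both ways. All counts and variable names are hypothetical, made up purely for illustration; $A$ is "the email contains the word cheese" and $B$ is "the email is spam". As far as I can tell, the two denominators come out the same whenever both are estimated from the same set of emails, which is why I don't see what the expansion buys.

```python
# Hypothetical corpus counts (invented for illustration, not real data).
spam_total = 400       # emails labelled spam
ham_total = 600        # emails labelled not spam
spam_with_word = 120   # spam emails containing "cheese"
ham_with_word = 30     # non-spam emails containing "cheese"

total = spam_total + ham_total

p_b = spam_total / total                     # P(B): email is spam
p_not_b = ham_total / total                  # P(not B): email is not spam
p_a_given_b = spam_with_word / spam_total    # P(A|B)
p_a_given_not_b = ham_with_word / ham_total  # P(A|not B)

# Denominator computed directly: fraction of all emails containing the word.
p_a_direct = (spam_with_word + ham_with_word) / total

# Denominator computed via the expansion over B and not B.
p_a_expanded = p_a_given_b * p_b + p_a_given_not_b * p_not_b

# P(B|A) using each denominator.
posterior_direct = p_a_given_b * p_b / p_a_direct
posterior_expanded = p_a_given_b * p_b / p_a_expanded

print(posterior_direct, posterior_expanded)
```

With these made-up counts both versions give approximately 0.8, up to floating-point rounding.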