I've built a small naive Bayes classifier that works with Boolean and real values. Boolean features are modelled with Bernoulli distributions, while real-valued features are modelled with kernel mixture estimators. I'm currently in the process of adding support for count data.
How would one deal with categorical data though, e.g. Monday, Tuesday, Wednesday, or Toyota, Honda, Ford?
My initial thought is to assign a number to each category, treat it as a normal real value and round to the nearest integer category on prediction. That seems very wrong to me, though.
"Treat it as a normal real value and round to the nearest integer category on prediction" does sound like an incorrect approach. Categorical data is treated much like Boolean data: it is a discrete variable and should be modelled as discrete, not real-valued. Just assign the integers $1,\dots,K$ as labels to your $K$ categories, then use the counts to estimate the parameters of your categorical distribution, just as you used the counts to estimate the Bernoulli parameters. – jerad Dec 17 '12 at 21:21
jerad, thanks for the pointer; I'll get working on this. – Dec 17 '12 at 22:36
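For anyone landing here later, the count-based estimate suggested in the comment could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `fit_categorical`, the toy car data, and the additive (Laplace) smoothing term `alpha` are not from the thread. Smoothing is added so that a category never seen with a given class does not get probability zero.

```python
from collections import Counter, defaultdict

def fit_categorical(values, labels, alpha=1.0):
    """Estimate P(category | class) from counts.

    alpha is an additive (Laplace) smoothing constant; this is an
    assumption of the sketch, not something prescribed in the thread.
    """
    categories = sorted(set(values))
    counts = defaultdict(Counter)
    # Tally how often each category occurs within each class.
    for value, label in zip(values, labels):
        counts[label][value] += 1
    probs = {}
    for label, counter in counts.items():
        total = sum(counter.values()) + alpha * len(categories)
        # Counter returns 0 for categories unseen in this class.
        probs[label] = {c: (counter[c] + alpha) / total for c in categories}
    return probs

# Hypothetical toy data: car make as the feature, "a"/"b" as class labels.
makes = ["Toyota", "Honda", "Ford", "Toyota", "Honda", "Toyota"]
labels = ["a", "a", "a", "b", "b", "b"]
probs = fit_categorical(makes, labels)
```

At prediction time, `probs[label][category]` plays the same role in the naive Bayes product as the Bernoulli likelihood does for a Boolean feature. Integer labels $1,\dots,K$ work just as well as strings here, since the values are only used as dictionary keys and never as magnitudes.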