Let's say I have an experiment that yields discrete results between 1 and $N$. I am modelling the results with several statistical models and want to use the corrected Akaike Information Criterion (AICc) or the Bayesian Information Criterion (BIC) to choose the best model. How can I derive AICc or BIC if the predicted variables are discrete and bounded? Is there a difference in what we mean by "sample size" in the AICc or BIC formula?

Richard Hardy
quant_dev
  • I believe AICc is derived for linear models, so you'd be guessing in that case anyway. I would probably go back to the original BIC derivation and see where the $\log(n)$ term comes from and how it would be modified for discrete/low-information responses. – Ben Bolker Sep 20 '15 at 01:24
  • In Claeskens and Hjort, "Model Selection and Model Averaging", n is the number of "observed data". What is "observed data"? If I am conducting a presidential election poll and ask 1000 people "Will you vote for Bush or for Gore?", then is n=1000 (because I got 1000 binary answers) or is n=1 (because I've got one independent data point: the fraction of the interviewees who will vote for Bush)? – quant_dev Sep 20 '15 at 12:18
  • n=1000. Check this thread and this thread for when it is OK to use AIC for model comparison. I mostly agree with the first thread but am suspicious of the second one. – Richard Hardy Oct 19 '15 at 18:11
  • I am shocked that there is so much ambiguity about this issue. Isn't statistics a part of mathematics? Shouldn't there be a clear proof of when X works and when it doesn't? – quant_dev Oct 21 '15 at 09:01

1 Answer

"I am shocked that there is so much ambiguity about this issue. Isn't statistics a part of mathematics? Shouldn't there be a clear proof of when X works and when it doesn't?"

My answer: one of the most important things I ever learned was that while statistics does heavily involve math, it is not just math. It shares a lot with fields like law and politics. Some of the most important questions in statistics come down to value judgments: What's your goal? What's your data? What are you willing to assume? What can you get out of a model that is very likely to be much simpler than our incredibly complex reality?

But, yes, it would be nice to have a clear answer to the question, "When my data are discrete (in my case, binary or categorical), is the traditional AICc appropriate?"
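
To make the "sample size" question concrete, here is a minimal sketch of the textbook formulas applied to the poll example from the comments, taking n = 1000 individual binary answers as suggested there. The count of 540 "yes" answers and the two candidate models (a fixed p = 0.5 versus a fitted p) are made up for illustration. Note that the AICc correction term was derived for Gaussian linear models, so using it with a discrete likelihood is a heuristic rather than an established result.

```python
import math

def bernoulli_loglik(successes, n, p):
    """Log-likelihood of n independent binary answers with success probability p."""
    return successes * math.log(p) + (n - successes) * math.log(1 - p)

def information_criteria(loglik, k, n):
    """Return (AIC, AICc, BIC) for k fitted parameters and n observations."""
    aic = 2 * k - 2 * loglik
    # Small-sample correction; derived for linear models, used here as a heuristic.
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    bic = k * math.log(n) - 2 * loglik
    return aic, aicc, bic

n = 1000          # one observation per interviewee, not one aggregate fraction
successes = 540   # hypothetical number of "yes" answers

# Model A: p fixed at 0.5 in advance (no fitted parameters).
ll_fixed = bernoulli_loglik(successes, n, 0.5)
fixed = information_criteria(ll_fixed, k=0, n=n)

# Model B: p estimated by maximum likelihood (one fitted parameter).
ll_fitted = bernoulli_loglik(successes, n, successes / n)
fitted = information_criteria(ll_fitted, k=1, n=n)

print("fixed  p: AIC=%.3f AICc=%.3f BIC=%.3f" % fixed)
print("fitted p: AIC=%.3f AICc=%.3f BIC=%.3f" % fitted)
```

With n this large the AICc correction is negligible (4/998 for one parameter), so the choice of n matters mainly through the $\log(n)$ penalty in BIC.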