
I've found a variation of the $\chi^2$ statistic that looks like this:

$\chi^2 = \sum\limits_{i=1}^N\,\chi_i^2 = \sum\limits_{i=1}^N\,\frac{(\log m_{i}- \log n_{i})^2}{\sigma_{i}^2}$

where $\sigma_{i}^2=1/n_i$, $m_i$ is the modeled number of counts in bin $i$, $n_i$ is the observed number (both are of course non-negative integers), and $\log$ is the decimal logarithm.

If $m_i=0$ or $n_i=0$, the author assigns a very small value to avoid inconsistencies when evaluating the $\log$ function at $0$.

I understand that this statistic is biased in the sense that it gives more weight to $\chi_i$ terms where $n_i\neq0$; i.e., it pretty much disregards $\chi_i$ terms where $n_i=0$ (which is replaced by a very small number, say $0.0001$), thus allowing the model to assign almost any number $m_i$ to that bin.

See here for an example of $\chi_i$ when $n_i=0$ (replaced by $0.0001$) and here to see the behavior of $\chi_i$ when $n_i=1$.

Clearly, from what I see in the case where $n_i=1$, the values of $\chi_i$ are much larger than those obtained from the same statistic when $n_i=0$ (replaced by $0.0001$).

I see this as proof that the statistic is biased. Am I correct here?
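The comparison between the two cases can be reproduced numerically. The following is a minimal sketch, assuming the per-bin term is $\chi_i^2 = n_i(\log m_i - \log n_i)^2$ with zeros replaced by $0.0001$; the function name `chi_i_squared` and the list of trial $m_i$ values are illustrative choices, not taken from the paper.

```python
import math

def chi_i_squared(m, n, eps=1e-4):
    """One term (log10 m - log10 n)^2 / sigma^2 with sigma^2 = 1/n.

    Zeros are replaced by eps (an assumed value) before taking the log.
    """
    m = m if m > 0 else eps
    n = n if n > 0 else eps
    return n * (math.log10(m) - math.log10(n)) ** 2

# Hypothetical modeled counts to try in a single bin.
models = [1, 10, 100, 1000]

empty_bin = [chi_i_squared(m, 0) for m in models]      # n_i = 0 -> 0.0001
single_count = [chi_i_squared(m, 1) for m in models]   # n_i = 1

for m, a, b in zip(models, empty_bin, single_count):
    print(f"m_i={m:5d}  n_i=0: {a:.4f}  n_i=1: {b:.4f}")
```

Under these assumptions, the term for the empty bin stays far below the term for $n_i=1$ whenever $m_i$ differs appreciably from 1, which is the behavior described above.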

EdM
Gabriel
  • I think there is a confusion of concepts about "bias" here. Bias is a quality of a statistic, which is an estimate of a parameter. You can only talk about a statistic being biased if it is an attempt to estimate a parameter and can be shown to on average get it wrong. I don't see what parameter your statistic is trying to estimate. – Peter Ellis Jun 10 '12 at 03:05
  • I'm trying to estimate the best fit of several models to a given observation. Different models will have different $\chi^2$ values, where the minimum value is the best fit. Maybe "biased" is the wrong word, I mean this statistic will give more weight to a certain part of a histogram than to another (even though this 'unevenness' will be the same for all models) – Gabriel Jun 10 '12 at 03:44
  • I edited the question to correct what I think is a discrepancy between what you had written and what shows in the paper to which you link. Feel free to roll back if I got that wrong. – EdM Sep 13 '22 at 20:22

1 Answer


Although the authors use the $\chi^2$ symbol for that formula, it's not the $\chi^2$ statistic used, say, for evaluating hypotheses about contingency tables. The authors minimized $\chi^2$ from that formula to optimize model fits and to compare optimized fits among models, rather than to evaluate against a null hypothesis.

They chose to work in a log scale for counts because: "In linear scale, the regions with maximum density would dominate the $\chi^2$ value resulting in an unsatisfactory overall fit."

Setting $\sigma_i^2=1/n_i$ in the denominators effectively weights each term by the number of observed counts in bin $i$, $n_i$,

$$\chi^2 = \sum\limits_{i=1}^N\,\chi_i^2 = \sum\limits_{i=1}^N\,\frac{(\log m_{i}- \log n_{i})^2}{\sigma_{i}^2} = \sum\limits_{i=1}^N\,n_i(\log m_{i}- \log n_{i})^2,$$

so a term gets 0 weight when $n_i=0$. That might completely avoid the problem of taking the log of $n_i=0$ for such a term, although I can't be sure that's how the authors proceeded in practice. The modeled counts $m_i$ aren't necessarily integers (even in standard $\chi^2$ statistics), and I suspect that they never were exactly 0.
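A minimal sketch of the weighted form, assuming that bins with $n_i=0$ are simply skipped (zero weight) and that every $m_i$ is strictly positive. The function name `weighted_log_chi2` and the sample counts are hypothetical; this is not necessarily how the authors implemented it.

```python
import math

def weighted_log_chi2(modeled, observed):
    """Sum of n_i * (log10 m_i - log10 n_i)^2 over bins.

    Bins with n_i == 0 carry zero weight and are skipped, so log10(0)
    is never evaluated. Assumes every modeled count is > 0.
    """
    total = 0.0
    for m, n in zip(modeled, observed):
        if n == 0:
            continue  # zero weight: the term contributes nothing
        total += n * (math.log10(m) - math.log10(n)) ** 2
    return total

# Hypothetical histogram: the middle bin is empty in the observation.
observed = [10, 0, 100]
perfect_fit = weighted_log_chi2([10.0, 5.5, 100.0], observed)
off_by_two = weighted_log_chi2([20.0, 3.2, 100.0], observed)
print(perfect_fit, off_by_two)
```

With this convention the modeled count in the empty bin has no influence on the result, whatever its value.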

If taking the log of 0 nevertheless caused issues, that problem is often faced in practice and discussed on this site. See this page and its links for places to start. For an optimization weighted toward bins with large numbers of observed counts, as in this paper, the choice of what to add before taking the log probably didn't matter much.
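As a rough check of that last point, the sketch below varies the small constant and reports how much of the total comes from the empty bin. It assumes zeros are replaced by the constant (as described in the question) rather than having a constant added to every count, and the counts are made-up illustrative numbers, not data from the paper.

```python
import math

def log_chi2(modeled, observed, eps):
    """Sum of n_i * (log10 m_i - log10 n_i)^2, with zeros replaced by eps."""
    total = 0.0
    for m, n in zip(modeled, observed):
        n_eff = n if n > 0 else eps
        m_eff = m if m > 0 else eps
        total += n_eff * (math.log10(m_eff) - math.log10(n_eff)) ** 2
    return total

# Hypothetical histogram: one empty bin next to well-populated bins.
observed = [0, 500, 2000, 800]
modeled = [3.0, 450.0, 2200.0, 700.0]

shares = {}
for eps in (1e-6, 1e-4, 1e-2):
    total = log_chi2(modeled, observed, eps)
    # Contribution of the empty bin (index 0) alone.
    empty_term = eps * (math.log10(modeled[0]) - math.log10(eps)) ** 2
    shares[eps] = empty_term / total
    print(f"eps={eps:g}  total={total:.4f}  empty-bin share={shares[eps]:.5f}")
```

For these particular numbers the empty bin accounts for only a small fraction of the total across all three constants; a histogram dominated by sparse bins could behave differently.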

EdM
  • I'd forgotten about this question, thank you Ed! – Gabriel Sep 14 '22 at 18:01
  • @Gabriel I think that Kjetil Halvorsen is the one who revived this question, by asking you (in a recent but now-deleted comment) to provide the citation that you added yesterday. Glad that I could help with an answer once I saw it revived. – EdM Sep 14 '22 at 18:06