
I'm currently reading through this GLM textbook, and have come across this assertion on page 24 that I can't quite wrap my head around

[Image: textbook excerpt from p. 24]

The author claims that for independent Poisson-distributed random variables $Y_i$ with expected values $\theta_i = E[Y_i]$, the standardized residuals $R_i = (Y_i - \hat{\theta}_i)/\sqrt{\hat{\theta}_i}$ are approximately normally distributed. How is this true? Is there an assumption they are making? I initially thought it might have something to do with the CLT, but I'm not completely sure.

John
  • key word is "provided expected values not too small". You should try simulating Poisson random variables with larger and larger expected values and plotting histograms of them :) – John Madden Sep 20 '22 at 17:09
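Following that suggestion, here is a minimal numpy sketch (the rates and sample size are arbitrary illustrative choices) that standardizes the draws and tracks how the skew shrinks as the mean grows:

```python
# Simulate Poisson draws with growing means, standardize them as
# r = (y - lam) / sqrt(lam), and watch the skew shrink toward 0
# (for a Poisson the skewness is 1/sqrt(lam)).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
for lam in [0.5, 5, 50]:
    y = rng.poisson(lam, size=n)
    r = (y - lam) / np.sqrt(lam)   # standardized residuals
    skew = np.mean(r**3)           # sample skewness (r already has sd ~ 1)
    print(f"lam={lam:>4}: mean={r.mean():+.3f}  var={r.var():.3f}  skew={skew:+.3f}")
```

A histogram of `r` for each rate shows the same thing visually: the $\lambda = 0.5$ case is heavily skewed and gappy, while $\lambda = 50$ looks close to a bell curve.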

3 Answers

2

As MikePhifer notes correctly, the caveat is that the expected mean must be "not too small" (in practice, I would say greater than 10 or so). In that case, the Poisson is approximately normal, and so are the residuals; however, if that is the case, you might as well fit a log-transformed lm.

In all cases where you really need a Poisson GLM, standardised residuals will be non-normal. However, you can use the idea of quantile residuals, e.g. via the DHARMa package in R, to transform them to uniformity (or other distributions, including normality; see the options in residuals.DHARMa).

1

For the sake of simplicity, let's assume that the data really are conditionally Poisson and that $\hat{\theta}_i$ really is the true conditional mean ($\lambda_i$) everywhere. In that case, the variance at every point is $\lambda_i = \hat{\theta}_i$, since for a Poisson distribution the variance equals the mean. If you divide by the square root of the expected value, you are dividing by the conditional standard deviation, and because that value is keyed to each individual conditional mean, you are stabilizing the variance.
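A quick numerical check of that variance-stabilization step (the rates below are arbitrary illustrative choices):

```python
# Variance stabilization: raw residuals (y - lam) have variance lam,
# which changes with lam, but (y - lam) / sqrt(lam) has variance ~1
# for every rate.
import numpy as np

rng = np.random.default_rng(1)
for lam in [2, 10, 100]:
    y = rng.poisson(lam, size=200_000)
    raw = y - lam
    scaled = raw / np.sqrt(lam)
    print(f"lam={lam:>3}: var(raw)={raw.var():7.2f}  var(scaled)={scaled.var():.3f}")
```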

So why wouldn't it be "approximately" normal (admitting that the hand-waviness of that carries some of the weight)? Well, the normal distribution goes to infinity in both directions, but the Poisson distribution only goes to positive infinity and is bounded from below by $0$. More specifically, the Poisson distribution is skewed, while the normal is not. However, as $\lambda$ increases, the Poisson distribution becomes less skewed (at the rate of $1/\sqrt{\lambda}$). So when "$\hat{\theta}_i$ are not too small", that becomes less and less of an issue. In addition, it becomes less and less likely that any data would be in the vicinity of the lower bound, as it is so far away.
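Both effects can be made exact with stdlib math alone (the rates below are arbitrary):

```python
# The skewness of a Poisson(lam) is exactly 1/sqrt(lam), and the lower
# bound 0 sits sqrt(lam) standard deviations below the mean, so the mass
# there, P(Y = 0) = exp(-lam), vanishes quickly as lam grows.
from math import exp, sqrt

for lam in [1, 10, 100]:
    print(f"lam={lam:>3}: skewness={1/sqrt(lam):.3f}  "
          f"lower bound at -{sqrt(lam):.1f} sd  P(Y=0)={exp(-lam):.2e}")
```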

The other issue is that the Poisson distribution is discrete, whereas the normal is continuous, so there are 'gaps' in the distribution. First, note that any finite sample will have gaps, no matter how finely measured (and even if the data are drawn from a true normal). Second, with higher $\lambda$ there will be many possible values that are meaningfully likely to occur, and they are closer together (relatively, albeit not absolutely). Moreover, as there are increasing numbers of conditional distributions, there will be a larger number of possible standardized residual values, which makes the distribution more nearly continuous. Putting all of the above together, the claim is not too surprising.

0

John, good question and welcome (I see both of us are new here). I think it is related to the normal approximation to the Poisson (a ballpark estimate), with the general rule of thumb that the Poisson mean count (rate) should be 5 or more. The approximation takes Normal mean ~ Poisson expected count and Normal standard deviation (sigma) ~ sqrt(Poisson expected count). Combine this with a standard normal Z calculation ($Z = (x - \text{mean})/\sigma$) and you get the residual $R_i$ equation.

Here is a Monte Carlo for $R_i$ with Poisson rates 0.1, 1, 5, and 20. Note that 0.1 and 1 are not well modeled by this, 5 is only roughly modeled, and for Poisson(20) the model is reasonable.

[Figure: probability (CDF) plot assuming a normal distribution for $R_i$, one curve per Poisson rate]
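A sketch of a similar Monte Carlo in Python (not the original; the rates match the plot, while the sample size, seed, and quantile levels are arbitrary choices), comparing a few empirical quantiles of $R_i$ with the standard normal quantiles:

```python
# For each rate, draw Poisson samples, form R_i = (Y_i - lam)/sqrt(lam),
# and measure the worst mismatch between a few empirical quantiles of R_i
# and the matching N(0, 1) quantiles. Smaller gap = closer to normal.
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(42)
probs = [0.1, 0.25, 0.5, 0.75, 0.9]
z = [NormalDist().inv_cdf(p) for p in probs]   # N(0, 1) reference quantiles

for lam in [0.1, 1, 5, 20]:
    y = rng.poisson(lam, size=50_000)
    r = (y - lam) / np.sqrt(lam)
    q = np.quantile(r, probs)
    gap = np.max(np.abs(q - z))                # worst quantile mismatch
    print(f"Poisson({lam:>4}): max quantile gap vs N(0,1) = {gap:.3f}")
```

The gap shrinks steadily as the rate grows, matching the pattern in the CDF plot: rates 0.1 and 1 sit far from normal, 5 is rough, and 20 is close.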