4

Let's consider this very simple example with Poisson regression:

y <- c(1, 2, 3, 2, 1, 4, 5, 6, 7, 1, 2, 3)
x <- rnorm(length(y))
z <- runif(length(y))

mod <- glm(y ~ x + z, family = poisson)

As we can observe, the null deviance is equal to:

mod$null.deviance
14.10487

However, my manual calculations (based for example on RPubs) shows something different:

# Estimate null model
mod_null <- glm(y~1, family=poisson)
2 * (logLik(mod) - logLik(mod_null))

'log Lik.' 5.812325 (df=3)

Can you please explain to me what am I doing wrong?

Lucian
  • 203

2 Answers2

6

There is already a good answer but I thought we could still clarify some points.

Also the Note at the end shows the input we will use. It is the same as in the question except we added a seed to make the data and models reproducible.

Let $L$ and $D$ be the likelihood and the deviance of our model, respectively, and use a subscript of $0$ or $s$ to describe the corresponding quantities for the null or saturated models. Then the code in the question calculates the null deviance, i.e. the deviance of the null model, as $2 (L - L_0)$ but the correct formula is $D_0 = -2 (L_0 - L_s)$

In terms of R code we have the following where, in R, gl(n,1) means a factor with n levels assuming n is the length of y. In the link given in the question it is written as as.factor(1:length(y)) .

n <- length(y)
identical(as.factor(1:length(y)), gl(n, 1))
## [1] TRUE

Now calculating the saturated model and using the correct formula from above we have:

mod <- glm(y ~ x + z, family = poisson)   # model - same as in question

mod_null <- glm(y~1, family = poisson) # null model - same as in question

n <- length(y) mod_sat <- glm(y ~ gl(n, 1), family = poisson) # saturated model

mod$null.deviance

[1] 14.10487

D0 <- c(- 2 * (logLik(mod_null) - logLik(mod_sat))); D0 # null deviance

[1] 14.10487

Note

set.seed(123)
y <- c(1, 2, 3, 2, 1, 4, 5, 6, 7, 1, 2, 3)
x <- rnorm(length(y))
z <- runif(length(y))
3

Expanding from my comment above, you are not comparing the right quantities. The null deviance is given by 2(LL(Saturated Model) - LL(Null Model)).

Let's calculate the null deviance manually and compare with mod$null.deviance.

get_null_deviance <- function(fit) {
# Get data from fit object and create new `.idx` column for
# saturated model
data &lt;- transform(fit<span class="math-container">$model, .idx = factor(1:nrow(fit$</span>model)))

# Get response variable from fit object
# In this case, this returns `&quot;y&quot;`
resp &lt;- all.vars(attr(fit$model, &quot;terms&quot;))[1]

# Formula for saturated model and fit model
# In this case, the saturated model is `y ~ .idx`
fm_saturated &lt;- reformulate(&quot;.idx&quot;, response = resp)
fit_saturated &lt;- glm(fm_saturated, data = data, family = poisson)

# Formula for null model and fit model
# In this case, the null model is `y ~ 1`
fm_null &lt;- reformulate(&quot;1&quot;, response = resp)
fit_null &lt;- glm(fm_null, data = data, family = poisson)

# Return null deviance
2 * (logLik(fit_saturated) - logLik(fit_null))

}

get_null_deviance(mod) #'log Lik.' 14.10487 (df=12) mod$null.deviance #[1] 14.10487

Maurits Evers
  • 320
  • 1
  • 8
  • 1
    It would be great if you can recode this to not use reformulate but explicit model formulas. This might be hard to follow for anyone who doesn't know a lot of R yet. – dipetkov Aug 27 '22 at 10:01
  • 1
    @dipetkov Thanks for the suggestion. I have expanded my post to include explicit comments to help with understanding the R code. I do think including a more general solution in R is more valuable than hard-coding model formulas. This question came from SO, and IMHO it was a suitable question for SO. – Maurits Evers Aug 27 '22 at 10:46
  • Thanks for adding comments but I disagree. A more general solution is not needed at all because we have mod$null.deviance. It's always better to use tried and tested implementation than implementing our own. Understanding what the tried and tested implementation calculates is valuable. – dipetkov Aug 27 '22 at 11:28
  • @dipetkov I guess we have to agree to disagree. This is not about "implementing my own"; instead this is a code-based explanation for calculating the null deviance based on a fitted model. OP uses R; OP has researched methods in R; the answer shows OP how to use R to calculate the null deviance. – Maurits Evers Aug 27 '22 at 11:39
  • 1
    I think it may actually be about whether this question was appropriately migrated or not... In any case the OP is free to ask followup questions about either the code or the theory, should they choose to do so. – dipetkov Aug 27 '22 at 11:42