
I am currently trying to look at how the likelihood of reaching a particular insect life-stage (in this example 1st instar) is influenced by both temperature (factor: 6 levels - 20, 23, 26, 29, 32, 35 degrees Celsius) and species (factor: 2 levels - HA and AP).

Here is an example of what my data looks like:

Temperature  Species  No.eggs.added  No.hatched  Prop.egg.to.1st
20           AP                  56          37             0.66
23           AP                  69          61             0.88
26           AP                 139          65             0.47
29           AP                 162          94             0.58

To analyze this data I fitted a logistic regression model in R, which produced the following summary output:

eggmodel <- glm(cbind(No.hatched, No.eggs.added - No.hatched) ~ Temperature * Species,
                data = eggto1st, family = binomial(link = "logit"))
summary(eggmodel)

Call: glm(formula = cbind(No.hatched, No.eggs.added - No.hatched) ~ Temperature * Species, family = binomial(link = "logit"), data = eggto1st)

Deviance Residuals: [1] 0 0 0 0 0 0 0 0 0 0 0 0

Coefficients:
                          Estimate Std. Error z value Pr(>|z|)
(Intercept)                 0.6665     0.2822   2.361  0.01821 *
Temperature23               1.3650     0.4702   2.903  0.00369 **
Temperature26              -0.7962     0.3295  -2.416  0.01567 *
Temperature29              -0.3427     0.3240  -1.058  0.29026
Temperature32              -1.4026     0.3250  -4.316 1.59e-05 ***
Temperature35             -28.3452 51586.1741  -0.001  0.99956
SpeciesHA                   0.7423     0.4391   1.691  0.09092 .
Temperature23:SpeciesHA    -1.8376     0.6354  -2.892  0.00383 **
Temperature26:SpeciesHA     0.3990     0.5027   0.794  0.42735
Temperature29:SpeciesHA    -1.0553     0.4896  -2.155  0.03115 *
Temperature32:SpeciesHA    -1.3219     0.4976  -2.657  0.00789 **
Temperature35:SpeciesHA    -0.4598 73007.1005   0.000  0.99999

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 5.5111e+02  on 11  degrees of freedom
Residual deviance: 5.5220e-10  on  0  degrees of freedom
AIC: 73.512

Number of Fisher Scoring iterations: 22

I am trying to interpret the results of this model. Could anyone please answer the following questions?

  1. Is the intercept the log odds of an individual reaching the 1st instar if it is of species AP and kept at 20 degrees Celsius?
  2. Is each TemperatureX coefficient the difference in the log odds of reaching the 1st instar for an AP individual kept at temperature X compared to 20 degrees Celsius? If so, would you add it to the intercept to calculate the log odds of reaching the 1st instar for species AP at temperature X?
  3. Likewise, is the SpeciesHA coefficient the difference in the log odds of reaching the 1st instar at 20 degrees for species HA compared to species AP?
  4. How do I interpret the interaction coefficients? For example, to get the log odds of reaching the 1st instar for species HA at 23 degrees, would the formula be intercept + Temperature23 + SpeciesHA + Temperature23:SpeciesHA? Also, what do the p-values for these interaction terms signify?

Any help with these questions would be greatly appreciated.

1 Answer


Your interpretations of the coefficients are correct. With this type of report format, the (Intercept) is the baseline value, individual coefficients represent the additional contributions from the corresponding predictor values, and interaction coefficients represent the extra contributions of combinations of predictors beyond that from their individual coefficients. With a logistic regression, the units are in log-odds. That seems to be just what you wrote as your interpretations.
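
As a quick check of that arithmetic, the log-odds for a few cells can be assembled by hand from the coefficients printed in the question and converted to probabilities with plogis(). This is only a sketch: the variable names are made up for illustration, and the values are copied from the summary output rather than extracted from the fitted model.

```r
# Coefficients copied from the summary output in the question
# (log-odds scale; reference levels are Temperature 20 and Species AP).
b <- c(
  "(Intercept)"             =  0.6665,
  "Temperature23"           =  1.3650,
  "SpeciesHA"               =  0.7423,
  "Temperature23:SpeciesHA" = -1.8376
)

# Log-odds of reaching the 1st instar for each species/temperature cell.
logit_AP_20 <- b[["(Intercept)"]]
logit_AP_23 <- b[["(Intercept)"]] + b[["Temperature23"]]
logit_HA_20 <- b[["(Intercept)"]] + b[["SpeciesHA"]]
logit_HA_23 <- sum(b)  # all four terms, including the interaction

# plogis() is the inverse logit: it maps log-odds to probabilities.
probs <- plogis(c(AP_20 = logit_AP_20, AP_23 = logit_AP_23,
                  HA_20 = logit_HA_20, HA_23 = logit_HA_23))
print(round(probs, 3))
```

Because the model has zero residual degrees of freedom, each cell's probability should reproduce the observed proportion, for example 37/56 for AP at 20 degrees and 61/69 for AP at 23 degrees.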

The p-value for each coefficient tests whether the estimated value differs from 0. For an interaction coefficient, it therefore indicates whether there's a "statistically significant" interaction between the predictors in their relationship with the outcome: here, whether the effect of that temperature relative to 20 degrees differs between HA and AP.
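
To make that concrete, the z value and p-value that summary() prints for one coefficient can be reproduced from its estimate and standard error. The sketch below uses the Temperature23 row from the question; the variable names are illustrative only.

```r
# Estimate and standard error for Temperature23, copied from the question.
estimate  <- 1.3650
std_error <- 0.4702

# Wald z statistic: how many standard errors the estimate lies from 0.
z <- estimate / std_error

# Two-sided p-value from the standard normal distribution.
p <- 2 * pnorm(-abs(z))

cat(sprintf("z = %.3f, p = %.5f\n", z, p))
```

These should agree, up to rounding of the printed inputs, with the 2.903 and 0.00369 reported in the coefficient table.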

With multi-level categorical predictors it's usually best to evaluate all related terms at once, either by comparing nested models with likelihood-ratio tests or by performing Wald tests on multiple coefficients (e.g., with the Anova() function, with a capital "A", in the R car package). For example, you could compare a model without the interaction term to one with the interaction term to see if the interaction between species and temperature is significant overall.
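
A minimal sketch of that nested-model comparison in base R follows. The full data set isn't shown in the question, so the data frame here is a stand-in: the AP rows are the four from the question, while the HA counts are hypothetical numbers invented only so that the code runs. The object names (toy, m_add, m_int, lrt) are likewise illustrative.

```r
# Toy data. The AP rows are the four shown in the question; the HA rows are
# HYPOTHETICAL counts invented only so that the example runs.
toy <- data.frame(
  Temperature   = factor(rep(c(20, 23, 26, 29), times = 2)),
  Species       = factor(rep(c("AP", "HA"), each = 4)),
  No.eggs.added = c(56, 69, 139, 162,   60, 70, 120, 150),
  No.hatched    = c(37, 61,  65,  94,   48, 43,  60,  57)
)

# Additive model: the species effect is the same at every temperature.
m_add <- glm(cbind(No.hatched, No.eggs.added - No.hatched) ~ Temperature + Species,
             data = toy, family = binomial(link = "logit"))

# Interaction model: the species effect may differ by temperature.
m_int <- update(m_add, . ~ . + Temperature:Species)

# Likelihood-ratio test of all interaction coefficients at once.
lrt <- anova(m_add, m_int, test = "Chisq")
print(lrt)
```

With four temperature levels and two species the interaction uses 3 degrees of freedom, so the test asks a single question about the interaction as a whole instead of three separate per-coefficient questions.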

A thought on the modeling

It looks like the 35-degree condition is close to lethal, leading to enormous standard errors in the corresponding coefficient estimates. That also might be why your model is taking so long to converge (22 iterations). See the discussion on perfect separation in logistic regression. Although your model overall doesn't suffer from that, it looks like the 35-degree condition comes close. Consider whether you gain anything by including those values in the model. Keeping them in certainly will make it impossible to do reliable tests that involve combinations of coefficients, like those I suggested above.
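
The effect of such a cell can be sketched with a small single-species example. All counts below are hypothetical and chosen only so that one temperature has no hatching at all; the object names are illustrative. The fit that includes the zero cell should show a very large standard error and more Fisher scoring iterations than the fit with that level dropped.

```r
# HYPOTHETICAL single-species data: the 35-degree row has zero hatching,
# mimicking a near-lethal condition. All counts are invented.
toy <- data.frame(
  Temperature   = factor(c(20, 26, 32, 35)),
  No.eggs.added = c(60, 80, 100, 90),
  No.hatched    = c(40, 45,  30,  0)
)

# Fit with every level; glm() may warn about fitted probabilities of 0 or 1.
fit_all <- suppressWarnings(
  glm(cbind(No.hatched, No.eggs.added - No.hatched) ~ Temperature,
      data = toy, family = binomial(link = "logit"))
)
se_all <- summary(fit_all)$coefficients[, "Std. Error"]

# Drop the 35-degree rows and the now-empty factor level, then refit.
toy_sub <- droplevels(subset(toy, Temperature != "35"))
fit_sub <- glm(cbind(No.hatched, No.eggs.added - No.hatched) ~ Temperature,
               data = toy_sub, family = binomial(link = "logit"))
se_sub <- summary(fit_sub)$coefficients[, "Std. Error"]

print(se_all)
print(se_sub)
cat("Iterations:", fit_all$iter, "vs", fit_sub$iter, "\n")
```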

EdM
  • Thank you for clarifying my understanding of the coefficients. I think it might be best if I remove the 35-degree level from temperature. If, following a likelihood-ratio test, I wanted within-temperature species comparisons and within-species temperature comparisons, are there any post-hoc tests specific to logistic regression that I could use? – Insect_biologist Nov 15 '22 at 16:52
  • @Ladybird_biologist post-modeling software typically can take a logistic regression model and a specification of the comparisons you want to make, then report the model predictions in a probability scale, with confidence intervals and p-values. There's nothing particularly "specific to logistic regression" except for converting from logit to probability scale at the end. The car package noted in the answer has such tools, for example its linearHypothesis() function. The emmeans package is often used for post-modeling analysis. – EdM Nov 15 '22 at 17:52
  • @Ladybird_biologist it looks like some of your recent questions on this site might be addressed by using the emmeans package for post-model analysis. It works on a very wide range of model types. It can seem daunting when you first try to use it, but there are several vignettes describing its use, tutorials online, and an author who often responds to coding questions on Stack Overflow and to statistical questions on this site. – EdM Nov 15 '22 at 18:05
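
For readers who want to see what such a post-model comparison involves, here is a base-R sketch of one within-temperature contrast (HA versus AP at 23 degrees), computed from the coefficient vector and its covariance matrix. It is not the emmeans or car interface, which automate this kind of calculation. As before, the AP rows come from the question, the HA counts are hypothetical, and all object names are illustrative.

```r
# Toy data. The AP rows are from the question; the HA rows are HYPOTHETICAL.
toy <- data.frame(
  Temperature   = factor(rep(c(20, 23, 26, 29), times = 2)),
  Species       = factor(rep(c("AP", "HA"), each = 4)),
  No.eggs.added = c(56, 69, 139, 162,   60, 70, 120, 150),
  No.hatched    = c(37, 61,  65,  94,   48, 43,  60,  57)
)

fit <- glm(cbind(No.hatched, No.eggs.added - No.hatched) ~ Temperature * Species,
           data = toy, family = binomial(link = "logit"))

# Contrast weights: HA minus AP at 23 degrees on the log-odds scale equals
# SpeciesHA + Temperature23:SpeciesHA; every other coefficient gets weight 0.
w <- setNames(numeric(length(coef(fit))), names(coef(fit)))
w[c("SpeciesHA", "Temperature23:SpeciesHA")] <- 1

est <- sum(w * coef(fit))                    # log odds ratio
se  <- sqrt(drop(t(w) %*% vcov(fit) %*% w))  # standard error of the contrast
z   <- est / se
p   <- 2 * pnorm(-abs(z))                    # two-sided Wald p-value
ci  <- est + c(-1, 1) * qnorm(0.975) * se    # 95% Wald interval

cat(sprintf("log odds ratio = %.3f (95%% CI %.3f to %.3f), p = %.4f\n",
            est, ci[1], ci[2], p))
cat(sprintf("odds ratio = %.3f\n", exp(est)))
```

When many such contrasts are examined together, some adjustment for multiple comparisons is usually wanted; this sketch does not apply one.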