Binomial logistic regression warning in R: fitted probabilities numerically 0 or 1 occurred

Question

Overview

I got the following warning when I ran a binomial logistic regression with the glm function in R:

Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

Therefore, I am wondering if there is something inherent in the structure of my data that is causing this issue.

I have observation data on two species, Species 1 and Species 2 (response variable), and ecological data for these two species, including foraging height, attack maneuver, foliage density, and foraging substrate (predictor variables). My goal is to infer which predictor variables contribute to the ecological differences between those species.

Input data 'niche_data_recoded.csv':

Species	Attack_maneuver	Foraging_height	Foraging_substrate	Foliage_density
Species_1	1	3	1	3
Species_1	2	1	2	1
Species_1	1	3	1	3
Species_1	1	3	1	3
Species_1	3	5	3	3
Species_1	1	4	1	3
Species_1	3	4	3	2
Species_1	2	1	2	1
Species_1	2	1	2	1
Species_1	1	4	1	1
Species_2	1	1	1	1
Species_2	2	1	2	1
Species_2	2	1	2	1
Species_2	1	1	1	1
Species_2	2	1	2	1
Species_2	1	3	1	1
Species_2	2	1	2	1
Species_2	2	1	2	1
Species_2	2	1	2	1
Species_1	2	1	2	1
Species_1	2	1	2	1
Species_1	2	1	2	1
Species_1	2	1	2	1
Species_1	1	5	1	2
Species_1	1	2	1	2
Species_1	1	3	1	3
Species_1	2	1	2	4
Species_1	1	1	1	1
Species_2	2	1	2	1
Species_2	2	1	2	1
Species_2	2	1	2	1
Species_1	1	2	1	1
Species_1	2	1	2	1
Species_1	2	1	2	1
Species_1	2	1	2	1
Species_1	2	1	2	1
Species_1	2	1	2	1
Species_1	2	1	2	1
Species_1	1	1	1	1
Species_1	1	1	1	1
Species_1	2	1	2	1
Species_1	1	5	1	1
Species_1	1	5	1	2
Species_1	3	5	3	3
Species_1	1	5	1	1
Species_1	1	4	1	1
Species_1	1	5	1	1
Species_1	1	4	1	1
Species_1	1	3	1	1
Species_1	2	1	2	2
Species_1	2	1	2	1
Species_1	2	1	2	1
Species_1	2	1	2	1
Species_1	1	1	1	3
Species_1	1	1	1	1
Species_1	1	1	1	2
Species_1	2	1	2	1
Species_1	1	1	1	1
Species_2	2	1	2	1
Species_2	1	1	1	1
Species_2	2	1	2	1
Species_2	2	1	2	1
Species_2	1	1	1	1
Species_2	2	1	2	1
Species_2	1	1	1	1
Species_1	2	1	2	1
Species_1	1	3	1	2
Species_1	1	1	1	1
Species_1	1	3	1	1
Species_1	1	5	1	1
Species_1	2	1	3	1
Species_1	2	1	3	1
Species_1	2	1	3	1
Species_1	1	1	1	1
Species_1	2	1	2	1
Species_1	1	2	1	2
Species_1	2	1	2	2
Species_1	1	3	1	2
Species_1	1	1	1	2
Species_1	1	1	1	2
Species_1	3	5	3	4
Species_1	2	1	2	3
Species_2	1	1	1	1
Species_2	2	1	2	1
Species_2	1	1	1	1
Species_2	2	1	2	1
Species_2	2	1	2	1
Species_2	1	3	1	1
Species_2	2	1	2	1
Species_2	2	1	2	1
Species_2	2	1	2	1
Species_2	2	1	2	1
Species_2	2	1	2	1
Species_2	2	1	2	1
Species_2	1	2	1	1
Species_2	1	3	1	1
Species_2	2	1	2	1
Species_2	1	2	1	1
Species_2	2	1	2	1
Species_2	1	1	1	1
Species_2	2	1	2	1
Species_2	2	1	2	1
Species_2	1	1	1	1
Species_2	2	1	2	1
Species_2	2	1	2	1
Species_2	2	1	2	1
Species_2	2	1	2	2
Species_2	2	1	2	1
Species_2	2	1	2	1
Species_2	2	1	2	1
Species_2	2	1	2	1
Species_2	2	1	2	1
Species_2	2	1	2	1
Species_2	2	1	2	1

Code:

niche.dat <- read.csv('niche_data_recoded.csv',header=T,na.strings=c(""))
cols <- c("Species", "Attack_maneuver", "Foraging_height", "Foraging_substrate", "Foliage_density")
niche.dat[cols] <- lapply(niche.dat[cols], factor)
str(niche.dat)
model <- glm(Species ~ Attack_maneuver + Foraging_height + Foraging_substrate + Foliage_density,family=binomial,data=niche.dat)
summary(model)

Output:

> model <- glm(Species ~ Attack_maneuver + Foraging_height + Foraging_substrate + Foliage_density,family=binomial,data=niche.dat)
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred 
> summary(model)
Call:
glm(formula = Species ~ Attack_maneuver + Foraging_height + Foraging_substrate + 
    Foliage_density, family = binomial, data = niche.dat)
Deviance Residuals: 
     Min        1Q    Median        3Q       Max
-1.45706  -1.25836  -0.00008   0.92146   2.04286

Coefficients: (1 not defined because of singularities)
                      Estimate Std. Error z value Pr(>|z|)
(Intercept)            0.18833    0.49204   0.383   0.7019
Attack_maneuver2     -18.75440 3765.84720  -0.005   0.9960
Attack_maneuver3       4.80062 5754.42487   0.001   0.9993
Foraging_height2       0.21481    1.17207   0.183   0.8546
Foraging_height3       0.07142    0.98094   0.073   0.9420
Foraging_height4     -19.31781 4991.11811  -0.004   0.9969
Foraging_height5     -17.73391 1922.24348  -0.009   0.9926
Foraging_substrate2   19.20303 3765.84717   0.005   0.9959
Foraging_substrate3         NA         NA      NA       NA
Foliage_density2      -2.59109    1.10344  -2.348   0.0189 *
Foliage_density3     -18.30482 1988.67327  -0.009   0.9927
Foliage_density4     -18.37471 4310.78632  -0.004   0.9966

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 156.77  on 113  degrees of freedom
Residual deviance: 111.63  on 103  degrees of freedom
AIC: 133.63

Number of Fisher Scoring iterations: 17

I'm also not sure what the NAs mean in this situation.

You have very large standard errors, which could mean you have problems with quasi-complete or complete separation. – Demetri Pananos, Aug 30 '22 at 01:12
Note also that many coefficients are nearly identical but have opposite signs, which may indicate problems with collinearity between variables. I can't check your data right now, perhaps tomorrow, but these could be two probable sources. – Demetri Pananos, Aug 30 '22 at 01:13
For Species_2, Foliage_density has the value of 1 for 50 out of 51 cases. That is one possible cause of the warning. – Dave2e, Aug 30 '22 at 01:35
@Dave2e There is no warning after removing that predictor variable. But is removing that variable the best approach or is there a solution for keeping it in the model? Alternatively, is the warning safe to ignore? I'm also still not sure why there are NAs for Foraging_substrate3. – user12167116, Aug 30 '22 at 04:16
There's some really good discussion of (quasi)complete separation at the following linked question: https://stats.stackexchange.com/questions/11109/how-to-deal-with-perfect-separation-in-logistic-regression – James Stanley, Aug 30 '22 at 04:51

Sextus Empiricus · Answer 1 · 2022-08-30T07:07:02.660

In foraging substrate category 3, 100% of the responses are Species_1. Because of this, you cannot model it well with the logit link function: the fitting will try to make the estimated probability of Species_1 equal to one, $\hat{p} = 1/(1+e^{-X\hat\beta}) = 1$, and this can only be reached with an infinite coefficient.
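
A quick way to spot this kind of separation is to cross-tabulate a predictor against the response and look for zero cells. The following is only a minimal sketch on a small made-up data frame (`toy`, with the hypothetical columns `species` and `substrate`), not the data from the question; level "3" is constructed so that it only ever occurs with species "A".

```r
# Toy data (hypothetical, not the asker's data): level "3" of `substrate`
# occurs only with species "A", mimicking Foraging_substrate category 3.
toy <- data.frame(
  species   = factor(c("A", "B", "A", "B", "A", "B", "A", "B", "A", "A", "A", "A")),
  substrate = factor(c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3"))
)

# Cross-tabulate predictor levels against the response. A zero cell means
# that level predicts the response perfectly (quasi-complete separation).
tab <- table(toy$substrate, toy$species)
print(tab)

# Names of the predictor levels that have at least one empty cell.
separated_levels <- rownames(tab)[apply(tab == 0, 1, any)]
print(separated_levels)

# With a factor response, glm() models the probability of the second level
# ("B"), so the estimate for substrate3 is pushed towards -Inf and its
# standard error becomes very large. Warnings are suppressed here only to
# keep the sketch quiet.
fit <- suppressWarnings(glm(species ~ substrate, family = binomial, data = toy))
print(coef(summary(fit)))
```

On the data from the question, the analogous check would be `table(niche.dat$Foraging_substrate, niche.dat$Species)`, repeated for each predictor.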
