
My predictions from a logistic regression model (glm in R) are not bounded between 0 and 1 as I expected. My understanding of logistic regression is that the inputs and model parameters are combined linearly, and the result is transformed into a probability using the inverse of the logit link function (the logistic function). Since that function is bounded between 0 and 1, I expected my predictions to be bounded between 0 and 1 too.
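A minimal base R sketch of the relationship described above, assuming only the standard qlogis() (logit) and plogis() (inverse logit) functions from the stats package. The logit maps a probability onto the whole real line, and its inverse maps any real number back into (0, 1).

```r
# Logit: maps a probability p in (0, 1) to the log-odds, an unbounded real number
p   <- c(0.1, 0.5, 0.9)
eta <- qlogis(p)
stopifnot(isTRUE(all.equal(eta, log(p / (1 - p)))))

# Inverse logit (logistic function): maps the log-odds back to probabilities
stopifnot(isTRUE(all.equal(plogis(eta), p)))
stopifnot(isTRUE(all.equal(plogis(eta), 1 / (1 + exp(-eta)))))

# Any real-valued linear predictor lands between 0 and 1 after the inverse link
plogis(c(-5, 0, 5))  # roughly 0.0067, 0.5, 0.9933
```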

However, that's not what I see when I fit a logistic regression in R:

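data(iris)
# droplevels() removes the unused "setosa" level; otherwise glm() treats the
# first factor level (setosa) as failure and every remaining row as a success
iris.sub <- droplevels(subset(iris, Species %in% c("versicolor", "virginica")))
model    <- glm(Species ~ Sepal.Length + Sepal.Width, data = iris.sub,
                family = binomial(link = "logit"))
hist(predict(model))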

[Figure: histogram of predict(model)]

If anything, the output of predict(model) looks normally distributed to me. Can anyone explain why the values I get are not probabilities?

Adrian
    Corone's answer below covers the details very nicely. The original figure you have above presents the log-odds values on the x-axis, which can be mathematically transformed to probabilities (i.e., as per Corone's answer, by passing them back through the inverse of the link function). – James Stanley Feb 03 '13 at 23:01

1 Answer


By default, the predict.glm method returns predictions on the scale of the linear predictor (type = "link"), which for a logit model is the log-odds. That is, they have not yet been passed through the inverse of the link function.

Try

hist(predict(model, type = "response"))

instead.

[Figure: histogram of predict(model, type = "response")]
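A sketch that checks this directly, assuming the two-species subset is built with droplevels() (without it, the unused "setosa" level stays first and glm() treats every remaining row as a success). It uses only base R: predict(), model.matrix(), coef(), family() and plogis(). The link-scale predictions are exactly the linear combination of inputs and coefficients, and applying the inverse link to them reproduces type = "response".

```r
data(iris)
# Drop the unused "setosa" level so versicolor is coded 0 and virginica 1
iris.sub <- droplevels(subset(iris, Species %in% c("versicolor", "virginica")))
model <- glm(Species ~ Sepal.Length + Sepal.Width, data = iris.sub,
             family = binomial(link = "logit"))

eta  <- predict(model)                     # default type = "link": log-odds
prob <- predict(model, type = "response")  # probabilities

# Link scale: the linear predictor X %*% beta
stopifnot(isTRUE(all.equal(eta, drop(model.matrix(model) %*% coef(model)))))

# Response scale: the inverse link (plogis for the logit) applied to eta
stopifnot(isTRUE(all.equal(plogis(eta), prob, check.attributes = FALSE)))
stopifnot(isTRUE(all.equal(family(model)$linkinv(eta), prob,
                           check.attributes = FALSE)))

range(eta)   # unbounded log-odds
range(prob)  # inside (0, 1)
```

Equivalently, plogis(predict(model)) gives the same values as predict(model, type = "response"); the type argument just applies the inverse link for you.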

Corvus
    You have done a great job mastering our markup and illustration capabilities in a short time: this answer is a nice example of that. Well done! – whuber Feb 04 '13 at 16:25