Is it true that:
- $y_i$ is not a single fixed value, but a random variable with a probability distribution
- which means that, for the same value of the predictor(s), $y_i$ could take different values
- In linear regression this distribution can only be normal
- In GLM, this distribution can be any distribution from the exponential family
- the distribution of a single $y_i$ has nothing to do with the distribution of all the $y$ values taken together
- $\mu_i$ is the expected value of $y_i$
- In practical use, $\mu_i$ is the predicted value of $y_i$, especially if the dataset has only one $y$ for a given value of the predictor(s)
Are the above correct? Where am I wrong?
Based on the above, I've tried simulating glm with lm in R, and it kind of works:
library(boot)
download.file("https://dl.dropbox.com/u/7710864/data/ravensData.rda",
              destfile = "./ravensData.rda", method = "curl")
load("./ravensData.rda")
# download manually and load here if the above fails
# load("/yourpath/ravensData.rda")
# calling logit(ravensData$ravenWinNum) results in
# [1] Inf Inf Inf Inf Inf -Inf Inf Inf Inf Inf -Inf Inf Inf Inf Inf -Inf
# [17] -Inf -Inf Inf -Inf
# that's far too extreme, as inv.logit is already essentially 1 at 20
# so we'll write our own dummy "logit" routine
# this gives us 5 when ravenWinNum is 1 and -5 when it's 0
win <- ravensData$ravenWinNum*10-5
# now we can do a simple lm
fit <- lm(win~ravensData$ravenScore)
# and get probability of win using inv.logit
fitwin <- inv.logit(fit$fitted.values)
plot(ravensData$ravenScore, fitwin)
# now glm
fitglm <- glm(ravensData$ravenWinNum ~ ravensData$ravenScore, family="binomial")
plot(ravensData$ravenScore, fitglm$fitted.values)
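In case the download no longer works, here is a minimal sketch of the same comparison on simulated data. The variable names `score` and `winNum`, the sample size, and the "true" coefficients are all made up for illustration and are not taken from `ravensData`. It uses base R's `plogis()` as the inverse logit, so `boot` is not needed.

```r
# Simulated stand-in for ravensData (every value here is an assumption)
set.seed(1)
n <- 200
score  <- round(runif(n, 0, 50))
p_true <- plogis(-1.5 + 0.1 * score)           # assumed true win probability
winNum <- rbinom(n, size = 1, prob = p_true)   # 0/1 outcomes

# lm-based approximation: recode 0/1 to -5/+5, fit, then back-transform
win    <- winNum * 10 - 5
fit    <- lm(win ~ score)
fitwin <- plogis(fitted(fit))

# actual logistic regression
fitglm <- glm(winNum ~ score, family = binomial)
fitp   <- fitted(fitglm)

# largest gap between the two estimated probability curves
max_diff <- max(abs(fitwin - fitp))
print(max_diff)
head(cbind(score, lm_based = fitwin, glm_based = fitp))
```

Both columns are valid probabilities, but they come from different fitting criteria (least squares on the recoded values versus binomial likelihood), so they should not be expected to coincide.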
`x^2` really isn't doing what you want at all. First, if `x` is continuous and you mean x-squared, you need to insulate the `^` operator from the formula parsing code. Also, you generally want `x` and `x^2`, not just `x^2` in the formula. Hence, if `x` is continuous, you want `y ~ x + I(x^2)` to get a 2nd order polynomial. – Gavin Simpson Oct 01 '15 at 19:48

`y` and it is clear that `y` is not a proportion between 0 and 1. When I run your code `y` has range (1, 101). The problem is that you aren't making the "mean" of `y` depend on `x` but you are computing some value from `x^2` and adding on to it a random Bernoulli observation (0 or 1). This data isn't suitable for the Binomial GLM, in which you want Bernoulli data (0s or 1s) or Binomial counts; the number of successes from M trials. – Gavin Simpson Oct 01 '15 at 19:52
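A minimal sketch of the two points above, on simulated data. The coefficients, sample size, and object names (`fit_quad`, `fit_bare`) are assumptions chosen for illustration. The response is generated so that its mean depends on `x` through the inverse logit, and the quadratic term is wrapped in `I()`.

```r
set.seed(42)
n   <- 500
x   <- runif(n, -2, 2)
eta <- -1 + 0.5 * x + 1.2 * x^2    # linear predictor (made-up coefficients)
p   <- plogis(eta)                 # mean of y depends on x via the inverse logit
y   <- rbinom(n, size = 1, prob = p)   # Bernoulli data: 0s and 1s only

# second-order polynomial: I() protects ^ from the formula parser
fit_quad <- glm(y ~ x + I(x^2), family = binomial)
print(names(coef(fit_quad)))

# without I(), x^2 is read as a formula operator and collapses to just x
fit_bare <- glm(y ~ x^2, family = binomial)
print(names(coef(fit_bare)))
```

Comparing the coefficient names shows that the bare `x^2` formula fits no squared term at all.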