Recall that an offset is just a predictor variable whose coefficient is fixed at 1. So, using the standard setup for a Poisson regression with a log link, we have:
$$\log \mathrm{E}(Y) = \beta' \mathrm{X} + \log \mathcal{E}$$
where $\mathcal{E}$ is the offset/exposure variable. This can be rewritten as
$$\log \mathrm{E}(Y) - \log \mathcal{E} = \beta' \mathrm{X}$$
$$\log \mathrm{E}(Y/\mathcal{E}) = \beta' \mathrm{X}$$
Your underlying random variable is still $Y$, but by dividing by $\mathcal{E}$ we've converted the LHS of the model equation to be a rate of events per unit exposure. But this division also alters the variance of the response, so we have to weight by $\mathcal{E}$ when fitting the model.
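To spell out the variance step, here is a short sketch. It assumes the usual Poisson property $\mathrm{Var}(Y) = \mathrm{E}(Y)$ and treats the exposure $\mathcal{E}$ as a known constant; the symbols $\mu = \mathrm{E}(Y)$ and $\lambda = \mu/\mathcal{E}$ are notation introduced here for illustration only.
$$\mathrm{Var}(Y/\mathcal{E}) = \frac{\mathrm{Var}(Y)}{\mathcal{E}^2} = \frac{\mu}{\mathcal{E}^2} = \frac{\lambda}{\mathcal{E}}$$
So the rate $Y/\mathcal{E}$ has mean $\lambda$ but variance $\lambda/\mathcal{E}$: the Poisson variance function evaluated at $\lambda$, divided by $\mathcal{E}$. In GLM terms that is exactly what a prior weight of $\mathcal{E}$ expresses, which is why the rate model needs the weights.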
Example in R:
    library(MASS)  # for Insurance dataset

    # modelling the claim rate, with exposure as a weight
    # use quasipoisson family to stop glm complaining about nonintegral response
    glm(Claims/Holders ~ District + Group + Age,
        family=quasipoisson, data=Insurance, weights=Holders)

    Call:  glm(formula = Claims/Holders ~ District + Group + Age, family = quasipoisson,
        data = Insurance, weights = Holders)

    Coefficients:
    (Intercept)    District2    District3    District4      Group.L      Group.Q      Group.C        Age.L        Age.Q        Age.C
      -1.810508     0.025868     0.038524     0.234205     0.429708     0.004632    -0.029294    -0.394432    -0.000355    -0.016737

    Degrees of Freedom: 63 Total (i.e. Null);  54 Residual
    Null Deviance:      236.3
    Residual Deviance: 51.42    AIC: NA
    # with log-exposure as offset
    glm(Claims ~ District + Group + Age + offset(log(Holders)),
        family=poisson, data=Insurance)

    Call:  glm(formula = Claims ~ District + Group + Age + offset(log(Holders)),
        family = poisson, data = Insurance)

    Coefficients:
    (Intercept)    District2    District3    District4      Group.L      Group.Q      Group.C        Age.L        Age.Q        Age.C
      -1.810508     0.025868     0.038524     0.234205     0.429708     0.004632    -0.029294    -0.394432    -0.000355    -0.016737

    Degrees of Freedom: 63 Total (i.e. Null);  54 Residual
    Null Deviance:      236.3
    Residual Deviance: 51.42    AIC: 388.7
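
If you would rather check the agreement programmatically than compare printouts by eye, something along these lines should work. It is only a sketch: the object names `fit_rate` and `fit_offset` and the comparison tolerance are my own choices for illustration, not part of the models above.

```r
library(MASS)  # for the Insurance dataset

# Rate as the response, exposure supplied as a prior weight
fit_rate <- glm(Claims/Holders ~ District + Group + Age,
                family = quasipoisson, data = Insurance, weights = Holders)

# Count as the response, log-exposure supplied as an offset
fit_offset <- glm(Claims ~ District + Group + Age + offset(log(Holders)),
                  family = poisson, data = Insurance)

# The two fits start IRLS from slightly different values, so compare
# with a loose tolerance (assumed here) rather than expecting bit-identical numbers
all.equal(coef(fit_rate), coef(fit_offset), tolerance = 1e-6)

# The dispersion is estimated under quasipoisson but fixed at 1 under poisson
summary(fit_rate)$dispersion
summary(fit_offset)$dispersion
```

Note that only the point estimates are expected to match. Because quasipoisson estimates a dispersion parameter while poisson fixes it at 1, the standard errors reported by `summary()` will generally differ between the two fits.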
… exp() requires the pre-calculated log(number at risk) as per @DWin's answer; while offset() takes the original (number at risk) as an input, but calculates the log of this for you before using in the model. So they return the same results, but require different forms of input. – James Stanley Aug 08 '13 at 21:53

… offset() and requires the log transform of the variable to be passed to the model. [Sorry for additional confusion in my attempt to reduce confusion -- in other words, offset in R corresponds to exp in Stata from a syntax point of view...] – James Stanley Aug 08 '13 at 22:51

… exposure but one can shorten this to expore or anything in-between! (and as you'd expect, exp used as a simple function, rather than an option as to the poisson command here, returns the exponential -- e.g. disp exp(1) prints 2.7182818 to screen) – James Stanley Aug 08 '13 at 23:57

… offset() in R takes the original number as an input and then takes the log for you, but your second comment says that offset() in R requires the log transform of the variable to be passed to the model. Your second comment seems consistent with Hong Ooi's answer. When I try this on my own, R throws an error if I supply the raw number as the offset, but it works if I supply the log(number). Is there any situation where using the raw number as the offset is correct? – gannawag Jan 26 '17 at 19:46

… offset(number) option is equivalent to calling R with offset=log(number) – James Stanley Jan 26 '17 at 20:11