How do you apply count models to data that are counts by nature but are really rates? R can handle this to some extent, depending on the model, but what is the correct way to model a rate response with a count model?
Data & Model
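library(MASS)    # glm.nb()
library(tibble)  # tibble()

df <- tibble(dependent_rate  = c(5.2, 3.4, 7.8, 9.5),
             dependent_count = c(5, 3, 7, 9),
             pred1 = c(1, 2, 3, 4),
             pred2 = c(1, 2, 1, 2),
             pred3 = c(1, 1, 2, 2))

# Model 1: negative binomial fit with the rate as the response
glm.nb(dependent_rate ~ pred1 + pred2 + pred3, data = df)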
Model 1 (fitted in R above, with dependent_rate as the response) throws a warning. Ideally model 2, with dependent_count as the response, should be used, but it is unclear how to fit it while still accounting for the rates.
My questions, framed as possible solutions, are:
- Apply weights to model 2. If so, how would I do this? Do I simply add `weights = dependent_rate` in the function call?
- Add an offset term to model 2. If so, how? I would like to make predictions with this model; would I need to add a column in `newdata` for my offset term?
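For reference, the offset option might look something like the sketch below. It is only an illustration under assumptions: the data are simulated rather than taken from `df`, and `exposure` is a hypothetical column holding the denominator of the rate (time at risk, area, population and so on), which the data above do not contain. The count is the response, and `offset(log(exposure))` enters the formula with its coefficient fixed at 1.

```r
library(MASS)  # glm.nb()

set.seed(1)
n <- 200

# Simulated illustration data (not the data from the question).
# `exposure` is a hypothetical column: the denominator of the rate.
sim <- data.frame(
  pred1    = rnorm(n),
  exposure = runif(n, min = 0.5, max = 2)
)
mu <- sim$exposure * exp(0.5 + 0.3 * sim$pred1)
sim$count <- rnbinom(n, size = 2, mu = mu)

# Count response, with log(exposure) as an offset in the formula
fit <- glm.nb(count ~ pred1 + offset(log(exposure)), data = sim)

# newdata must supply the offset variable as well as the predictors;
# with exposure = 1 the predicted count is the rate per unit of exposure
new_obs   <- data.frame(pred1 = c(-1, 0, 1), exposure = 1)
pred_rate <- predict(fit, newdata = new_obs, type = "response")
```

In this sketch `newdata` needs only the predictors and the exposure, not the response, because the offset is written inside the formula.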

predict with new data, I need to specify the value for 'dependent_count'. This is the response which I am trying to predict and so I cannot specify the value for this in my new data. – Ali Feb 23 '21 at 16:49