Comparing the fit of Tweedie and gamma hurdle models

Asked Jan 16 '21 at 18:24

Active Mar 21 '24 at 18:12

I am comparing several modeling approaches to semi-continuous data (many exact zeros and continuous positive cost outcomes) to assess the effect of the main predictor "disease" on cost.

The models I'm comparing are a Tweedie model and a gamma hurdle model.

The Tweedie model is in the form:

library(statmod)  # provides the tweedie() family for glm()
glm(cost ~ disease + age + gender + offset(log(days_in_study)),
    data = df, family = tweedie(link.power = 0, var.power = xi.max))

The hurdle model is in the form:

binary component:

# response is the indicator cost > 0; offset is logged as in the other two models
glm(I(cost > 0) ~ disease + age + gender + offset(log(days_in_study)),
    data = df, family = "binomial")

continuous component:

glm(cost ~ disease + age + gender + offset(log(days_in_study)),
    data = subset(df, cost > 0), family = Gamma(link = "log"))

How do I compare these models to choose between them? The Tweedie model indicates that the variable "disease" is strongly predictive of costs. The hurdle model indicates that it is not. So while I know no model is per se "right," my choice of model determines the entire outcome. Common measures like the AIC don't seem to be well-defined for two-part models, per various CrossValidated posts.

edited Mar 21 '24 at 18:12

kjetil b halvorsen

asked Jan 16 '21 at 18:24

Kellan Baker

You could try to compare the models based on some measure of out-of-sample prediction error. – JTH Jan 16 '21 at 21:55
@JTH Would I use cross-validation for this? – Kellan Baker Jan 16 '21 at 23:35
Yes. Cross validation is a good option. – JTH Jan 16 '21 at 23:49
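
A minimal sketch of the cross-validation suggested in the comments, under several assumptions that are not in the question: the data frame is simulated as a stand-in for df, the Tweedie variance power is fixed at an arbitrary 1.5 in place of xi.max, the number of folds is 5, and the error measures (RMSE and MAE) are just one possible choice. The hurdle model's predicted cost is the product of the two parts, P(cost > 0) × E[cost | cost > 0], which puts it on the same scale as the Tweedie prediction.

```r
library(statmod)  # provides the tweedie() family for glm()

set.seed(1)

# Simulated stand-in for the asker's df (all values are made up)
n  <- 600
df <- data.frame(
  disease       = rbinom(n, 1, 0.4),
  age           = round(runif(n, 20, 80)),
  gender        = factor(sample(c("F", "M"), n, replace = TRUE)),
  days_in_study = round(runif(n, 30, 365))
)
p_pos   <- plogis(-5 + 0.8 * df$disease + log(df$days_in_study))
mu_pos  <- df$days_in_study * exp(0.5 + 0.3 * df$disease + 0.01 * df$age)
df$cost <- rbinom(n, 1, p_pos) * rgamma(n, shape = 2, rate = 2 / mu_pos)

var_power <- 1.5  # assumed value; stands in for the asker's xi.max
k         <- 5    # number of folds (assumption)
fold      <- sample(rep(seq_len(k), length.out = n))
pred      <- data.frame(tweedie = numeric(n), hurdle = numeric(n))

for (i in seq_len(k)) {
  train <- df[fold != i, ]
  test  <- df[fold == i, ]

  # Tweedie model fitted on the training folds
  fit_tw <- glm(cost ~ disease + age + gender + offset(log(days_in_study)),
                data = train,
                family = tweedie(link.power = 0, var.power = var_power))

  # Hurdle model: binary part on all training rows, gamma part on positives
  fit_bin <- glm(I(cost > 0) ~ disease + age + gender +
                   offset(log(days_in_study)),
                 data = train, family = binomial)
  fit_pos <- glm(cost ~ disease + age + gender + offset(log(days_in_study)),
                 data = subset(train, cost > 0),
                 family = Gamma(link = "log"))

  # Held-out predictions of expected cost
  pred$tweedie[fold == i] <- predict(fit_tw, newdata = test, type = "response")
  pred$hurdle[fold == i]  <-
    predict(fit_bin, newdata = test, type = "response") *
    predict(fit_pos, newdata = test, type = "response")
}

# Out-of-sample error for each model
rmse <- sapply(pred, function(p) sqrt(mean((df$cost - p)^2)))
mae  <- sapply(pred, function(p) mean(abs(df$cost - p)))
print(rbind(rmse, mae))
```

With real data, the variance power would ideally be re-estimated inside each training fold rather than fixed in advance, so that the held-out rows play no part in choosing it.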
