Background

I'm building a regression model on insurance data to predict the losses associated with a policy. I'm running an Optuna optimisation function to help me with this, but I'm struggling with what metric to use to score the model. The metric defines the course of the optimisation, so I want to get it right.

The model I'm currently using is LGBMRegressor, and the losses are approximately gamma-distributed.

So far I've used R^2, but I've read that's not a great goodness-of-fit metric.

Question

What metric should I use to score my regression model?

Connor

1 Answer

Use the negative log-likelihood of the assumed distribution as your loss function. You can also turn it into a pseudo-R² for easier interpretability (for a fixed baseline model, the pseudo-R² is a one-to-one function of the negative log-likelihood, so both rank models identically).
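For the gamma case in the question, this could be sketched as follows. This is a minimal, dependency-free illustration, not code from the libraries mentioned in this answer: the function names are hypothetical, the shape parameter is assumed to be fixed and shared across policies, and the pseudo-R² shown is the deviance-based variant (one of several definitions). With a fixed shape, the gamma negative log-likelihood differs from half the gamma deviance times the shape only by a term that does not depend on the predictions, so minimising either gives the same ranking of models.

```python
import math


def gamma_nll(y_true, y_pred, shape=1.0):
    """Mean gamma negative log-likelihood, parameterised by mean y_pred and a fixed shape."""
    total = 0.0
    for y, mu in zip(y_true, y_pred):
        total += (
            math.lgamma(shape)
            - shape * math.log(shape / mu)
            - (shape - 1.0) * math.log(y)
            + shape * y / mu
        )
    return total / len(y_true)


def gamma_deviance(y_true, y_pred):
    """Mean gamma deviance: 2 * ((y - mu) / mu - log(y / mu)), averaged over observations."""
    total = 0.0
    for y, mu in zip(y_true, y_pred):
        total += 2.0 * ((y - mu) / mu - math.log(y / mu))
    return total / len(y_true)


def gamma_d2(y_true, y_pred):
    """Deviance-based pseudo-R²: 1 - deviance(model) / deviance(mean-only baseline).

    Assumes y_true is not constant, otherwise the baseline deviance is zero.
    """
    baseline = sum(y_true) / len(y_true)
    null_pred = [baseline] * len(y_true)
    return 1.0 - gamma_deviance(y_true, y_pred) / gamma_deviance(y_true, null_pred)


# Made-up losses and predictions, for illustration only.
losses = [100.0, 250.0, 400.0, 1200.0]
predictions = [120.0, 230.0, 420.0, 1100.0]
print(gamma_nll(losses, predictions, shape=2.0))
print(gamma_d2(losses, predictions))
```

All targets and predictions must be strictly positive for these formulas. In an Optuna study you would return the deviance (or negative log-likelihood) on held-out data and minimise it. scikit-learn also ships `sklearn.metrics.mean_gamma_deviance`, which computes the same mean deviance on arrays.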

I highly recommend XGBoostLSS / LightGBMLSS for that regression task: https://github.com/StatMixedML/XGBoostLSS and https://github.com/StatMixedML/LightGBMLSS

Somewhat related: working in the field of insurance, I have my doubts that policy losses are gamma-distributed without zero-inflation (does every policy really have a 100% chance of paying out more than $0?). If you do have zero payouts, I recommend using a zero-inflated lognormal distribution (and the corresponding loss). This is a popular choice in customer lifetime value modeling, and it transfers very nicely to policy pricing. See here for details and an implementation (including the loss): https://github.com/google/lifetime_value

In the XGBoostLSS world this is the ZALN (zero-adjusted lognormal) distribution.
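To make the zero-inflated lognormal loss concrete, here is a minimal sketch of its general form: a Bernoulli term for whether a policy pays out at all, plus a lognormal negative log-likelihood for the size of positive payouts. It is not the implementation from google/lifetime_value or XGBoostLSS; the function names and the parameter names `p_pos`, `mu` and `sigma` are hypothetical, and in a real model each of them would be predicted per policy.

```python
import math


def ziln_nll(y, p_pos, mu, sigma):
    """Negative log-likelihood of one observation under a zero-inflated lognormal.

    y      observed loss (0 or positive)
    p_pos  predicted probability that the loss is positive
    mu     mean of log(y) given y > 0
    sigma  standard deviation of log(y) given y > 0
    """
    if y == 0:
        # No payout: only the Bernoulli part contributes.
        return -math.log(1.0 - p_pos)
    log_y = math.log(y)
    # Positive payout: Bernoulli part plus lognormal density of y.
    return (
        -math.log(p_pos)
        + log_y
        + math.log(sigma)
        + 0.5 * math.log(2.0 * math.pi)
        + (log_y - mu) ** 2 / (2.0 * sigma ** 2)
    )


def mean_ziln_nll(ys, p_pos, mus, sigmas):
    """Average the per-observation loss over a portfolio of policies."""
    terms = [ziln_nll(y, p, m, s) for y, p, m, s in zip(ys, p_pos, mus, sigmas)]
    return sum(terms) / len(terms)


# Made-up example: two policies without a claim, two with a claim.
ys = [0.0, 0.0, 350.0, 2000.0]
print(mean_ziln_nll(ys, [0.2, 0.3, 0.6, 0.7], [5.5, 5.8, 6.0, 7.2], [1.0, 1.0, 0.9, 1.1]))
```

Lower values mean a better fit, so this mean can be used directly as the quantity to minimise on held-out data.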

  • Thank you, those githubs look really interesting! Are these tools commonly used in the insurance industry? It looks like they allow you to flexibly create a distribution, is that correct? What do you mean by the "negative likelihood of your distribution"? Does that basically mean try and fit to a distribution and the loss is how poorly your distribution fits? – Connor Feb 07 '24 at 14:38
  • Yes, that's right. See e.g. eq. (3) and (4) in the LTV paper for the zero-inflated lognormal negative log-likelihood loss. – Georg M. Goerg Feb 07 '24 at 15:31