
I've run a GLM with B-spline terms (`bSpline`) on some variables, and I'm having trouble extracting the raw coefficients associated with those variables so that I can give the customer an easily understandable effect.

My formula is as follows.

    library(splines2)  # assumed source of bSpline()
    fit <- glm(
      freq ~ State_bucket + eff_year + channel + marital_status +
        usage + term + pay_plan +
        bSpline(age, degree = 1, knots = c(27, 70)) + vehicle_type +
        bSpline(vehicle_age, degree = 1, knots = c(6, 17)) +
        bSpline(RBA, degree = 1, knots = c(2750, 26750)) +
        bSpline(vehicle_length, degree = 1, knots = c(45)) +
        bSpline(Credit, degree = 1, knots = c(375)),
      family = quasipoisson(link = "log"),
      data = spline_training_set
    )

For the sake of simplicity, I'll only focus on the age variable. The coefficient output is as follows:

[Image: spline coefficient output]

I understand this is the coefficient for the 1st degree polynomial of age, but I need to be able to give the raw coefficient for each age in order to know how much the frequency moves relative to age changing.
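One way to pin down what a per-age "raw coefficient" could mean here, as a sketch: with a log link the age spline enters the linear predictor additively, so each age gets a multiplicative factor relative to a chosen reference age. The reference age $a_0$ and the symbols $B_j$, $\beta_j$, and $r(a)$ are notation introduced for illustration; they are not output from the fit.

$$
\log \mu = \beta_0 + \dots + \sum_{j=1}^{3} \beta_j B_j(\text{age}),
\qquad
r(a) = \exp\Big(\sum_{j=1}^{3} \beta_j \,[B_j(a) - B_j(a_0)]\Big)
$$

The sum runs to 3 on the assumption that the basis has no intercept column, in which case a degree-1 basis with two interior knots has three basis functions. The individual $\beta_j$ are weights on those basis functions, which is why none of them reads as "the effect of age" on its own.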

Does anyone know how to get interpretable coefficients out of the spline, ideally in R?

Jordan
  • "I need to be able to give the raw coefficient for each age in order to know how much the frequency moves relative to age changing." you can do this with the predict command over a range of ages. – AdamO Dec 13 '17 at 18:38
  • Thanks @AdamO. Unfortunately my internal customer uses the raw coefficients as factors to price by instead of the final prediction. The estimate given above is actually the slope of the polynomial which doesn't help. – Jordan Dec 15 '17 at 11:57
  • I want to push you on this point a bit. If the client is asking for raw coefficients, it's important to check in on their interpretation of such values. They are an expected difference comparing two adjacent predictions. Spline adjustment differs because the mean difference varies over the domain of the predictor. The actual numerical output is a very complicated, scaled, derived representation of these values. Alone they are not interpretable. Predict two adjacent categories over key points in the domain to get the analogue of a "raw coefficient". Zelig in R can calculate 95% CIs. – AdamO Dec 15 '17 at 14:41
  • I apologize for the late response @AdamO. I didn't get an email telling me there was a response. Thank you for coming back to me. Unfortunately I'm not quite sure I understand. Let's say, for instance, I'm looking at age with records from 21 to 60 by 1, with knots at 30, 40, and 50. The "black box" program we use now will give us the coefficient at each age, which will be different (assuming ages are not binned together) regardless of the polynomial slope. That's what I'm trying to get at, so I can tell my customer to price age by "1.21 for age 21, 1.18 for age 22, etc." – Jordan Dec 26 '17 at 19:37
  • I recently made a post about recoding splines to enhance interpretation. Give this a read and see if it answers your question: https://stats.stackexchange.com/questions/225653/periodic-splines-to-fit-periodic-data/319760#319760 – AdamO Dec 26 '17 at 20:01
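The suggestion in the comments to predict over a range of ages can be turned into a table of per-age pricing factors. The following is a minimal sketch under stated assumptions, not the method used in the question's fit: the data are simulated, `channel` stands in for all the other rating variables, the base age of 40 is arbitrary, and base R's `splines::bs()` is swapped in for `bSpline()` so the block runs without extra packages.

```r
library(splines)  # base R; bs() is used here in place of bSpline()

set.seed(1)

# Simulated portfolio (hypothetical data, for illustration only)
n   <- 5000
dat <- data.frame(
  age     = sample(18:85, n, replace = TRUE),
  channel = factor(sample(c("agent", "direct"), n, replace = TRUE))
)
log_mu   <- -2 + 0.4 * (dat$channel == "direct") -
  0.03 * pmin(dat$age, 27) + 0.01 * pmax(dat$age - 70, 0)
dat$freq <- rpois(n, exp(log_mu))

# Same structure as the model in the question, cut down to two terms
fit <- glm(freq ~ channel + bs(age, degree = 1, knots = c(27, 70)),
           family = quasipoisson(link = "log"), data = dat)

# One row per age; the other covariate is held at a fixed level
grid <- data.frame(
  age     = seq(min(dat$age), max(dat$age)),
  channel = factor("agent", levels = levels(dat$channel))
)

# Linear predictor (log scale) at each age
eta <- predict(fit, newdata = grid, type = "link")

# Multiplicative factor for each age relative to the base age
base_age    <- 40  # arbitrary reference age (assumption)
relativity  <- exp(eta - eta[grid$age == base_age])
age_factors <- data.frame(age = grid$age, relativity = round(relativity, 4))

head(age_factors)
```

Because the model is additive on the log scale with no interactions involving age, the level chosen for `channel` cancels out of the ratio, so `age_factors` depends only on age and on the base age. The factor at the base age is 1 by construction.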
