

I investigated the effect of weather variables on disease severity. My response variable is the proportion of disease severity observed in each year. The study was conducted over 10 years, and disease was assessed 3 to 6 times a year, but only the final assessment of each year has been used. The response variable is continuous and lies strictly between 0 and 1, with a minimum of 0.35 and a maximum of 0.919. I have tried both binomial and beta regression. Beta regression seems to make more sense, and its results are supported by the literature.
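The sketch below shows the setup described above, using simulated data. It checks that the response lies inside (0, 1), as beta regression assumes. It then fits the same formula with `betar()` and with a binomial-type family. `sim` and its values are hypothetical stand-ins for `dat_seasonal`. For the binomial-type fit, `quasibinomial()` replaces `binomial()`, because a plain `binomial()` family warns about non-integer successes when it receives raw proportions without weights.

```r
library(mgcv)  # gam(), s(), betar()

# Hypothetical simulated stand-in for dat_seasonal
set.seed(1)
n <- 30
sim <- data.frame(
  mean_rh    = runif(n, 60, 95),
  total_rain = runif(n, 50, 330),
  mean_temp  = runif(n, 15, 30),
  mean_ws    = runif(n, 1, 5)
)
mu <- plogis(0.8 + 0.04 * (sim$mean_rh - 75))
sim$severity <- rbeta(n, mu * 30, (1 - mu) * 30)

# Beta regression assumes 0 < y < 1 (the reported range 0.35 to 0.919 satisfies this)
stopifnot(all(sim$severity > 0 & sim$severity < 1))

f <- severity ~ s(mean_rh, k = 5) + s(total_rain, k = 6) +
  s(mean_temp, k = 3) + s(mean_ws, k = 3)

fit_beta  <- gam(f, family = betar(), data = sim)
fit_quasi <- gam(f, family = quasibinomial(), data = sim)

# Fitted mean severities from the two families, side by side
fits <- data.frame(beta = fitted(fit_beta), quasibinomial = fitted(fit_quasi))
summary(fits)
cor(fits$beta, fits$quasibinomial)
```

Both families model the mean on the logit scale. They differ mainly in how the variance of a proportion is assumed to behave.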

However, the model diagnostics are all over the place and don't indicate a good fit, even though the results are logical. I checked for autocorrelation with the performance package and found no significant autocorrelation. But concurvity (the nonlinear analogue of collinearity for smooth terms) is strong: every smooth term has a value of 1 in the worst category. This seems to be the problem.

concurvity(mod5)
             para s(mean_rh) s(total_rain) s(mean_temp) s(mean_ws)
worst    0.978845  1.0000000     1.0000000            1          1
observed 0.978845  0.9624381     0.7694716            1          1
estimate 0.978845  0.9594373     0.7116748            1          1
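By default (`full = TRUE`), `concurvity()` measures each term against the whole rest of the model. Calling it with `full = FALSE` gives pairwise measures instead, which helps show which terms overlap with which. The sketch below is a minimal example on simulated data. `sim` is a hypothetical stand-in for `dat_seasonal`, with rainfall deliberately made to track humidity so that two predictors overlap.

```r
library(mgcv)  # gam(), s(), betar(), concurvity()

# Hypothetical simulated data: total_rain is built from mean_rh plus noise
set.seed(2)
n <- 30
mean_rh    <- runif(n, 60, 95)
total_rain <- pmax(4 * (mean_rh - 50) + rnorm(n, sd = 30), 1)
mean_temp  <- runif(n, 15, 30)
mean_ws    <- runif(n, 1, 5)
mu <- plogis(0.8 + 0.04 * (mean_rh - 75))
severity <- rbeta(n, mu * 30, (1 - mu) * 30)
sim <- data.frame(severity, mean_rh, total_rain, mean_temp, mean_ws)

fit <- gam(severity ~ s(mean_rh, k = 5) + s(total_rain, k = 6) +
             s(mean_temp, k = 3) + s(mean_ws, k = 3),
           family = betar(), data = sim)

# Ordinary linear correlations between the raw predictors
pred_cor <- cor(sim[, c("mean_rh", "total_rain", "mean_temp", "mean_ws")])
round(pred_cor, 2)

# full = FALSE: a list of pairwise matrices named worst, observed, estimate
pair_conc <- concurvity(fit, full = FALSE)
round(pair_conc$estimate, 2)
```

High pairwise values point to the specific predictors that carry overlapping information. Those pairs are the candidates for dropping or combining.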

Does anyone know how to account for this collinearity (concurvity) between the variables? I have attached the model fit and the diagnostic plots; any help would be greatly appreciated.

My model is given below.

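library(mgcv)  # gam(), s() and the betar() family

# Beta regression GAM with one smooth per weather variable
mod5 <- gam(severity ~ s(mean_rh, k = 5) + s(total_rain, k = 6) +
              s(mean_temp, k = 3) + s(mean_ws, k = 3),
            family = betar(), data = dat_seasonal)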

summary(mod5)
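mgcv's built-in residual and basis-dimension checks for a model like this can be run as shown below. This is a sketch on simulated data: `sim` and `mod` are hypothetical stand-ins for `dat_seasonal` and `mod5`. `k.check()` reports whether each basis dimension `k` may be too small. `gam.check()` prints the same table and draws four residual plots. With very few observations, these plots can look irregular even for a reasonable model.

```r
library(mgcv)  # gam(), betar(), gam.check(), k.check()

# Hypothetical simulated stand-in for dat_seasonal
set.seed(3)
n <- 30
sim <- data.frame(
  mean_rh    = runif(n, 60, 95),
  total_rain = runif(n, 50, 330),
  mean_temp  = runif(n, 15, 30),
  mean_ws    = runif(n, 1, 5)
)
mu <- plogis(0.8 + 0.04 * (sim$mean_rh - 75))
sim$severity <- rbeta(n, mu * 30, (1 - mu) * 30)

mod <- gam(severity ~ s(mean_rh, k = 5) + s(total_rain, k = 6) +
             s(mean_temp, k = 3) + s(mean_ws, k = 3),
           family = betar(), data = sim)

# One row per smooth: k' (maximum edf), edf used, k-index, p-value
kc <- k.check(mod)
print(kc)

# QQ plot, residuals vs linear predictor, histogram, response vs fitted;
# pdf(NULL) discards the plots when this is run as a script
pdf(NULL)
gam.check(mod)
invisible(dev.off())

# Deviance residuals can also be inspected directly
dres <- residuals(mod, type = "deviance")
summary(dres)
```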


Ahsk
  • I would really like to use beta regression because the results make sense, but the diagnostic plots are my only concern. – Ahsk Oct 11 '22 at 01:41
  • You seem to have one very serious outlier; see what's going on with that first. I also worry that you might be overfitting the data: it looks like there are only a dozen or so data points, with 4 predictors. Even the penalization implicit in the smooths might not be enough to avoid overfitting. Note that multicollinearity isn't necessarily a problem, particularly if you're interested in prediction. See this page, for example. – EdM Oct 11 '22 at 15:49
  • Thanks for your comment. Yes, there is an outlier: the year 2014, when 500 mm of rain fell. The second-highest value is 328 mm. But removing 2014 makes the results illogical, and the diagnostic plots remain almost the same (except that the one outlier disappears). My current results are supported by the literature, but I am worried about the diagnostic plots, as a reviewer might point them out. – Ahsk Oct 11 '22 at 18:22
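The comments above raise two concerns: overfitting with few observations, and sensitivity to the 2014 rainfall value. Both can be probed roughly as sketched below. This illustration uses simulated data and is not the poster's analysis. `sim` is hypothetical, and one rainfall value is set to 500 mm to mimic 2014. `select = TRUE` is an mgcv option that adds an extra penalty so that whole terms can be shrunk out of the model. It is a swapped-in technique, not part of the original model.

```r
library(mgcv)  # gam(), s(), betar()

# Hypothetical simulated data; row 1 gets 500 mm of rain to mimic 2014
set.seed(4)
n <- 30
sim <- data.frame(
  mean_rh    = runif(n, 60, 95),
  total_rain = runif(n, 50, 328),
  mean_temp  = runif(n, 15, 30),
  mean_ws    = runif(n, 1, 5)
)
sim$total_rain[1] <- 500
mu <- plogis(0.8 + 0.04 * (sim$mean_rh - 75))
sim$severity <- rbeta(n, mu * 30, (1 - mu) * 30)

f <- severity ~ s(mean_rh, k = 5) + s(total_rain, k = 6) +
  s(mean_temp, k = 3) + s(mean_ws, k = 3)

# 1. Overfitting: total effective degrees of freedom against sample size,
#    with and without the extra shrinkage penalty from select = TRUE
fit_plain  <- gam(f, family = betar(), data = sim)
fit_select <- gam(f, family = betar(), data = sim, select = TRUE)
edf_total <- c(n      = nrow(sim),
               plain  = sum(fit_plain$edf),
               select = sum(fit_select$edf))
round(edf_total, 2)

# 2. Outlier sensitivity: refit without the extreme-rainfall row, then compare
#    each smooth's edf and the fitted values on the rows both fits share
drop_row <- which.max(sim$total_rain)
fit_drop <- gam(f, family = betar(), data = sim[-drop_row, ])
edf_terms <- cbind(
  all_rows    = summary(fit_plain)$s.table[, "edf"],
  without_max = summary(fit_drop)$s.table[, "edf"]
)
round(edf_terms, 2)
max_shift <- max(abs(fitted(fit_plain)[-drop_row] - fitted(fit_drop)))
max_shift
```

A total edf close to the number of observations is a warning sign of overfitting. A large `max_shift` indicates that a single season is driving the fit.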

0 Answers