I am fairly new to R and to multiple regression analysis, so I could use some help interpreting my results. For my research I am trying to find predictors of the amount of blood loss during surgery. For this I have a dataset of clinical variables (dichotomous, ordinal or continuous) with blood loss in mL as the outcome. Because blood loss is a continuous, strictly positive and non-normally distributed variable, I understood that I am best off using a generalized linear model (GLM) with a Gamma family and a log link. To build the model I used the following code in R:
```r
library(DHARMa)  # simulation-based residual diagnostics

# Gamma GLM with log link; ebl = blood loss in mL (outcome)
fullmodel <- glm(ebl ~ embol + age + gender + bmi + charlson +
                   path_fracture + pain + ecog + asia_pre + prim_tumor +
                   other_bone_mets + spine_mets + visc_mets + brain_mets +
                   local_radiation + previous_systemic + ellipsoid_cm3 + bilsky +
                   hgb + wbc + plt + lymph + neut + creatinine + calcium + albumin +
                   time_prim_surg + operation + levels_operated + opn_time_min,
                 data = predictebl, family = Gamma(link = "log"))
summary(fullmodel)

# Scaled residuals from simulations of the fitted model, plus diagnostic tests
simulationOutput2 <- simulateResiduals(fittedModel = fullmodel)
plot(simulationOutput2)
testDispersion(simulationOutput2)
```
This gave me the following results:
```
Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)         6.431e+00  1.836e+00   3.502 0.000925 ***
embol               2.164e-01  2.221e-01   0.974 0.334304
age                -1.085e-02  1.037e-02  -1.046 0.300070
gender2            -4.557e-01  2.467e-01  -1.847 0.070122 .
bmi                -3.737e-03  1.943e-02  -0.192 0.848185
charlson            4.448e-02  7.829e-02   0.568 0.572239
path_fracture2     -8.190e-03  2.231e-01  -0.037 0.970845
pain2              -4.070e-01  2.723e-01  -1.495 0.140720
ecog2               2.781e-01  2.607e-01   1.067 0.290690
ecog3               2.113e-01  3.331e-01   0.634 0.528553
ecog4              -3.904e-01  3.878e-01  -1.007 0.318482
ecog5              -4.656e-01  5.573e-01  -0.836 0.407017
asia_pre2           1.531e-01  2.053e-01   0.746 0.459062
asia_pre3          -6.207e-01  5.248e-01  -1.183 0.241970
asia_pre4           2.041e+00  9.495e-01   2.150 0.035982 *
prim_tumor2        -7.277e-02  2.600e-01  -0.280 0.780568
other_bone_mets2    5.886e-02  2.077e-01   0.283 0.777901
spine_mets2        -4.733e-01  2.679e-01  -1.767 0.082828 .
spine_mets3         5.052e-02  2.602e-01   0.194 0.846752
visc_mets2         -7.395e-01  2.092e-01  -3.535 0.000836 ***
brain_mets2         5.740e-02  3.288e-01   0.175 0.862053
local_radiation2   -1.631e-01  2.011e-01  -0.811 0.420737
previous_systemic2  2.151e-01  2.441e-01   0.881 0.382131
ellipsoid_cm3       2.866e-03  2.875e-03   0.997 0.323289
bilsky             -1.202e-02  6.576e-02  -0.183 0.855612
hgb                 5.356e-03  7.193e-02   0.074 0.940917
wbc                 6.318e-02  5.073e-02   1.245 0.218291
plt                 6.412e-05  1.000e-03   0.064 0.949134
lymph              -1.235e-01  1.947e-01  -0.634 0.528633
neut               -8.864e-02  5.940e-02  -1.492 0.141316
creatinine         -3.602e-02  1.339e-01  -0.269 0.788925
calcium             9.560e-02  8.701e-02   1.099 0.276667
albumin            -2.514e-01  1.932e-01  -1.301 0.198718
time_prim_surg      4.495e-05  5.097e-05   0.882 0.381705
operation2          3.100e-01  2.313e-01   1.340 0.185614
operation3         -1.084e+00  4.389e-01  -2.470 0.016640 *
operation4         -2.784e-01  4.373e-01  -0.637 0.527054
levels_operated2    3.685e-01  2.811e-01   1.311 0.195342
levels_operated3    4.108e-01  2.892e-01   1.421 0.161058
opn_time_min        3.619e-03  6.433e-04   5.625 6.44e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Gamma family taken to be 0.5311014)

    Null deviance: 117.807  on 94  degrees of freedom
Residual deviance:  36.493  on 55  degrees of freedom
AIC: 1586.4

Number of Fisher Scoring iterations: 14
```
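The degrees of freedom at the bottom of this output show how large the model is relative to the data, which bears on Q1 below. A small arithmetic sketch, assuming the model has an intercept and no coefficients were dropped as aliased:

```r
# Degrees of freedom copied from the summary(fullmodel) output above
null_df  <- 94  # null deviance df: n - 1 (intercept-only model)
resid_df <- 55  # residual deviance df: n - number of estimated coefficients

n_obs        <- null_df + 1       # observations actually used in the fit
n_coef       <- n_obs - resid_df  # coefficients estimated, incl. the intercept
obs_per_coef <- n_obs / n_coef

cat("Observations used:", n_obs, "\n")
cat("Coefficients estimated:", n_coef, "\n")
cat("Observations per coefficient:", round(obs_per_coef, 1), "\n")
```

This gives 95 observations and 40 coefficients (matching the 40 rows of the coefficient table), i.e. fewer than 2.5 observations per estimated coefficient.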
I also used DHARMa to create diagnostic plots of the model residuals (a Q-Q plot and a plot of the residuals against the predicted values).
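These plots are based on DHARMa's scaled residuals. Roughly, each residual is the position of an observed value among values simulated from the fitted model, so under a correctly specified model the residuals are uniform on (0, 1). The base-R sketch below illustrates that idea on hypothetical simulated data; it is a simplified stand-in, not DHARMa's actual implementation, and the sample size, coefficients and shape parameter are arbitrary assumptions.

```r
set.seed(1)

# Hypothetical data (not the blood-loss data): a strictly positive,
# right-skewed outcome generated from a Gamma GLM with a log link
n     <- 95
x     <- rnorm(n)
shape <- 2
mu    <- exp(6 + 0.5 * x)
y     <- rgamma(n, shape = shape, rate = shape / mu)

fit  <- glm(y ~ x, family = Gamma(link = "log"))
disp <- summary(fit)$dispersion  # estimate of 1 / shape

# Simulate new responses from the fitted model: one row per observation,
# one column per simulation (250 is DHARMa's default number)
n_sim  <- 250
mu_hat <- fitted(fit)
sims   <- matrix(rgamma(n * n_sim, shape = 1 / disp, rate = 1 / (disp * mu_hat)),
                 nrow = n)

# Scaled residual: rank of each observed value among its own simulations,
# plus a uniform jitter so that the residuals have no ties
scaled_res <- (rowSums(sims < y) + runif(n)) / (n_sim + 1)

# Under a correct model these are roughly uniform on (0, 1). DHARMa's Q-Q
# plot compares them with the uniform distribution (KS test), and its
# residual-vs-predicted plot checks the 0.25, 0.5 and 0.75 quantiles.
ks.test(scaled_res, "punif")
quantile(scaled_res, c(0.25, 0.5, 0.75))
```

A "quantile deviations detected" message means that, in some range of predicted values, these quartiles drift away from 0.25, 0.5 and 0.75.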
Now I have a few questions regarding the results of my analysis:
[Q1]: Was it appropriate to put all variables from the dataset into the model, or should I have first run another analysis to select the variables that are associated with blood loss? I have read about LASSO; should I have used that before the GLM?
[Q2]: To me, the Q-Q plot and the associated tests look fine, but the second plot states that quantile deviations were detected. Is this a problem for the model, and if so, how should I fix it?
[Q3]: I would like to present the results of my analysis in a table showing the factors that are associated with blood loss. What would be the best approach? Should I just list each variable's name together with its estimate (the first column of the coefficient table) and its p-value?
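Because the model uses a log link, each estimate is on the log scale, and exponentiating it gives a ratio of mean blood loss, which is often easier to report than the raw coefficient. The sketch below builds such a table from a few rows copied from the output above; the choice of rows and the normal-based Wald interval (the kind `confint.default()` computes) are illustrative assumptions. With the fitted model available, `exp(cbind(coef(fullmodel), confint.default(fullmodel)))` would give ratios and intervals for every term.

```r
# A few rows copied from the summary(fullmodel) coefficient table above
tab <- data.frame(
  term     = c("opn_time_min", "visc_mets2", "operation3", "asia_pre4"),
  estimate = c(3.619e-03, -7.395e-01, -1.084e+00, 2.041e+00),
  se       = c(6.433e-04, 2.092e-01, 4.389e-01, 9.495e-01),
  p        = c(6.44e-07, 0.000836, 0.016640, 0.035982)
)

# Log link: exp(estimate) is the multiplicative change in mean blood loss
# per unit of the predictor (or vs. the reference level of a factor)
z <- qnorm(0.975)  # normal-based Wald interval, as confint.default() uses
tab$ratio <- exp(tab$estimate)
tab$lower <- exp(tab$estimate - z * tab$se)
tab$upper <- exp(tab$estimate + z * tab$se)

# A continuous predictor can be rescaled to a more meaningful unit,
# e.g. one hour instead of one minute of operating time
per_hour <- exp(60 * tab$estimate[tab$term == "opn_time_min"])

print(tab[, c("term", "ratio", "lower", "upper", "p")], digits = 3)
cat("Ratio per extra hour of operating time:", round(per_hour, 2), "\n")
```

Read this way, a ratio of about 0.48 for visc_mets2 means a mean blood loss roughly half that of the reference level, and each extra hour of operating time multiplies the mean by about 1.24, in both cases holding the other variables fixed.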

