I have a dataset that I have split into a training set and a test set (~2400 and ~600 samples).
After training an XGBoost regression model on the training set, I would like to test statistically whether the model performs better on the test set than a naïve model that always outputs the mean of the training-set target.
The residuals of both predictions, the difference of residuals between the two predictions, and the difference of squared residuals between the two predictions are all non-normal (tested with Shapiro–Wilk), not even approximately normal. So I think I should use a non-parametric test, perhaps something like the Mann–Whitney U test, but if I understand it correctly, this particular test cannot solve my problem, since it compares two independent samples whereas my two sets of errors are paired.
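For paired data, the usual non-parametric counterpart to Mann–Whitney is the Wilcoxon signed-rank test. Below is a minimal sketch using `scipy.stats.wilcoxon` on the paired squared errors; the helper name `wilcoxon_model_vs_naive` and its arguments are hypothetical. Note that this test is about the location (roughly, the pseudo-median) of the paired differences, not their mean, so a significant result does not by itself prove that the model's MSE is lower.

```python
import numpy as np
from scipy.stats import wilcoxon


def wilcoxon_model_vs_naive(y_test, y_pred, train_mean):
    """One-sided Wilcoxon signed-rank test on paired squared errors.

    H1: the paired differences (model squared error minus naive squared
    error) are shifted below zero. This concerns their location, not their mean.
    """
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sq_model = (y_pred - y_test) ** 2      # model squared errors
    sq_naive = (train_mean - y_test) ** 2  # constant-baseline squared errors
    # alternative="less": distribution of sq_model - sq_naive lies below 0
    return wilcoxon(sq_model, sq_naive, alternative="less")
```

Usage in this setting would be `wilcoxon_model_vs_naive(y_test, y_pred, y_train.mean())`, reading the `pvalue` field of the result.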
Unless I am mistaken, this problem is almost equivalent to proving that some kind of R² of the model is significantly greater than zero.
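One way to make the R² analogy precise, assuming the "sort of R²" is an out-of-sample R² measured against the training-mean baseline (this is a reading of the question, not a definition taken from it): write \(\bar y_{\text{train}}\) for the training mean, \(\hat y_i\) for the model prediction on test point \(i\), and \(d_i\) for the per-point difference of squared errors (the same quantity as `squarred_residuals_differences`).

```latex
\begin{aligned}
d_i &= (\hat y_i - y_i)^2 - (\bar y_{\text{train}} - y_i)^2, \qquad i = 1,\dots,n \\
R^2_{\text{oos}} &= 1 - \frac{\mathrm{MSE}_{\text{model}}}{\mathrm{MSE}_{\text{naive}}}
  = 1 - \frac{\frac{1}{n}\sum_i (\hat y_i - y_i)^2}{\frac{1}{n}\sum_i (\bar y_{\text{train}} - y_i)^2}
  = -\frac{\bar d}{\mathrm{MSE}_{\text{naive}}} \\
R^2_{\text{oos}} > 0 &\iff \bar d < 0 \qquad (\text{since } \mathrm{MSE}_{\text{naive}} > 0)
\end{aligned}
```

With the MSE values reported in this post, \(\bar d \approx 0.3606 - 0.4271 = -0.0665\) and \(R^2_{\text{oos}} \approx 1 - 0.3606/0.4271 \approx 0.156\). Testing \(R^2_{\text{oos}} > 0\) therefore amounts to a one-sided test on the mean of the paired differences \(d_i\), which is the TL;DR question.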
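```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_pred = trained.predict(X_test)
# Naive baseline: always predict the mean of the training target
baseground = np.full(len(y_pred), y_train.mean())
mse = mean_squared_error(y_test, y_pred)
baseground_mse = mean_squared_error(y_test, baseground)
```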
```
Model mse: 0.3606197869571293
Baseground mse: 0.42712517870196076
```
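To quantify the uncertainty of the MSE gap itself, a paired bootstrap over test points is a common, assumption-light option. The sketch below (hypothetical helper `bootstrap_mse_difference`, percentile interval, NumPy only) resamples test points with replacement while keeping each point's model error and naive error together, so the pairing is preserved. It assumes the test points are roughly independent; an interval lying entirely below 0 suggests the model's MSE is lower than the baseline's.

```python
import numpy as np


def bootstrap_mse_difference(y_test, y_pred, train_mean, n_boot=5_000, seed=0):
    """Paired percentile bootstrap for MSE(model) - MSE(naive).

    Returns the observed difference and a 95% percentile interval.
    """
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Per-point loss differences: model squared error minus naive squared error
    d = (y_pred - y_test) ** 2 - (train_mean - y_test) ** 2
    rng = np.random.default_rng(seed)
    # Each row is one bootstrap resample of test-point indices
    idx = rng.integers(0, len(d), size=(n_boot, len(d)))
    boot_means = d[idx].mean(axis=1)
    low, high = np.percentile(boot_means, [2.5, 97.5])
    return d.mean(), (low, high)
```

Usage here would be `bootstrap_mse_difference(y_test, y_pred, y_train.mean())`.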
```python
# Per-point residuals and squared residuals of the model and of the naive baseline
residuals = [y_p - y_t for y_p, y_t in zip(y_pred, y_test)]
squarred_residuals = [(y_p - y_t)**2 for y_p, y_t in zip(y_pred, y_test)]
residuals_naive = [(y_b - y_t) for y_b, y_t in zip(baseground, y_test)]
squarred_residuals_naive = [(y_b - y_t)**2 for y_b, y_t in zip(baseground, y_test)]
# Paired differences (model minus naive), one value per test point
residuals_differences = [(r - rn) for r, rn in zip(residuals, residuals_naive)]
squarred_residuals_differences = [sr - srn for sr, srn in zip(squarred_residuals, squarred_residuals_naive)]
```
```
residuals: ShapiroResult(statistic=0.9887793660163879, pvalue=4.4769913074560463e-05)
residuals_naive: ShapiroResult(statistic=0.8029045462608337, pvalue=5.698079444623409e-28)
residuals_differences: ShapiroResult(statistic=0.9659961462020874, pvalue=1.8076425772894922e-11)
squarred_residuals_differences: ShapiroResult(statistic=0.8816245794296265, pvalue=2.258187421269406e-22)
```
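The Shapiro–Wilk results concern the individual differences, but a t-test on paired differences only needs the *mean* of those differences to be approximately normal. With roughly 600 test points the central limit theorem often makes that a reasonable approximation, unless the differences are very heavy-tailed, so a one-sided paired t-test on the squared errors (essentially a Diebold–Mariano-style comparison for independent test points) is a plausible option. A minimal sketch, with a hypothetical helper name:

```python
import numpy as np
from scipy.stats import ttest_rel


def paired_t_model_vs_naive(y_test, y_pred, train_mean):
    """One-sided paired t-test on squared errors (model vs. constant baseline)."""
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sq_model = (y_pred - y_test) ** 2
    sq_naive = (train_mean - y_test) ** 2
    # H1: mean(sq_model - sq_naive) < 0, i.e. the model has lower expected squared error
    return ttest_rel(sq_model, sq_naive, alternative="less")
```

This is equivalent to `scipy.stats.ttest_1samp(squarred_residuals_differences, 0, alternative="less")`, since a paired t-test is a one-sample t-test on the differences.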
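```python
from sklearn.metrics import r2_score

# Note: r2_score uses the mean of y_test (not of y_train) as its reference
r2 = r2_score(y_test, y_pred)
```

```
R² = 0.1557046857947879
```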
This Cross Validated thread may be related.
TL;DR: How can I statistically test, for two paired, non-normal distributions d1 and d2, that mean(d1) > mean(d2)?
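One option that targets the mean itself without a normality assumption is a paired sign-flip permutation test. The sketch below assumes that, under the null hypothesis, each paired difference is symmetric about zero, so its sign can be flipped at random; the function name `paired_sign_flip_test` and its defaults are illustrative only. For this problem one would call it with `d1 = squarred_residuals_naive` and `d2 = squarred_residuals`, so that H1 reads "the naive model has the larger mean squared error".

```python
import numpy as np


def paired_sign_flip_test(d1, d2, n_perm=10_000, seed=0):
    """One-sided paired sign-flip permutation test of H1: mean(d1) > mean(d2).

    Assumes each paired difference d1[i] - d2[i] is symmetric about 0 under H0.
    Returns the observed mean difference and a Monte Carlo p-value.
    """
    diff = np.asarray(d1, dtype=float) - np.asarray(d2, dtype=float)
    observed = diff.mean()
    rng = np.random.default_rng(seed)
    # Random +1/-1 sign per pair, for each of n_perm permutations
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
    perm_means = (signs * diff).mean(axis=1)
    # Add-one correction so the p-value is never exactly 0
    p_value = (np.sum(perm_means >= observed) + 1) / (n_perm + 1)
    return observed, p_value
```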