There are numerous techniques for benchmarking models, e.g. cross-validation and resampling. However, while these make it easy to compare implemented models against one another, or against a desired accuracy level, is there a way to assess model performance against a more 'absolute' benchmark for a given dataset?
To clarify, suppose we are trying to predict some feature Y from a set of other measured features X. We fit models to try to capture this relationship, but there is measurement uncertainty in both X and Y, which introduces uncertainty into our model. This manifests both in the model fit and in the data we benchmark it against: even a model which exactly reflects the true relationship between X and Y will show residual differences on our test data due to measurement error.
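To make that last point concrete, here is a minimal sketch (the linear relationship, noise level, and Gaussian noise on Y are arbitrary choices, purely for illustration): even when we predict with the exact true relationship, the test error floors out at the noise level.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: y = 2x + 1 with Gaussian measurement noise on Y
# (the relationship and noise level are arbitrary choices for illustration)
n = 10_000
sigma = 0.5
x = rng.uniform(0, 1, n)
y = 2 * x + 1 + rng.normal(0, sigma, n)

# Predict with the *exact* true relationship -- no modelling error at all
y_pred = 2 * x + 1

# The residual error is still non-zero; it floors out at the noise level
rmse = np.sqrt(np.mean((y - y_pred) ** 2))
print(f"RMSE of the true model: {rmse:.3f} (noise sd = {sigma})")
```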
So if we have a model which performs poorly, it isn't easy to distinguish between three cases:

i) the model is poor, but other models would perform better;
ii) the data uncertainties are too large, but the model would perform better if they were reduced;
iii) there is not actually any underlying relationship between X and Y at all.
I've looked at this in some simulated experiments: assume a particular relationship between X and Y and particular uncertainty levels, then quantify model performance against the 'true' relationship implemented in the simulation. But this feels incomplete, because it assumes particular relationships and fitting approaches, so it can't really be used to rule out case (i) above, that we're just not applying the right models. So I was wondering if there are more absolute methods for benchmarking how good a model fit may be expected to be in a given dataset, which could be applied to make this analysis a bit more formal and less dependent on the assumption of a particular model.
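For reference, the kind of simulated experiment I mean above looks roughly like this (a sketch only; the sinusoidal relationship, noise levels, and choice of a random-forest fit are arbitrary placeholders):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)

def true_f(x):
    # assumed underlying relationship, known only because we built the simulation
    return np.sin(3 * x) + 0.5 * x

n = 2000
sigma_x, sigma_y = 0.05, 0.3                  # assumed measurement noise on X and Y
x_true = rng.uniform(0, 3, n)
x_obs = x_true + rng.normal(0, sigma_x, n)    # noisy measurements of X
y_obs = true_f(x_true) + rng.normal(0, sigma_y, n)

# simple train/test split
x_train, x_test = x_obs[:1500], x_obs[1500:]
y_train, y_test = y_obs[:1500], y_obs[1500:]
x_test_true = x_true[1500:]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(x_train.reshape(-1, 1), y_train)
y_pred = model.predict(x_test.reshape(-1, 1))

# Error against the noisy test data (what we would see in practice) ...
rmse_vs_data = np.sqrt(mean_squared_error(y_test, y_pred))
# ... versus error against the noise-free truth (only available in simulation)
rmse_vs_truth = np.sqrt(mean_squared_error(true_f(x_test_true), y_pred))
print(f"RMSE vs noisy data:  {rmse_vs_data:.3f}")
print(f"RMSE vs true signal: {rmse_vs_truth:.3f}  (noise sd on Y = {sigma_y})")
```

The gap between the two error figures is what I can quantify in a simulation but not with real data, since there the true relationship and noise levels are unknown.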
I appreciate this question may be too general given the vast space of possible models, but I'd be interested in any suggestions for relevant metrics or approaches to explore, in case there are techniques I'm overlooking here.