Assume we have a probabilistic forecast for a continuous variable and want to assess how good that forecast is. For that, we can use various scoring rules (e.g. the CRPS or the logarithmic score), or, if we obtained a prediction interval (PI) from our probabilistic forecast, we can assess it using the prediction interval coverage probability (PICP). For scoring rules, lower values normally indicate a better probabilistic prediction.
For example: if Model A returns $\mathrm{CRPS}_A = 0.5$ and Model B returns $\mathrm{CRPS}_B = 1.5$, we conclude that Model A is better than Model B since $\mathrm{CRPS}_A < \mathrm{CRPS}_B$. However, do we know whether $\mathrm{CRPS}_A$ is a "good" value per se? If I had obtained 0.5 without any reference model such as Model B, would there be any way to tell whether it was a "good" performance?
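To make the comparison concrete, here is a minimal sketch of how such CRPS values could be computed, assuming Gaussian predictive distributions and using the closed-form CRPS of the normal distribution; the observations and model parameters below are made up purely for illustration:

```python
import numpy as np
from scipy.stats import norm

def crps_gaussian(y, mu, sigma):
    """Closed-form CRPS for a Gaussian forecast N(mu, sigma^2) and observation y.
    Lower is better."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

# Hypothetical observations and two competing Gaussian forecasts
y = np.array([1.2, 0.7, 2.3, 1.9])
mu_a, sigma_a = np.array([1.0, 0.8, 2.0, 2.1]), 0.5   # Model A
mu_b, sigma_b = np.array([0.2, 1.9, 1.0, 3.0]), 1.5   # Model B

print("mean CRPS, Model A:", crps_gaussian(y, mu_a, sigma_a).mean())
print("mean CRPS, Model B:", crps_gaussian(y, mu_b, sigma_b).mean())
```

The averaged scores let me rank the two models, but on their own they do not tell me whether either model is "good" in an absolute sense.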
The PICP may help with that, since we can observe how much the PICP deviates from the nominal coverage probability of the PI. For example, if Model A gives me a coverage of 98% (PICP = 98%) for a 90% PI, I know its intervals are poorly calibrated (too wide), whereas a PICP of 90.2% is rather good. For that judgment, I did not necessarily have to compare it with another Model B, since the PICP comes with an intuitive reference point: the nominal coverage.
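Here is a minimal sketch of that PICP check, taking the 90% PI as the central interval between the 5% and 95% predictive quantiles of a hypothetical Gaussian forecast (same made-up data as above):

```python
import numpy as np
from scipy.stats import norm

def picp(y, lower, upper):
    """Fraction of observations that fall inside their prediction intervals."""
    return np.mean((y >= lower) & (y <= upper))

# Hypothetical Gaussian forecasts: central 90% PI = [5th, 95th predictive percentile]
y = np.array([1.2, 0.7, 2.3, 1.9])
mu, sigma = np.array([1.0, 0.8, 2.0, 2.1]), 0.5
lower = norm.ppf(0.05, loc=mu, scale=sigma)
upper = norm.ppf(0.95, loc=mu, scale=sigma)

print(f"PICP = {picp(y, lower, upper):.1%} (nominal coverage: 90%)")
# A PICP well above 90% suggests intervals that are too wide,
# well below 90% suggests intervals that are too narrow.
```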
However, the PICP also has some disadvantages (see here), so I am wondering whether there is any other metric for the validation of probabilistic forecasts that is intuitive in its output. Maybe something comparable to $r^2$ or the CCC for point predictions.
I am curious to hear your suggestions!