
To better capture uncertainty about the phenomena that we model, probabilistic predictions seem to be a natural and common extension of point predictions.

Methods for evaluating these predictions are also relatively common under the name of scoring rules, one in particular being the Continuous Ranked Probability Score (CRPS), defined as follows for a probabilistic prediction in the form of a CDF $F$ and a true value $y \in \mathbb{R}$:

$$\text{CRPS}(F, y) = \int_{\mathbb{R}} \left(F(x) - \mathbb{1}\{x \geq y\}\right)^2 \, dx$$

I likely have inefficiencies in my implementation (evaluating the full integral using scipy.integrate.quad in Python 3), and I may have chosen a particularly "bad" case for static benchmarking in terms of runtime (the empirical CDF of a large sample, which has many discontinuities). Still, it seems like this quantity would be cumbersome to evaluate repeatedly for larger data sets (> 10,000 data points), especially if the probabilistic forecast is itself relatively expensive to re-evaluate.
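For concreteness, here is a minimal sketch of the kind of approach I mean (the helper names, the toy Normal sample, and the truncated integration range are simplifications, not my exact code):

```python
import numpy as np
from scipy.integrate import quad

def empirical_cdf(sample):
    """Return a step-function CDF F(x) built from a sample."""
    sorted_sample = np.sort(sample)
    n = len(sorted_sample)
    return lambda x: np.searchsorted(sorted_sample, x, side="right") / n

def crps_quad(F, y, lower, upper):
    """CRPS(F, y) = integral over [lower, upper] of (F(x) - 1{x >= y})^2 dx."""
    integrand = lambda x: (F(x) - float(x >= y)) ** 2
    # quad struggles with the many jumps of a step-function CDF, which is
    # where the slow runtime (and accuracy warnings) come from.
    value, _ = quad(integrand, lower, upper, limit=500)
    return value

rng = np.random.default_rng(0)
forecast_sample = rng.normal(size=5_000)   # sample defining the empirical CDF
F = empirical_cdf(forecast_sample)
print(crps_quad(F, y=0.3, lower=-10.0, upper=10.0))
```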

Is this just one of the sacrifices we make to have "richer" predictions on larger data sets? Or are there other well-documented proper scoring rules for continuous data that lend themselves better to computational efficiency? Or is this more likely a user-level difficulty, requiring me to re-evaluate or optimize how I'm approximating the integral (and whether I'm doing so correctly)?

QMath
  • This is an interesting question. Other examples of proper scoring rules would be the log score and the energy score. There has recently also been some interest in scoring rules for multivariate density prediction. However, I have not seen any work on the efficient evaluation of scores. It may indeed come down to classical numerics for the efficient calculation of expectations, with error bounds. (Restricting predictive densities to well-behaved distributional families may help and allow using known machinery, but is of course restrictive.) – Stephan Kolassa Jul 04 '23 at 09:11
  • @StephanKolassa Thank you! I have appreciated your answers/questions on here and they have been very insightful :) I have searched for a continuous variant of the log score, but haven't really found a concrete reference. Is it simply, given a probabilistic prediction in the form of a PDF, $f$, and using the convention in my post, $\text{LS}(f, y) = \ln f(y)$ for a single observation, which can then be aggregated over a full data set through something like the mean of the individual log scores? – QMath Jul 04 '23 at 10:32
  • Because, while potentially still not extremely fast if the probabilistic prediction is expensive to evaluate at a point, this would in a sense avoid needing to integrate efficiently? – QMath Jul 04 '23 at 10:34
  • Yes, exactly! Take a look at Section 3.1 in Gneiting & Katzfuss (2014) for the log score and related proper scoring rules. – Stephan Kolassa Jul 04 '23 at 13:38
  • Great, thank you very much for the clarification and reference, @StephanKolassa! When applying these scoring rules to conditional distributions (which I feel are what we tend to deal with when predicting), taking for example the log loss, is there any modification we need to make to ensure that our forecasts are still fairly penalized given the additional information they utilize? Or do we just substitute the conditional density in for our $f$ from before, since the scoring rules don't care about the type of prediction? – QMath Jul 05 '23 at 02:24
  • As in, if we were evaluating a probabilistic prediction $\hat{f}_{Y|X}$ of a true value $y$, given another value $x$, would the conditional logarithmic score be as follows? $\text{CLS}(y, \hat{f}_{Y|X} \mid x) = \ln \hat{f}_{Y|X}(y \mid x)$ – QMath Jul 05 '23 at 03:08
  • Yes, exactly. You just feed in the predictions, whether conditional or unconditional (one could argue there is no such thing as an unconditional prediction), and compare multiple predictions on whether they yield a better score. ("Better" can be "higher" or "lower"; both conventions are out there, so make sure you are consistent, whichever you use.) Then hope for the best. Proper scoring rules are only guaranteed to be optimized in expectation by the true density; there is essentially no finite-sample theory for them. – Stephan Kolassa Jul 05 '23 at 06:32
  • Great!! Thank you so much for all the insight/references :) – QMath Jul 05 '23 at 17:11
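
To make the takeaway from the comments above concrete, here is a minimal sketch of evaluating the log score without any numerical integration (the Gaussian predictive density and all names are just a toy example; for a conditional prediction one would substitute the conditional density $\hat{f}_{Y|X}(y \mid x)$ in the same place):

```python
import numpy as np
from scipy.stats import norm

def log_score(logpdf, y):
    """LS(f, y) = ln f(y); only one density evaluation per observation."""
    return logpdf(y)

rng = np.random.default_rng(0)
y_obs = rng.normal(loc=1.0, scale=2.0, size=10_000)   # observed values
predictive = norm(loc=1.0, scale=2.0)                  # toy predictive density

# Aggregate over the data set by averaging the per-observation log scores.
scores = log_score(predictive.logpdf, y_obs)
print(scores.mean())
```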

0 Answers