To better capture uncertainty about the phenomena we model, probabilistic predictions are a natural and common extension of point predictions.
Methods for evaluating such predictions are also well established under the name of scoring rules; one in particular is the Continuous Ranked Probability Score (CRPS), defined as follows for a probabilistic prediction given as a CDF, $F(x)$, and a true value, $y \in \mathbb{R}$:
$$\text{CRPS}(F, y) = \int_{\mathbb{R}} \left(F(x) - \mathbb{1}\{x \geq y\}\right)^2 \, dx$$
My implementation likely has inefficiencies (I evaluate the full integral with scipy.integrate.quad in Python 3), and I may have chosen a particularly "bad" case, in terms of runtime, for static benchmarking (the empirical CDF of a large sample, which has many discontinuities). Still, it seems like this quantity would be cumbersome to evaluate repeatedly for larger data sets (> 10000 data points), especially if the probabilistic forecast itself is expensive to re-evaluate.
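Roughly, my implementation looks like the following minimal sketch (names like `crps_quad` are my own; the ECDF and sample are just illustrative):

```python
import numpy as np
from scipy.integrate import quad

def crps_quad(cdf, y, lo, hi):
    """Approximate CRPS(F, y) by numerically integrating (F(x) - 1{x >= y})^2."""
    integrand = lambda x: (cdf(x) - float(x >= y)) ** 2
    # Split the domain at y so quad is not surprised by the jump in the indicator.
    left, _ = quad(integrand, lo, y, limit=200)
    right, _ = quad(integrand, y, hi, limit=200)
    return left + right

# Empirical CDF of a sample: piecewise constant, one discontinuity per point,
# which is exactly the kind of integrand adaptive quadrature struggles with.
rng = np.random.default_rng(0)
sample = np.sort(rng.normal(size=100))
ecdf = lambda x: np.searchsorted(sample, x, side="right") / sample.size

y = 0.5
# The integrand is identically zero outside [min(sample, y), max(sample, y)],
# so finite integration limits suffice.
lo, hi = min(sample[0], y), max(sample[-1], y)
score = crps_quad(ecdf, y, lo, hi)
```

Even on this small sample, quad has to adaptively subdivide around every jump of the ECDF, which is where I suspect the runtime goes.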
Is this just one of the sacrifices we make to have "richer" predictions on larger data sets? Are there other well-documented proper scoring rules for continuous data that lend themselves better to efficient computation? Or is this more likely a user-level difficulty, requiring me to re-evaluate and optimize how I'm approximating the integral, and whether I'm doing so correctly?