If you want a quality measure analogous to the Winkler interval score you mentioned, then such a measure is probably very complex, or might not exist at all.
Forecast specification and the term 'prediction interval'
Firstly, I would argue that the term 'prediction interval' can be misleading in this setting. If we want to report an interval instead of a single value or a predictive distribution, then we have to specify what statistical property that interval represents. Simply saying that we want interval forecasts is like saying we want point forecasts, but not specifying whether they represent the mean, a quantile, or something else. We cannot do proper statistical forecast evaluation without this information.
A prediction interval is usually understood to be an interval in which a future observation will fall with a specified probability. However, predictions in interval format could be specified in many other ways: for example, the values between the mean and the mode, or between the 30% and 90% quantiles of the conditional predictive distribution, might be of interest. Calling predictions of all these quantities 'prediction intervals' could lead to confusion.
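To illustrate how differently "an interval" can be specified, here is a minimal sketch that derives three interval-type summaries from the same predictive distribution. The Gamma(2, 1) predictive distribution, the sample size, and all variable names are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Assumed predictive distribution: Gamma(shape=2, scale=1), mean 2, mode 1.
draws = rng.gamma(shape=2.0, scale=1.0, size=200_000)

intervals = {
    "central 90%": tuple(np.quantile(draws, [0.05, 0.95])),
    "30%-90% quantiles": tuple(np.quantile(draws, [0.30, 0.90])),
    # The mode of Gamma(shape, scale) is (shape - 1) * scale = 1 here.
    "mode to mean": (1.0, float(draws.mean())),
}

# Share of the predictive mass that each interval contains.
coverage = {
    name: float(np.mean((draws >= lo) & (draws <= hi)))
    for name, (lo, hi) in intervals.items()
}

for name, (lo, hi) in intervals.items():
    print(f"{name:>18}: [{lo:.2f}, {hi:.2f}]  coverage = {coverage[name]:.2f}")
```

All three are legitimate interval-valued forecasts, but they target different functionals and contain very different shares of the predictive mass, so they cannot be evaluated with the same loss function.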
Winkler's interval score
Once we have specified the type of interval forecast, we need metrics to evaluate it. If our aim is to compute the expected loss of a collection of interval forecasts, then we should choose the scoring/loss function such that it is consistent for the type of interval that was predicted, i.e. the true interval should minimize the loss function in expectation. This is analogous to the consistency of squared error for the mean: expected squared error is minimized only by the mean of the distribution. Hence, if your forecasts are not mean forecasts, you should not use squared error. Consistent loss functions can be seen as the counterpart, for point forecasts, of proper scoring rules for predictive distributions.
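Consistency can be checked numerically. The sketch below, which assumes an Exponential(1) sample as a stand-in for the observations and a simple grid of candidate point forecasts, shows that average squared error is minimized near the sample mean while average absolute error is minimized near the sample median.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed skewed data: Exponential(1), so the mean and the median differ.
y = rng.exponential(scale=1.0, size=20_000)

candidates = np.linspace(0.0, 3.0, 301)  # candidate point forecasts, step 0.01

def mean_loss(loss, candidates, y):
    """Average loss of each candidate forecast over all observations."""
    return np.array([loss(c, y).mean() for c in candidates])

squared = mean_loss(lambda c, obs: (c - obs) ** 2, candidates, y)
absolute = mean_loss(lambda c, obs: np.abs(c - obs), candidates, y)

best_squared = candidates[np.argmin(squared)]    # close to y.mean()
best_absolute = candidates[np.argmin(absolute)]  # close to np.median(y)

print(best_squared, y.mean())
print(best_absolute, np.median(y))
```

Scoring a median forecast with squared error would therefore reward forecasters for reporting something other than the median, which is exactly the inconsistency described above.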
For $\alpha \in (0,1)$ the scoring/loss function
$$
L([\ell,u], y) = (u-\ell) + \frac{2}{\alpha}(\ell-y)1(y<\ell) + \frac{2}{\alpha}(y-u)1(y>u)
$$
is often called Winkler's interval score. It compares the interval forecast $[\ell, u]$ to the observation $y$, and it is consistent for the interval bounded by the $\frac{\alpha}{2}$- and $(1-\frac{\alpha}{2})$-quantiles, i.e. the central $(1-\alpha)$ prediction interval. Consequently, it does not make sense to use $L$ to evaluate any kind of prediction interval (in the sense mentioned above) that does not meet this definition. In particular, even if you are in a setting where the predicted highest density region is always an interval, evaluating it with Winkler's interval score is meaningless. It is like using mean squared error to evaluate quantile forecasts.
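As a minimal sketch, $L$ can be implemented and applied to two candidate intervals. The function name `interval_score`, the Exponential(1) data, and $\alpha = 0.1$ are assumptions made for illustration. For this distribution the highest density region with 90% coverage is the interval $[0, -\log \alpha]$, whereas the central 90% interval runs from the 5% to the 95% quantile.

```python
import numpy as np

def interval_score(lower, upper, y, alpha):
    """Winkler's interval score L([lower, upper], y) for alpha in (0, 1)."""
    y = np.asarray(y, dtype=float)
    width = upper - lower
    penalty_below = (2.0 / alpha) * (lower - y) * (y < lower)
    penalty_above = (2.0 / alpha) * (y - upper) * (y > upper)
    return width + penalty_below + penalty_above

alpha = 0.1
rng = np.random.default_rng(2)
y = rng.exponential(scale=1.0, size=200_000)  # assumed observations

# Exponential(1) quantile function: F^{-1}(p) = -log(1 - p).
central = (-np.log(1 - alpha / 2), -np.log(alpha / 2))  # equal-tailed interval
shortest = (0.0, -np.log(alpha))                        # highest density interval

score_central = interval_score(*central, y, alpha).mean()
score_shortest = interval_score(*shortest, y, alpha).mean()

print(f"central interval:  {score_central:.3f}")
print(f"shortest interval: {score_shortest:.3f}")
```

Under these assumptions the central interval obtains the lower (better) average score, even though the shortest interval is the exact highest density region of the true distribution. A forecaster who reports the true p.h.d. interval is thus penalized by $L$.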
Loss functions for the highest density region
Let's assume for a moment that we are in a situation where the predictive highest density (p.h.d.) region is always an interval. A reasonable approach would then be to find a loss function which is consistent for this p.h.d. interval and to use expected losses for evaluation. Unfortunately, there is a paper (see below) which shows that, under some regularity assumptions, no loss function is consistent for the p.h.d. interval (called the 'shortest interval' therein).
Now, if we cannot find a consistent loss function in this simplified setting, where the p.h.d. region is always an interval, then I doubt that it is possible to find one for the more general case. And if evaluation cannot be done via a loss function, this suggests that any suitable quality measure for p.h.d. regions will be rather complicated.
References: