I'm sorry if this question sounds trivial or is a duplicate, but I couldn't find a defining formula for the $c$-index. There are numerous questions and answers about it, but all those I have skimmed assume that the reader knows how to compute it or has software for that purpose.

I've read that for a binary outcome it is equivalent to the area under the ROC curve. But I have an ordinal model, with more than two levels.

So, can someone please post, in a simple and concise answer, the formula defining the $c$-index, as used in ordinal regression with uncensored data?

P.S. I would also appreciate it if someone could point me to a Python library which can compute the index and which works well with statsmodels.

Igor F.
  • I guess this is https://www.statsmodels.org/dev/generated/statsmodels.miscmodels.ordinal_model.OrderedResults.resid_prob.html or closely related to it (I'm not familiar with c-index terminology) – Josef Dec 21 '22 at 14:22
  • I came to this through the literature for Brunner-Munzel test and Wilcoxon-Mann-Whitney statistic and I am not familiar with link to AUC or similar in ordinal case. https://www.statsmodels.org/dev/examples/notebooks/generated/stats_rankcompare.html – Josef Dec 21 '22 at 14:38
  • Thanks, Josef, but I don't see a connection. resid_prob returns a vector of values, one per observation. The $c$-index is a single number. – Igor F. Dec 22 '22 at 15:10

2 Answers

Frank Harrell explains on this page. With outcomes $Y$ and paired predictor values $X$, the C-index is the fraction of comparable pairs of ($X,Y$) values in which the order predicted from $X$ agrees with the observed order of the $Y$ values.

With Harrell's rms package, you back-calculate the C-index from the reported discrimination index Somers' $D_{xy}$, which Wikipedia defines as "the difference between the number of concordant and discordant pairs [in ordering of $Y$ and $X$], divided by the number of pairs with $X$ values in the pair being unequal." The relationship between these discrimination measures is $D_{xy} = 2\times (C - \frac{1}{2})$.

Absent censoring, the comparable pairs are those without ties. With right censoring of outcomes as in survival models, you can include an ($X,Y$) pair in such comparisons up to the point at which $Y$ is right censored.
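For the uncensored case, the pair counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `c_index`, the example data, and the choice to count pairs with tied predictions as one half are not taken from any particular package (the half-credit rule is a common convention, and it is what makes $C = (D + 1)/2$ hold).

```python
from itertools import combinations

def c_index(y_true, score):
    """Brute-force concordance index for uncensored outcomes (illustrative sketch)."""
    concordant = tied = comparable = 0
    for (y_i, s_i), (y_j, s_j) in combinations(zip(y_true, score), 2):
        if y_i == y_j:
            continue  # equal observed outcomes: pair is not comparable
        comparable += 1
        if s_i == s_j:
            tied += 1  # assumption: tied predictions get half credit
        elif (s_i < s_j) == (y_i < y_j):
            concordant += 1  # predicted order matches observed order
    # Raises ZeroDivisionError if all outcomes are equal (no comparable pairs)
    return (concordant + 0.5 * tied) / comparable

# Hypothetical ordinal outcomes (levels 0, 1, 2) and model scores
y = [0, 0, 1, 1, 2, 2]
s = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
print(c_index(y, s))
```

The loop is quadratic in the number of observations, which is fine for a check on small data but slow for large samples.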

Software-specific questions are off-topic on this site and I don't use Python, but if there's a function for Somers' D somewhere in the Python universe you are all set.

EdM

As already noted above, the $c$-index is related to Somers' $D$: $$ c = \frac{D_{XY} + 1}{2} $$ but that requires us to be able to compute $D_{XY}$. This is not quite trivial. In a comment below the linked answer, Frank Harrell explains how it's done in R's rms library:

Internally, all possible combinations of observations having different $Y$ values are examined, and the fraction of such pairs for which predictions are in the same order is the estimate of the concordance probability.
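Written as a formula, this description might look as follows. The notation is my own assumption, not Harrell's: $y_i$ is the observed outcome, $\hat{s}_i$ the model's prediction (e.g. the linear predictor) for observation $i$, and pairs with tied predictions are given half credit, which is the convention under which $c = (D + 1)/2$ holds:

$$ c = \frac{\sum_{i<j} \mathbb{1}[y_i \neq y_j] \left( \mathbb{1}\big[(\hat{s}_i - \hat{s}_j)(y_i - y_j) > 0\big] + \tfrac{1}{2}\, \mathbb{1}[\hat{s}_i = \hat{s}_j] \right)}{\sum_{i<j} \mathbb{1}[y_i \neq y_j]} $$

The denominator counts the pairs with different observed outcomes; the numerator counts those among them whose predictions are ordered the same way as the outcomes.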

In Python's scipy.stats module there is a function somersd to compute $D_{XY}$. Note, however, that $D$ is not symmetric: in general, $D_{XY} \neq D_{YX}$, so the order of the arguments matters. From the documentation:

somersd(x, y) calculates Somers' $D(Y|X)$
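A minimal sketch of how this could be used, assuming SciPy 1.7 or later (where `somersd` was added) and made-up data. Since `somersd(x, y)` normalizes by the pairs that are untied in its first argument, passing the observed outcomes first restricts the comparison to pairs with different observed outcomes, which matches Harrell's description quoted above. Which argument order corresponds to a given $D_{XY}$ notation is my reading, so check it against your own definition.

```python
from scipy.stats import somersd

# Hypothetical ordinal outcomes (levels 0, 1, 2) and model scores
y_obs = [0, 0, 1, 1, 2, 2]
score = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]

# Observed outcomes first: the denominator is the number of pairs
# whose observed outcomes differ.
d = somersd(y_obs, score).statistic

# Convert Somers' D to the c-index
c = (d + 1) / 2
print(d, c)
```

Swapping the arguments normalizes by pairs with different predictions instead, which in general gives a different value whenever there are ties.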

Igor F.