Background

I'll confess, I like Richard McElreath's lectures so much that I've been watching his backlog of lectures (even though I've already seen all of the recent lectures of the same course).

In Lecture 15 of Statistical Rethinking 2015, he says:

Ranks I'm not going to have time to talk about. Rank data is terrible. Absolutely terrible. So there are two things to say about ranks and then I'll never mention them again.

The first is never collect rank data, if you can avoid it. You don't want your primary data to be ranked. What's the problem with ranks? You have to take the whole vector of ranks simultaneously because they're exclusive. Right so if somebody is number one, none of the other things can be number one. No longer can you treat the cases separately. So you've got a bunch of individuals and you had somebody rank them on some scale; man, now you've got to predict the whole vector of ranks simultaneously out of your model. There are model types that do this, but you don't want to go down that road. At least not without me. So the best thing is not to collect rank data.

The second thing is don't transform data that's not ranked into ranks. And there's a tradition of telling people to do this for some reason. And I would like to discourage you from doing that.

So if you find yourself in a situation like this come to me and there are alternatives. If you must deal with rank data there are ways to deal with it, it's annoying but it's doable. And definitely don't transform things into ranks.

Side notes:

  • His claim that ranks are exclusive is incomplete: whether ranks are exclusive depends on the ranking method and on the data (for example, whether ties occur).
  • In later years he espouses trace rank plots, so I don't think he believes ranks are through-and-through bad.

Question

Richard mentions that there are (presumably Bayesian) methods involving predicting an entire random vector of ranks.

To avoid being too general, let's aim to construct an example.

$$\vec X \sim \text{Exponential} \left( \vec \lambda \right)$$

$$\text{ranking} \left( \vec X \right) = \begin{bmatrix} \text{rank}(X_1) \\ \vdots \\ \text{rank}(X_n) \end{bmatrix} \sim \text{Unknown}$$

If we didn't care which component receives which rank, we could simply note that the set of ranks is almost surely $\{1, \ldots, n \}$, with each value occurring exactly once, since the exponential distribution is continuous and ties have probability zero. But here we are concerned with the whole random vector, and some ranking vectors will be more or less common depending on $\vec \lambda$.

I started thinking about the change of variables but decided to make a meme instead.

But seriously, if I define

$$\text{rank}(X_i) \triangleq 1 + \sum_{\substack{1 \leq j \leq n \\ j \neq i}} \mathbb{I}[X_j \leq X_i]$$

then there would be something similar to a derivative or Laplacian of each of the indicators. I'm assuming such an operator would be a derivation and would therefore distribute across the sum. From these distributional derivatives, I imagine that a "distributional Jacobian" could be used to work out a change of variables.
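To make the definition of $\text{rank}(X_i)$ above concrete, here is a minimal sketch that evaluates the indicator sum directly. The function name `rank_by_indicators` is made up for illustration and is not from any library.

```python
import numpy as np
from scipy.stats import rankdata


def rank_by_indicators(x):
    """rank(x_i) = 1 + #{j != i : x_j <= x_i}, computed from the indicator sum."""
    x = np.asarray(x, dtype=float)
    # le[i, j] is True when x_j <= x_i
    le = x[None, :] <= x[:, None]
    # Each row counts the diagonal term (j == i) once, so subtract it
    # before adding the leading 1 from the definition.
    return 1 + (le.sum(axis=1) - 1)


sample = [0.3, 2.0, 1.1]
print(rank_by_indicators(sample))       # ranks from the indicator sum
print(rankdata(sample, method="max"))   # SciPy's ranking for comparison
```

Without ties this agrees with ordinary ranking. With ties, the `<=` in the indicator gives every tied entry the largest rank in its group, which is what `rankdata(..., method="max")` does, so under this definition ranks are only exclusive when ties do not occur.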

But that's just my guesswork. How would I actually work this probability model out?

Galen
  • I am not sure if I fully grok your question, but there are rank-ordered probit and logit (aka choice-based method of conjoint analysis, the Plackett-Luce model, exploded logit) models. Section 7.3 in Ken Train's Discrete Choice Methods book is a brief intro. – dimitriy Sep 21 '23 at 16:59
  • @dimitriy Thank you for sharing that textbook! It looks like a rich resource. I don't know anything about the Plackett-Luce model or exploded logit (yet). – Galen Sep 21 '23 at 17:24
  • The probit/logit seem appropriate for ordered categories, but not for rankings. This is because an individual rank within a ranking instance is not a measurable function of the outcome space. Maybe $\text{rank}[X_2] = 5$ in one ranking instance, but $\text{rank}[X_2] = 1$ in some other ranking instance. – Galen Sep 21 '23 at 17:27
  • Although not as severe of an issue, ranks within a ranking are not IID. – Galen Sep 21 '23 at 17:29
  • I don't think that is a problem. These models allow for alternative-specific and ranker-specific covariates, which can allow rank for the same item to vary with the characteristics of the ranker. – dimitriy Sep 21 '23 at 17:36
  • @dimitriy A function being non-measurable with respect to the outcome space is definitely a problem. Can you elaborate? – Galen Sep 21 '23 at 17:39
  • I don't see why that would be true in theory or practice. If a function is non-measurable, making meaningful predictions or drawing conclusions from the data becomes challenging. But these models have been a workhorse of demand estimation since the 70s, with many successful applications in economics, with the main challenge being computational issues that have gotten a lot easier recently. For example, I rode a train this morning whose construction was enabled by a survey of transportation mode preferences analyzed by these techniques. – dimitriy Sep 21 '23 at 17:54
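As a hedged illustration of the Plackett-Luce suggestion in these comments: for independent $X_i \sim \text{Exponential}(\lambda_i)$ with $\lambda_i$ read as a rate (an assumption, since the question does not say rate or scale), the smallest-to-largest ordering is a sequence of exponential races, and by memorylessness

$$P\left(X_{\sigma_1} < \cdots < X_{\sigma_n}\right) = \prod_{k=1}^{n} \frac{\lambda_{\sigma_k}}{\sum_{j=k}^{n} \lambda_{\sigma_j}},$$

which is the Plackett-Luce form with weights $\lambda_i$. The sketch below enumerates this for small $n$; the function names are illustrative only.

```python
from itertools import permutations

import numpy as np


def ordering_probability(order, rates):
    """P(X[order[0]] < X[order[1]] < ...) for independent exponentials.

    `rates` are rate parameters (assumption: lambda is a rate, not a scale).
    """
    rates = np.asarray(rates, dtype=float)
    remaining = rates[list(order)]
    prob = 1.0
    for k in range(len(remaining)):
        # The k-th smallest wins the race among the items not yet placed.
        prob *= remaining[k] / remaining[k:].sum()
    return prob


def ranking_pmf(rates):
    """Map each ranking vector (rank of X_1, ..., rank of X_n) to its probability."""
    n = len(rates)
    pmf = {}
    for order in permutations(range(n)):
        ranks = [0] * n
        for position, item in enumerate(order):
            ranks[item] = position + 1  # item is the (position + 1)-th smallest
        pmf[tuple(ranks)] = ordering_probability(order, rates)
    return pmf


for ranks, prob in sorted(ranking_pmf([1.0, 2.0, 4.0]).items()):
    print(ranks, round(prob, 4))
```

With equal rates every ranking vector gets probability $1/n!$; with unequal rates the items with larger rates (smaller means) tend to receive the low ranks. Enumeration is factorial in $n$, so this is only meant for small examples.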

1 Answer

The vector of ranks for $n$ independent draws from any continuous distribution is distributed evenly across the $n!$ permutations of $(1,\ldots,n)$. In particular, this distribution is independent of the exponential parameter $\lambda$ in the example.

Matt F.
  • Note that this code, `import numpy as np; from scipy.stats import rankdata; rankdata(np.random.exponential([1,1000]), method='dense')`, does not set a random seed. And yet I am quietly confident that you will sample the ranking vector `array([1, 2])` if you run it. – Galen Sep 21 '23 at 21:21
  • And if you use [1000,1000] instead of [1,1000] then I am much less confident what you will obtain. – Galen Sep 21 '23 at 21:23
  • Correct me if I am wrong, but I suspect that IID is sufficient for the almost-sure uniformity over the possible permutations you are referring to. Independence alone is not enough, as my counterexample shows. – Galen Sep 21 '23 at 21:28
  • I meant IID by saying “draws from any continuous distribution” (with multiple draws and one distribution and one $\lambda$). I hadn’t noticed the vectorial arrow on top of your parameter. I’d recommend removing the meme and the long introductory quote so that the issues in the question stand out more clearly. – Matt F. Sep 21 '23 at 22:27
  • Thanks for the feedback. Since your attempted answer is off-topic I recommend deleting it. – Galen Sep 22 '23 at 01:03
  • If the post is edited to focus on the issues in these comments, I might have more to say then; for now, this answers one natural reading of the post, so I’ll leave it as is. – Matt F. Sep 24 '23 at 06:40
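The two readings discussed in these comments (IID draws versus independent draws with different parameters) can be checked by simulation. This is a rough sketch, not a definitive test: the function name, the seed, and the number of draws are arbitrary choices, and the parameters are NumPy scales (means), matching the `np.random.exponential([1,1000])` call in the comment above.

```python
import numpy as np
from collections import Counter
from scipy.stats import rankdata


def simulate_ranking_frequencies(scales, n_draws=20000, seed=0):
    """Empirical frequency of each ranking vector for independent exponentials.

    `scales` are NumPy scale parameters (means), i.e. 1 / rate.
    """
    rng = np.random.default_rng(seed)
    scales = np.asarray(scales, dtype=float)
    # One row per simulated vector X; column i uses scales[i].
    draws = rng.exponential(scale=scales, size=(n_draws, scales.size))
    # Rank within each row; ties have probability zero for continuous draws.
    ranks = rankdata(draws, method="ordinal", axis=1)
    counts = Counter(tuple(int(v) for v in row) for row in ranks)
    return {r: c / n_draws for r, c in counts.items()}


print(simulate_ranking_frequencies([1000, 1000]))  # IID case
print(simulate_ranking_frequencies([1, 1000]))     # independent, not identical
```

In the IID case the two ranking vectors should come out close to equally often, while with scales `[1, 1000]` the vector `(1, 2)` should dominate, since $P(X_1 < X_2) = 1000/1001$ there.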