
The usual method for adapting binary classifiers such as SVMs to multiclass data is one-vs-all, which assumes that the labels are independent and that, in the case of a prediction error, we don't care which label the incorrect prediction outputs.

But suppose I want to predict a score from 1 to 5, and I prefer to err closer to the truth; how do I go about it then? $\lt k$ vs $\ge k$ comes to mind, because in this case labels that are closer together will have larger training-data overlap, but is it theoretically sound? Specifically, is overall performance likely to suffer? More generally, what are the state-of-the-art techniques I could employ in this situation?

1 Answer


This question seems to come up every now and then (see here, for example), so I'll just summarize the answers and sources that have accumulated since this question was asked.

Redefine Objective

  1. Ordinal Categorical Classification
    A modified version of the cross-entropy, adjusted to an ordinal target. It penalizes the model more for predicting a wrong category that is far from the true category than for predicting a wrong category that is close to it.

$$l(y,\hat{y}) = (1+\omega) \cdot CE(y,\hat{y}), \text{ s.t.} $$ $$\omega = \dfrac{|class(\hat{y})-class(y)|}{k-1}$$

Where $k$ is the number of possible classes and $class(\cdot)$ returns the predicted class of an observation, obtained by arg-maxing the probabilities in a multi-class prediction task. A Keras implementation of the ordinal cross-entropy is available here. An example (taken from the link):

import ordinal_categorical_crossentropy as OCC

model = ...  # your model here
model.compile(loss=OCC.loss, optimizer='adam', metrics=['accuracy'])
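
For intuition, here is a minimal NumPy sketch of the loss defined above. It is an illustrative re-implementation, not the linked Keras code, and the function name and one-hot input format are my own assumptions:

import numpy as np

def ordinal_crossentropy(y_true, y_pred, eps=1e-12):
    """y_true: one-hot targets; y_pred: predicted probabilities; both of shape (n, k)."""
    k = y_true.shape[1]
    ce = -np.sum(y_true * np.log(y_pred + eps), axis=1)               # standard cross-entropy per sample
    dist = np.abs(np.argmax(y_pred, axis=1) - np.argmax(y_true, axis=1))
    omega = dist / (k - 1)                                            # normalized distance between classes
    return np.mean((1.0 + omega) * ce)                                # weight CE by (1 + omega)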

  2. Treat your problem as a regression problem
    Instead of a classification task, treat your task as a regression task, and then round your predictions (or map them to categories) using any kind of method. As Stephan Kolassa mentions, the underlying assumption of this method is that one's scores are interval scaled.
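
A minimal sketch of this approach, assuming integer scores from 1 to 5 and scikit-learn; the choice of regressor, the toy data, and the simple rounding rule are illustrative, not part of the original suggestion:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy data: X_train/X_test are features, y_train holds integer scores 1..5.
rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(100, 4)), rng.normal(size=(10, 4))
y_train = rng.integers(1, 6, size=100)

reg = RandomForestRegressor(random_state=0).fit(X_train, y_train)
raw = reg.predict(X_test)                        # continuous predictions
pred = np.clip(np.rint(raw), 1, 5).astype(int)   # round and clamp back to the 1..5 label set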

  3. Cumulative Link Loss
    Originally proposed here. It is based on the logistic regression model, but with a link function that maps the logits to the cumulative probabilities of being in a given category or lower. A set of ordered thresholds splits this space into the different classes of the problem. The projections are estimated by the model.
    Under this method, two loss functions are suggested, probit-based and logit-based. A Keras implementation of the two versions is available here.
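
To illustrate the idea, here is a hand-rolled sketch with a logit link; it is not the linked Keras implementation, and the function name and threshold handling are assumptions:

import numpy as np

def cumulative_link_probs(z, thresholds):
    """z: latent scores, shape (n,); thresholds: increasing cut points, shape (k-1,)."""
    cdf = 1.0 / (1.0 + np.exp(-(thresholds[None, :] - z[:, None])))   # P(y <= j), logit link
    cdf = np.concatenate([cdf, np.ones((len(z), 1))], axis=1)         # P(y <= k-1) = 1
    return np.diff(cdf, axis=1, prepend=0.0)                          # P(y = j) as successive differences

# e.g. cumulative_link_probs(np.array([-1.0, 0.5]), np.array([-2.0, -0.5, 0.5, 2.0]))

Training then minimizes the negative log-likelihood of the true class under these probabilities.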

  4. Other methods I won't cover
    The following table, taken from here, describes other loss functions for ordinal categorical targets:
    On the Consistency of Ordinal Regression Methods

  5. Treat ordinal categories as a multi-label problem
    Under this method, we convert our ordinal targets into a binary matrix in which, for class $c$, the first $c$ entries of the corresponding row are 1 and the rest are 0, as the following suggests:

Lowest  -> [1,0,0,0,0]
Low     -> [1,1,0,0,0]
Medium  -> [1,1,1,0,0]
High    -> [1,1,1,1,0]
Highest -> [1,1,1,1,1]

Then the loss function has to be changed, as this paper suggests. See this blog post for an implementation.
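
A minimal sketch of this encoding and a simple decoding rule, following the table above; the exact loss from the paper is not reproduced, and the function names and the 0.5 threshold are illustrative assumptions:

import numpy as np

def encode(y, k=5):
    """Map integer class c in 0..k-1 to a cumulative 0/1 vector, e.g. 2 -> [1, 1, 1, 0, 0]."""
    return (np.arange(k)[None, :] <= np.asarray(y)[:, None]).astype(float)

def decode(probs, threshold=0.5):
    """Predicted class = number of per-output sigmoids above the threshold, minus one."""
    return np.maximum((np.asarray(probs) > threshold).sum(axis=1) - 1, 0)

# encode([0, 2, 4]) -> [[1,0,0,0,0], [1,1,1,0,0], [1,1,1,1,1]]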