I am playing around with a classification model, and I would like to know if there are any known methods to achieve what I am looking to do.
The data looks something like this:
| Class | id | $x_1$ | $x_2$ | ... | $x_n$ |
|---|---|---|---|---|---|
| 0 | 1 | ||||
| 0 | 1 | ||||
| 0 | 1 | ||||
| ... | |||||
| 1 | 1 | ||||
| 2 | 1 | ||||
| 3 | 1 | ||||
| 0 | 2 | ||||
| 0 | 2 | ||||
| 0 | 2 | ||||
| ... | |||||
| 1 | 2 | ||||
| 2 | 2 | ||||
| 3 | 2 | ||||
| 0 | 3 | ||||
| 0 | 3 | ||||
| 0 | 3 | ||||
| ... | |||||
| 1 | 3 | ||||
| 2 | 3 | ||||
| 3 | 3 | ||||
| ... |
Here $\{x_1,\dots,x_n\}$ are the features, and $id$ is a grouping flag that I would like to use but that is not itself a feature.
There are four classes, $\{0,1,2,3\}$, with class 0 making up a bit under 90% of the data. The thing is, within each id I know that there is exactly one occurrence of each of class 1, class 2, and class 3, with all remaining observations being class 0. I will know this for unclassified data too. The class labels within an id are therefore not independent: if one observation is a 1, then no other observation with the same id can be a 1.
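To make the constraint concrete, here is a minimal sketch that checks it on labelled data. The function name `check_group_constraint` and the toy rows are made up for illustration; the only thing taken from the description is the rule that each id contains exactly one 1, one 2, and one 3.

```python
from collections import Counter, defaultdict

def check_group_constraint(rows):
    """Return the ids whose labels violate the 'exactly one 1, 2 and 3' rule.

    rows: iterable of (class_label, group_id) pairs, mirroring the first
    two columns of the table above.
    """
    counts = defaultdict(Counter)
    for label, group_id in rows:
        counts[group_id][label] += 1
    # A Counter returns 0 for missing keys, so absent classes are caught too.
    return sorted(
        group_id
        for group_id, c in counts.items()
        if any(c[k] != 1 for k in (1, 2, 3))
    )

# Toy data (made up): id 1 is valid, id 2 has two rows labelled 1 and no 3.
rows = [(0, 1), (0, 1), (1, 1), (2, 1), (3, 1),
        (0, 2), (1, 2), (1, 2), (2, 2)]
violations = check_group_constraint(rows)
print(violations)  # [2]
```

The same check can be run on a model's predicted labels to see how often an unconstrained classifier breaks the rule.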
So my question is: is there a method that lets me incorporate my knowledge about these id groups, forcing the model to assign exactly one observation to class 1, exactly one to class 2, and exactly one to class 3 within each id?
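One way this kind of constraint could be imposed is as a post-processing step on top of any classifier that outputs per-class probabilities. The sketch below is only an illustration under that assumption: the function name `assign_within_group` and the probabilities are hypothetical, and it treats the rows as independent given the model's probabilities. It searches, within a single id, for the labelling with the highest joint log-probability among those that use classes 1, 2, and 3 exactly once.

```python
import math
from itertools import permutations

def assign_within_group(probs):
    """Pick one row each for classes 1, 2, 3; label every other row 0.

    probs: list of per-row probability lists [p0, p1, p2, p3] for ONE id.
    Returns the labels with the highest joint log-probability among all
    labelings that satisfy the constraint.
    """
    n = len(probs)
    if n < 3:
        raise ValueError("need at least three rows per id")
    eps = 1e-12  # avoids log(0)
    logp = [[math.log(max(p, eps)) for p in row] for row in probs]
    base = sum(row[0] for row in logp)  # score if every row were class 0
    best_score, best_rows = -math.inf, None
    # Try every ordered triple of distinct rows (fine for small groups).
    for i, j, k in permutations(range(n), 3):
        score = (base
                 + logp[i][1] - logp[i][0]
                 + logp[j][2] - logp[j][0]
                 + logp[k][3] - logp[k][0])
        if score > best_score:
            best_score, best_rows = score, (i, j, k)
    labels = [0] * n
    for cls, row in enumerate(best_rows, start=1):
        labels[row] = cls
    return labels

# Made-up probabilities for one id; a plain argmax would label rows 0 AND 1 as class 1.
probs = [
    [0.10, 0.60, 0.20, 0.10],
    [0.20, 0.50, 0.20, 0.10],
    [0.10, 0.10, 0.10, 0.70],
    [0.90, 0.05, 0.03, 0.02],
    [0.30, 0.10, 0.50, 0.10],
]
labels = assign_within_group(probs)
print(labels)  # [1, 0, 3, 0, 2]
```

The brute-force search grows cubically with the number of rows per id. For larger groups, the same problem can be posed as a rectangular assignment problem and handed to a solver such as `scipy.optimize.linear_sum_assignment`.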
Edit:
I should mention that I have tried undersampling the majority class, oversampling the minority classes, and modifying the loss function to penalise misclassification of the minority classes more heavily. These are ideas I have seen used in fraud detection. In fraud detection, however, you don't know ahead of time how many fraudulent transactions there are in a group.
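For reference, the re-weighting idea can be sketched as below. The exact weights are an assumption, since none are stated here: this uses the common inverse-frequency heuristic $n_{\text{samples}} / (n_{\text{classes}} \cdot n_c)$, the same formula scikit-learn applies for `class_weight="balanced"`. The function name and toy labels are made up.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count of class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in sorted(counts.items())}

# Toy labels (made up): 27 zeros and one each of 1, 2, 3, so 90% are class 0.
labels = [0] * 27 + [1, 2, 3]
weights = balanced_class_weights(labels)
print(weights)
```

Weights like these rebalance the loss per observation, but nothing in them refers to the id groups, so they cannot enforce a count of exactly one per class within an id.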
My question is about whether there is a way to incorporate my pre-existing knowledge of the count of each class (within an id) into my prediction. Could I perhaps customise the loss function to heavily penalise assigning the wrong number of observations to each class?
Edit (2): To provide more context, the id flag represents a game, and all observations with the same id are players in that game. The features are the 'stats' for each player in that particular game. After a game is played, a player is awarded 3 points if they are the best player, 2 points if they are the second best, and 1 point if they are the third best. All other players are awarded 0 points. The points awarded are the class labels.
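The points rule can be written down as a small sketch. The per-player `scores` here are hypothetical: they stand in for whatever quantity orders the players, which could be a model's predicted rating. If labels are produced this way, each game gets exactly one 3, one 2, and one 1 by construction (assuming at least three players).

```python
def award_points(scores):
    """Map per-player scores for ONE game to classes 3, 2, 1 and 0.

    scores: dict of player name -> numeric score (higher is better).
    Ties are broken by insertion order because sorted() is stable.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    points = {player: 0 for player in scores}
    # zip stops after three players, so everyone else keeps 0 points.
    for player, pts in zip(ranked, (3, 2, 1)):
        points[player] = pts
    return points

# Made-up scores for a five-player game.
game = {"A": 0.2, "B": 1.7, "C": 0.9, "D": -0.4, "E": 1.1}
points = award_points(game)
print(points)  # {'A': 0, 'B': 3, 'C': 1, 'D': 0, 'E': 2}
```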
Edit (3): The classes are some transformation of the outcomes in the game. For example, in a soccer match, if a player kicks more goals and touches the ball more times than a second player, the first player has most likely played better and will most likely be ranked higher. Also, I don't need to predict those game outcomes; they are known at the time of ranking, which occurs after the game has been played. I would like to predict what the ranking will be, given the outcomes from the game.