Let's say we have the basic MNIST dataset and the same goal of predicting the digit, BUT we replace every label with either RED or BLUE. The label becomes BLUE with probability DIGIT/10. So if the digit is FIVE, the probability of the label being BLUE is 50%, and we get an equal number of RED and BLUE labels. And if the digit is TWO, the probability of BLUE is only 20%, so we get mostly RED.
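For concreteness, here's a minimal sketch of that label-generation process (the function name and the use of NumPy are just illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_color_label(digit: int) -> str:
    """Return BLUE with probability digit/10, otherwise RED."""
    return "BLUE" if rng.random() < digit / 10 else "RED"

# Example: relabel a small batch of MNIST digits.
digits = np.array([5, 2, 9, 0])
colors = [make_color_label(d) for d in digits]
# A digit 5 gets BLUE ~50% of the time, a digit 2 only ~20%,
# a digit 0 is always RED.
```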
On one hand this seems like an extremely trivial task, but on the other hand I can't see any way of approaching it, despite many years in ML and programming.
My question: what architecture or methodology could be used to approach this kind of model? I understand that a traditional feed-forward neural network won't be able to deal with the probabilistic nature of this problem, and Probabilistic Programming tools don't seem suitable either. I would appreciate links to similar solved problems, or any proposals for how this could be conceptually approached. Right now it doesn't seem that a traditional neural network will do any good on this problem.

…red, 10% it's blue). We're not predicting the training label, but the probability. (Well, that's my idea anyway.) – avloss May 21 '22 at 15:33

First, predict the digit: the idea is that the model doesn't see the labels. If the task is broken up in two, then each part separately is more-or-less trivial, but they require completely different toolsets. We assume the model doesn't know how many labels there are. It should also ideally be resilient to unbalanced classes and have some notion of confidence. (I know I'm asking for too much, but I've been thinking about this for a while now.) – avloss May 21 '22 at 15:37
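One way to make the "predict the probability, not the label" idea concrete (a sketch of my own, not an established answer): a plain classifier trained with binary cross-entropy on the noisy RED/BLUE labels will, in expectation, output an estimate of P(BLUE | image), because cross-entropy is minimized when the model's output equals the true conditional probability. A minimal PyTorch version, assuming images come in as flattened-able 28x28 tensors and labels as 0 (RED) / 1 (BLUE):

```python
import torch
import torch.nn as nn

# Deliberately simple model: logistic regression on flattened 28x28 images.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 1))
loss_fn = nn.BCEWithLogitsLoss()  # minimized when sigmoid(logit) = P(BLUE | x)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(images: torch.Tensor, is_blue: torch.Tensor) -> float:
    """One optimization step on a batch of images and 0/1 color labels."""
    logits = model(images).squeeze(1)
    loss = loss_fn(logits, is_blue.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# After training, sigmoid(model(x)) should approximate DIGIT/10
# for the digit shown in x, even though each individual label is noisy.
```

The point of the sketch is that no special probabilistic machinery is needed for this part: the noise in the labels averages out, and the sigmoid output converges toward the label frequency for each digit.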