
I am currently working on a machine learning model that yields a vector of offloading decisions. An example:

[-1, 0, -1, 1, 1, 0, ...]

The model does not return this vector directly. Instead, it has 3 output layers, each returning the softmax probabilities for one particular decision class. Each of the vectors below has the same length as the decision vector above:

[0.12, 0.76, ... ] for -1

[0.33, 0.12, ... ] for 0

[0.55, 0.05, ... ] for 1

For the accuracy computation, I construct the decision vector by taking, at each index i, the class with the maximum value across the three vectors above. I then compare the predicted vector with the actual label, e.g. [-1, 0, 1, 0] against [-1, 0, 1, -1] (let the latter be the actual label). If even one of the predicted vector's entries is wrong, I count the whole prediction as wrong. But in the case above the two vectors are very close to each other, which is still a 'good' offloading decision. Would it be better to compute the accuracy based on each entry of the predicted vector, rather than discarding the prediction altogether when the vectors are in fact close? Are there similar metrics that can help here?
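To make the difference concrete, here is a minimal NumPy sketch of both accuracy variants, assuming the three softmax outputs are stacked into a (3, N) array; the probability values are made up for illustration:

```python
import numpy as np

# Three softmax output vectors (one per decision class) stacked into a
# (3, N) array. The probability values are made up for this sketch.
probs = np.array([
    [0.76, 0.12, 0.05, 0.20],   # output layer for decision -1
    [0.12, 0.55, 0.15, 0.50],   # output layer for decision  0
    [0.12, 0.33, 0.80, 0.30],   # output layer for decision  1
])
classes = np.array([-1, 0, 1])

pred = classes[probs.argmax(axis=0)]   # -> [-1, 0, 1, 0]
true = np.array([-1, 0, 1, -1])        # actual label from the question

# Exact-match accuracy: the whole vector counts as wrong if any entry is off.
exact_match = float(np.array_equal(pred, true))   # 0.0

# Per-entry accuracy: every position is scored individually.
per_entry = (pred == true).mean()                 # 0.75

print(exact_match, per_entry)
```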

Thanks in advance. With kind regards,

YuKa


1 Answer


You are finding out why hard classifications evaluated using accuracy are not a good way forward.

Your model outputs probabilistic predictions. This is a good thing. Your evaluation metric should account for the fact that a probabilistic prediction of 0.99 for the true class (and 0.05 for each of the others) is much better than a prediction of 0.4 for the true class (and 0.3 for each of the others). Right now, your setup of transforming your probabilistic classifications into hard predictions by looking at the max completely discards this piece of information.

Use proper scoring rules to assess the quality of your probabilistic predictions. Also, explicitly separate the prediction aspect from the subsequent decision aspect (which may include using thresholds, or looking at maxima).
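For instance, here is a minimal sketch of two proper scoring rules, the log loss and the Brier score, evaluated per position on the probabilistic outputs (the same illustrative (3, N) array as in the sketch above):

```python
import numpy as np

# Two proper scoring rules evaluated on the probabilistic outputs
# (illustrative values, not real model output).
probs = np.array([
    [0.76, 0.12, 0.05, 0.20],   # P(decision = -1) per position
    [0.12, 0.55, 0.15, 0.50],   # P(decision =  0) per position
    [0.12, 0.33, 0.80, 0.30],   # P(decision =  1) per position
])
classes = [-1, 0, 1]
true = [-1, 0, 1, -1]

rows = [classes.index(t) for t in true]   # row index of the true class
cols = np.arange(len(true))

# Log loss: mean negative log-probability assigned to the true class.
log_loss = -np.mean(np.log(probs[rows, cols]))

# Brier score: mean squared distance between the predicted distribution
# and the one-hot encoding of the true class.
onehot = np.zeros_like(probs)
onehot[rows, cols] = 1.0
brier = np.mean(np.sum((probs - onehot) ** 2, axis=0))

print(log_loss, brier)
```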

Stephan Kolassa
  • Thank you very much for your answer and resources, Stephan.

    Regarding the training of the model, I would like to add that the hard training labels are transformed into three one-hot encoded vectors, which are used in the loss computation for the three output layers and the corresponding model optimization.

    After doing further research I came across the Hamming loss, which I am currently inspecting, as it compares the two vectors in a more meaningful way than the accuracy I originally described (see the sketch after this comment).

    I will take a closer look into your answer.

    With kind regards, YuKa

    – YuKa Dec 04 '22 at 19:35
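For reference, the Hamming loss mentioned in the comment reduces, for this setup, to the fraction of positions where prediction and label disagree, i.e. one minus the per-entry accuracy. A minimal NumPy sketch on the question's example vectors:

```python
import numpy as np

pred = np.array([-1, 0, 1, 0])    # predicted decision vector
true = np.array([-1, 0, 1, -1])   # actual label

# Hamming loss: fraction of mismatching entries (here 1/4 = 0.25),
# i.e. 1 minus the per-entry accuracy.
hamming = (pred != true).mean()
print(hamming)
```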