I have three classes for sentiment (negative, neutral, and positive). I created synthetic data for the positive class, so the class ratio is now roughly 50% neutral, 45% positive, 5% negative. I get the metrics below, and I am not 100% sure how to interpret them or whether the model is good enough to deploy to production. I want the model to catch the positives and neutrals and to avoid misclassifying samples into the negative class (i.e., very few false positives on the negative class, I suppose). How would you interpret this table?
Labels are 0 = negative, 1 = neutral, 2 = positive, but the model was trained on one-hot encoded targets.
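For context, this is roughly how I read the predictions back from the one-hot outputs and how I would measure the "false positives on the negative class" I care about. It is only a minimal sketch with made-up placeholder data and variable names (`probs`, `y_true`), not my real pipeline:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Placeholder data standing in for the real eval set:
# y_true are integer labels (0=negative, 1=neutral, 2=positive),
# probs are the model's per-class scores (one column per one-hot class).
rng = np.random.default_rng(0)
y_true = rng.choice([0, 1, 2], size=1000, p=[0.05, 0.50, 0.45])
probs = rng.random((1000, 3))

# The model was trained on one-hot targets, so argmax maps each
# output vector back to a class index.
y_pred = probs.argmax(axis=1)

print(classification_report(y_true, y_pred, digits=2))

# "False positives on the negative class": neutral/positive samples
# that end up predicted as negative (column 0, rows 1 and 2).
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
fp_neg = cm[1:, 0].sum()
print(f"non-negative samples predicted negative: {fp_neg} "
      f"of {cm[1:, :].sum()} ({fp_neg / cm[1:, :].sum():.2%})")
```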
Classification Report

              precision    recall  f1-score   support
           0       0.06      0.93      0.10       643
           1       0.95      0.16      0.27     36755
           2       0.06      0.62      0.11      2309

    accuracy                           0.20     39707
   macro avg       0.35      0.57      0.16     39707
weighted avg       0.88      0.20      0.26     39707
Compared to this:
Classification Report

              precision    recall  f1-score   support
           0       0.40      0.45      0.43        44
           1       0.90      0.87      0.88       751
           2       0.46      0.52      0.49       123

    accuracy                           0.80       918
   macro avg       0.58      0.61      0.60       918
weighted avg       0.81      0.80      0.81       918