How to address bias in AI Image Recognition Model: Oversampling, Undersampling, and Ensemble Techniques Not Working

Question

I am currently working on an image recognition project using AI, but I am facing challenges with bias in my model's predictions. The model seems to be biased toward the majority classes in my dataset. Despite attempting oversampling, undersampling, and ensemble learning techniques, I have not been able to resolve this issue effectively.

My dataset contains images of various categories, including food, drinks, and desserts. However, the AI model struggles to distinguish between these categories accurately.

Here's a summary of what I have tried so far:

Data Preparation: I've loaded my dataset from a CSV file and removed irrelevant columns. I ensured that labels are correctly assigned to each image.

Model Architecture: I've created a Convolutional Neural Network (CNN) with layers for feature extraction and classification. I've experimented with different architectures and hyperparameters.

Training Process: I've adjusted hyperparameters like learning rate and batch size and trained the model for an increased number of epochs.

Handling Class Imbalance: I've tried random oversampling, undersampling and also ensemble learning to address class imbalance, but the model still favors majority classes.

Data Preprocessing: I've resized and normalized images, ensuring consistency between training and inference.

Evaluation Metrics: I'm using precision, recall, and F1-score as metrics to assess the model's performance, especially for imbalanced categories.

Despite these efforts, my model struggles to differentiate between categories accurately, primarily due to bias towards majority classes. I'm seeking advice on how to improve accuracy and address this bias effectively. Your suggestions are welcome.

Welcome to Cross Validated! How much data do you have, and how large of a neural network are you using? What Stephan Kolassa wrote is correct in theory and absolutely worth knowing, but your situation sounds like one where the categories are rather easy for a human to distinguish (burger looks totally different from a cookie looks totally different from a martini), so I wonder if there is another issue at play that is going to harm your performance on proper scoring rules of a probability model, too. — Dave, Sep 18 '23 at 10:21

score 5 · Answer 1 · answered Sep 18 '23 at 06:11

Class imbalance is usually not a problem, except for higher parameter uncertainty.

Your problem is very likely that optimizing accuracy inherently biases predictions towards majority classes. All other metrics that rely on "hard" 0-1 classifications suffer from the same issue and will not reward unbiased predictions.
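A small arithmetic sketch of this point, assuming a hypothetical group of instances that share the same predictor information and belong to the minority class with probability 0.3. The numbers and the function name `expected_accuracy` are illustrative assumptions, not taken from the question.

```python
def expected_accuracy(p_minority, rate_predict_minority):
    """Expected accuracy when the true minority-class probability is p_minority
    and a fraction rate_predict_minority of instances is labeled as minority,
    independently of the true outcome (the predictors carry no further information)."""
    hit_minority = p_minority * rate_predict_minority
    hit_majority = (1 - p_minority) * (1 - rate_predict_minority)
    return hit_minority + hit_majority


p = 0.3  # assumed true minority-class probability for this group

always_majority = expected_accuracy(p, 0.0)  # never predict the minority class
match_frequency = expected_accuracy(p, p)    # predict minority 30% of the time

print(always_majority, match_frequency)
```

Under these assumptions, always predicting the majority class gives an expected accuracy of about 0.70, while labels that reproduce the true class frequency give about 0.58. Accuracy therefore prefers the biased labeling.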

After all, how could they? Whenever you have multiple instances with the same predictor information but different outcomes, any 0-1 prediction must be biased! What you apparently are looking for, then, are calibrated probabilistic predictions. So create an appropriate probabilistic model and tune it using proper scoring rules. If you need to make discrete decisions, you can later still compare your predicted probabilities to thresholds that are based on the costs of actions.
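As a rough illustration of this recommendation, here is a minimal sketch in plain Python for the binary case (the question itself is multiclass, so this is a simplification). It scores predicted probabilities with two proper scoring rules, the log loss and the Brier score, and derives a decision threshold from misclassification costs. The data, the cost values, and the helper names `log_loss`, `brier_score`, and `cost_threshold` are assumptions made up for the example.

```python
import math


def log_loss(y_true, p_pred, eps=1e-12):
    """Mean negative log-likelihood for binary outcomes (a proper scoring rule)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def brier_score(y_true, p_pred):
    """Mean squared difference between outcome and predicted probability (also proper)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def cost_threshold(cost_false_positive, cost_false_negative):
    """Probability above which treating an instance as positive has the lower
    expected cost, assuming correct decisions cost nothing."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)


# Hypothetical data: 10 instances with identical predictors, 3 of them positive.
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

calibrated = [0.3] * 10      # matches the observed frequency
overconfident = [0.01] * 10  # close to a hard "always negative" prediction

print("log loss:", log_loss(y, calibrated), log_loss(y, overconfident))
print("Brier:   ", brier_score(y, calibrated), brier_score(y, overconfident))

# Assumed costs: a missed positive is four times as costly as a false alarm.
threshold = cost_threshold(cost_false_positive=1.0, cost_false_negative=4.0)
decisions = [int(p >= threshold) for p in calibrated]
print("threshold:", threshold, "decisions:", decisions)
```

Both scoring rules prefer the calibrated probabilities over the near-hard prediction. The decision step is kept separate: with the assumed costs the threshold is 0.2 rather than 0.5, so the same probabilities can lead to different actions once the costs change.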

Stephan Kolassa

The original poster is using neural networks (CNNs). Can you expand on concretely how to "create an appropriate probabilistic model and tune it using proper scoring rules" in the context of neural networks and CNNs? Creating well-calibrated neural network models is notoriously challenging. – D.W. Sep 18 '23 at 16:25

When people train neural networks, they normally use the cross-entropy loss, and my impression is that the cross-entropy loss is a proper scoring rule. So I'm not sure what more one can do. Am I misunderstanding something? (See also https://stats.stackexchange.com/q/532813/2921.) – D.W. Sep 18 '23 at 16:25

@D.W.: that is quite correct, and it is proper. It may well be an issue of how probabilities are turned into hard classifications, either in some explicit post-processing, or in the final layer, or somewhere in the middle of the network. For instance, every single instance may have a predicted probability of 0.6 to 0.8 of being of the target class, and if this is compared to a threshold of 0.5, then every single instance will be hard-predicted as 1, and we have our bias, because some will be of the minority class. Again, my recommendation would be to explicitly predict probabilistically. – Stephan Kolassa Sep 18 '23 at 16:31
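The example in this comment can be made concrete with a few made-up numbers. The sketch below assumes ten instances whose predicted probabilities all lie between 0.6 and 0.8, of which seven actually belong to the target class; both lists are hypothetical.

```python
# Hypothetical predicted probabilities of the target class, all between 0.6 and 0.8.
probs = [0.6, 0.65, 0.7, 0.7, 0.7, 0.7, 0.7, 0.75, 0.8, 0.7]
# Hypothetical outcomes: 7 of 10 are the target class, in line with the probabilities.
outcomes = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

# Turn probabilities into hard classifications with the usual 0.5 threshold.
hard = [int(p > 0.5) for p in probs]

mean_prob = sum(probs) / len(probs)
share_hard_positive = sum(hard) / len(hard)
share_actual_positive = sum(outcomes) / len(outcomes)

print("mean predicted probability:", mean_prob)
print("share hard-predicted as 1: ", share_hard_positive)
print("share actually 1:          ", share_actual_positive)
```

The mean predicted probability matches the observed share of the target class (about 0.7), yet every instance is hard-predicted as 1. The bias appears at the thresholding step, not in the probabilities.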
