Machine learning: advice on dealing with small datasets and imbalanced classes

Question

I'm hitting a wall on a project whose goal is to predict the probability of success for a particular set of data. Right now I'm using logistic regression, and my framework performs reasonably well except on specific subsets of the data. Without going into too much detail, the probability of success varies by day of week, and for some cohorts the pattern is lopsided: Sunday might have a very high success rate while every other day has a low one. This leads the model to "learn" that the probability of success for that cohort is generally low, Sunday included. I'm looking for advice on which strategies or algorithms might help with this issue (perhaps it can be framed as an imbalanced class problem?). I've tried adding more explicit interaction variables to the dataset, but they don't seem to do much.
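To make the cohort × day-of-week interaction concrete, here is a minimal sketch in plain Python (no third-party libraries) on hypothetical grouped data: cohort "A" succeeds 80% of the time on Sundays and 10% on other days, while cohort "B" succeeds 30% of the time every day. It fits two logistic regressions by Newton-Raphson, one with main effects only and one that adds cohort-B × day dummies. The cohort names, rates, and helper functions are invented for illustration; in practice a library such as statsmodels or scikit-learn would do the fitting.

```python
import math

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
COHORTS = ["A", "B"]

def make_cells(n=100):
    # Hypothetical aggregated data: (cohort, day) -> (successes, trials).
    cells = {}
    for c in COHORTS:
        for d in DAYS:
            rate = (0.8 if d == "Sun" else 0.1) if c == "A" else 0.3
            cells[(c, d)] = (round(rate * n), n)
    return cells

def main_effects(c, d):
    # Intercept, cohort-B dummy, and day dummies (Mon is the reference level).
    return [1.0, float(c == "B")] + [float(d == day) for day in DAYS[1:]]

def with_interaction(c, d):
    # Main effects plus cohort-B x day dummies (saturated for this 2 x 7 grid).
    base = main_effects(c, d)
    return base + [base[1] * x for x in base[2:]]

def solve(a, b):
    # Gaussian elimination with partial pivoting for the Newton step.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_logistic(cells, features, iters=25):
    # Newton-Raphson for binomial logistic regression on grouped counts.
    rows = [(features(c, d), y, n) for (c, d), (y, n) in cells.items()]
    p = len(rows[0][0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for x, y, n in rows:
            mu = 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
            w = n * mu * (1.0 - mu)
            for i in range(p):
                grad[i] += x[i] * (y - n * mu)
                for j in range(p):
                    hess[i][j] += w * x[i] * x[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

def predict(beta, features, c, d):
    z = sum(b * xi for b, xi in zip(beta, features(c, d)))
    return 1.0 / (1.0 + math.exp(-z))

cells = make_cells()
for name, feats in [("main effects", main_effects), ("interaction", with_interaction)]:
    beta = fit_logistic(cells, feats)
    print(f"{name}: P(success | A, Sun) = {predict(beta, feats, 'A', 'Sun'):.3f}")
```

With only main effects, both cohorts must share a single Sunday shift on the log-odds scale, so the estimate for cohort A on Sunday comes out well below the observed 0.8. The interaction model here is saturated (one parameter per cohort-day cell), so it reproduces the observed cell rates exactly. If interaction terms seem to "do nothing" in a real fit, it is worth checking that they were actually built per cohort-day cell and how many observations each cell contains.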

See https://stats.stackexchange.com/questions/357466/are-unbalanced-datasets-problematic-and-how-does-oversampling-purport-to-he — kjetil b halvorsen, Apr 19 '22 at 14:16
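On the imbalanced-class framing: when the goal is a probability, resampling changes the quantity being estimated. A minimal sketch, assuming an intercept-only logistic model (whose maximum-likelihood fit is simply the observed success fraction) and a deterministic stand-in for oversampling in which every success is duplicated k times. The function names are hypothetical.

```python
def fitted_rate(successes: int, failures: int) -> float:
    # MLE of an intercept-only logistic regression: the observed success fraction.
    return successes / (successes + failures)

def oversample_successes(successes: int, failures: int, k: int) -> tuple[int, int]:
    # Deterministic stand-in for random oversampling: every success appears k times.
    return successes * k, failures

def undo_oversampling(p_oversampled: float, k: int) -> float:
    # Divide the odds by k to map the estimate back to the original class balance.
    return p_oversampled / (p_oversampled + k * (1.0 - p_oversampled))

p_true = fitted_rate(10, 90)
p_os = fitted_rate(*oversample_successes(10, 90, 9))
print(p_true, p_os, undo_oversampling(p_os, 9))  # prints: 0.1 0.5 0.1
```

Duplicating successes k times multiplies the fitted odds by k. In a saturated model such as the full cohort × day one, the same shift and the same correction apply cell by cell, so oversampling by itself does not help the model tell a high-success Sunday apart from the other days.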
I'm curious why an interaction between cohort and day-of-week didn't help. Could you perhaps edit the question to show some results with and without that interaction? (Use the code tool {} in the editing toolbar to format output so that those who use text-to-speech can "read" it.) Disguise the variable names as needed to maintain confidentiality. Are you modeling cohorts as fixed or random effects? Please provide that information by editing the question, as comments are easy to overlook and can be deleted. — EdM, Apr 19 '22 at 15:01
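On fixed versus random effects: treating cohort-day effects as random partially pools sparse cells toward their cohort's average instead of estimating each cell independently. The sketch below is not a mixed model. It is a simpler beta-binomial style shrinkage that illustrates the same partial-pooling idea, with made-up counts and a hypothetical `prior_strength` (a pseudo-count) that a real random-effects fit would estimate from the data.

```python
# Hypothetical counts for one cohort: day -> (successes, trials).
cohort_a = {"Mon": (3, 30), "Tue": (2, 25), "Wed": (3, 28), "Thu": (2, 27),
            "Fri": (3, 30), "Sat": (2, 20), "Sun": (4, 5)}

def pooled_rates(cells, prior_strength):
    """Shrink each day's raw rate toward the cohort-wide rate.

    prior_strength acts like that many pseudo-trials at the cohort rate:
    days with few trials move a lot, days with many trials barely move.
    """
    total_y = sum(y for y, _ in cells.values())
    total_n = sum(n for _, n in cells.values())
    cohort_rate = total_y / total_n
    return {day: (y + prior_strength * cohort_rate) / (n + prior_strength)
            for day, (y, n) in cells.items()}

for day, rate in pooled_rates(cohort_a, prior_strength=10).items():
    y, n = cohort_a[day]
    print(f"{day}: raw {y / n:.2f} -> pooled {rate:.2f}")
```

Here the Sunday estimate, based on only five trials, is pulled a long way toward the cohort rate, whereas with hundreds of Sunday trials it would stay close to the raw rate. Whether that shrinkage is desirable depends on how much Sunday data each cohort actually has, which is one reason the fixed-versus-random choice matters.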

