I'm stuck on a project where the goal is to predict the probability of success for a particular set of data. I'm currently using logistic regression, and the model performs reasonably well except on specific subsets of the data. Without going into too much detail, the probability of success varies by day of the week. For some cohorts, Sunday might have a very high success rate while every other day has a low one, which leads the model to "learn" that the probability of success for that cohort is generally low.

I'm looking for advice on strategies or algorithms that might help with this issue (could it perhaps be framed as an imbalanced-class problem?). I've tried adding more explicit interaction variables to the dataset, but they don't seem to help much.
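To make the symptom concrete, here is a minimal sketch on made-up data; it is not the real dataset or the actual pipeline. Everything in it is an assumption for illustration: the two cohorts `A` and `B`, the success counts, and the helper names `make_cells`, `features`, `fit`, and `predict`. It fits an unregularized logistic regression by plain gradient descent twice, once with only cohort and day-of-week main effects and once with an added cohort-by-day indicator, and compares the predicted probability for cohort A on Sunday.

```python
import math

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
COHORTS = ["A", "B"]


def make_cells():
    """Hypothetical aggregated data: (cohort, day) -> (successes, trials)."""
    cells = {}
    for day in DAYS:
        # Cohort A succeeds mostly on Sundays; cohort B is flat all week.
        cells[("A", day)] = (90, 100) if day == "Sun" else (10, 100)
        cells[("B", day)] = (30, 100)
    return cells


def features(cohort, day, interact):
    """One-hot encode cohort and day, optionally with cohort-by-day terms."""
    x = [1.0]  # intercept
    x += [1.0 if cohort == c else 0.0 for c in COHORTS]
    x += [1.0 if day == d else 0.0 for d in DAYS]
    if interact:
        x += [1.0 if (cohort == c and day == d) else 0.0
              for c in COHORTS for d in DAYS]
    return x


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit(cells, interact, lr=1.0, steps=2000):
    """Minimize the binomial negative log-likelihood by gradient descent."""
    rows = [(features(c, d, interact), k, n) for (c, d), (k, n) in cells.items()]
    total = sum(n for _, _, n in rows)
    w = [0.0] * len(rows[0][0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, k, n in rows:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            err = n * p - k  # derivative of the cell's loss w.r.t. its logit
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / total for wi, g in zip(w, grad)]
    return w


def predict(w, cohort, day, interact):
    x = features(cohort, day, interact)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


cells = make_cells()
for interact in (False, True):
    w = fit(cells, interact)
    label = "with interaction" if interact else "main effects only"
    print(label, "P(success | A, Sun) =", round(predict(w, "A", "Sun", interact), 3))
```

On this toy data the main-effects model pulls the Sunday estimate for cohort A well below the observed 90%, because the Sunday coefficient is shared with cohort B, while the model with the cohort-by-day term can match it. If interactions do not help on the real data, regularization strength, sparse cohort-by-day cells, or the way the interaction is encoded may be worth checking, though that is a guess from the toy case rather than a diagnosis.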
{} in the editing toolbar to format output so that those who use text-to-speech can "read" it.) Disguise the variable names as needed to maintain confidentiality. Are you modeling cohorts as fixed or random effects? Please provide that information by editing the question, as comments are easy to overlook and can be deleted. – EdM Apr 19 '22 at 15:01