
I tried to fit a logistic regression model and then plot it. The first try didn't really work out, and I got something like this:

[figure: plot of the first model's fit]

However, I then realized that my data is imbalanced (around 85% are 1s and the rest are 0s), so I changed my model to a weighted logistic regression in Python using sklearn:

from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(class_weight='balanced')

For some of the features this worked perfectly, as seen in the Average age chart. However, some fitted curves are still not 'curvy' and are missing the sigmoid look (in the case of Work from home, a decreasing sigmoid look).
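A minimal, self-contained sketch of such a weighted fit on synthetic, similarly imbalanced data (the feature and all numbers here are illustrative, not the question's actual data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic single-feature data, imbalanced toward 1s
x = rng.uniform(20, 60, size=500)            # an "Average age"-like feature
p_true = 1 / (1 + np.exp(-0.15 * (x - 25)))  # true underlying sigmoid
y = (rng.uniform(size=500) < p_true).astype(int)

# Weighted fit: minority class gets proportionally larger sample weights
clf = LogisticRegression(class_weight='balanced')
clf.fit(x.reshape(-1, 1), y)

# Predicted probabilities over the observed range of x, ready to plot
grid = np.linspace(x.min(), x.max(), 100).reshape(-1, 1)
probs = clf.predict_proba(grid)[:, 1]
```

Plotting `probs` against `grid` shows how much (or little) of the S-shape falls inside the observed range of the feature.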

[figure: weighted-model fits, including the Average age and Work from home charts]

Furthermore, when I test my models, some reach a pretty good accuracy of 80%+, but others are between 40% and 50%, and just by looking at the visualisation it is impossible to understand what is going on.

[figure: model evaluation results]
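For context on why accuracy can mislead here: with roughly 85% ones, a trivial classifier that always predicts 1 already reaches 85% accuracy. A small sketch with synthetic labels (not the question's data):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, f1_score

# 85 ones and 15 zeros, mimicking the ~85/15 imbalance described above
y_true = np.array([1] * 85 + [0] * 15)
y_naive = np.ones_like(y_true)  # a "model" that always predicts 1

print(accuracy_score(y_true, y_naive))   # 0.85
print(precision_score(y_true, y_naive))  # 0.85
print(f1_score(y_true, y_naive))         # ≈ 0.92
```

So 80%+ accuracy on its own is barely better than the majority-class baseline; metrics broken down per class say more.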

Sam333
  • There's no problem here. Composing a sigmoid function $\sigma(x)$ with a linear function $ax + b$ gives $\sigma(ax+b)$. But when $a$ is close to 0, the function will appear "flattened" when plotted, compared to plots of $\sigma(x)$. In the case of $a = 0$, the function will just be the constant $\sigma(b)$. Other aspects of these plots are just about the range of the axes; for instance, extending the "Average Age" axis to cover [0, 55] (or similar) would show a typical sigmoid. (Of course, there are hazards to extrapolating beyond the range of the data.) – Sycorax Jan 23 '23 at 17:59
  • Compare the simpler problem of fitting a straight line. $y = a + bx$ fits a line that is defined for any finite $x$, but naturally you are usually interested only in the range of the data or a bit more, so you see only a fraction of the line. So also the logit: you see only the range of the data or a bit more. Now any section of a straight line must itself be straight, but a little thought shows that any section of a logit cannot be a logit. Locally it is straight! – Nick Cox Jan 23 '23 at 18:05
  • Also, unbalanced data is no reason to use weighting in logistic regression, which is explicitly designed to work with probabilities, whether low or high: https://stats.stackexchange.com/q/544839/1352 – Stephan Kolassa Jan 23 '23 at 18:12
  • Thank you all for your responses. My follow-up question would be: since my $a$ is close to 0 (and that's why the function appears flattened), does it mean I am not using the right model, or that my input features are not highly correlated with my output? Because, on the other hand, my accuracy, precision and F1 scores are quite high (for some features). With logistic regression, are there any tests like p-values or the Pearson correlation coefficient, similar to what linear regression has? Thanks a lot! – Sam333 Jan 23 '23 at 22:57
  • @StephanKolassa Alright, that makes sense. One thing I don't understand, however: if I should look at logistic regression from the point of view of probability, let's say I have datapoints as depicted here: link. It is clear that the 1 observations occur predominantly more than the 0s after x > 0.06, so shouldn't it have this curvy top and be more steep, since there are no observations after 0.06 that would keep this line so 'flat'? Thanks! – Sam333 Jan 23 '23 at 23:07
  • The logistic curve is not arbitrarily flexible. If it changes over one part of the predictor, it has to change in another part, too. In your example, there are also lots of 1s over low values of the predictor, and the logistic regression needs to fit these, too. It thus finds a balance between fitting over low and high values of the predictor. If you believe you need more flexibility, you could use a spline transformation of the predictor, but note that this can lead to nonsensical models, e.g., non-monotonic relationships. – Stephan Kolassa Jan 24 '23 at 07:02
  • A further, if standard, point: you appear to be fitting several single-predictor models, and it's likely that you need to proceed to a multiple-predictor model. – Nick Cox Jan 24 '23 at 12:42
  • @StephanKolassa 's point is correct, but one further wrinkle is that if you need the flexibility of splines on the one hand, but also to enforce a monotonic or shape constraint on the other, then there are specialized methods to fit these splines. – Sycorax Jan 24 '23 at 16:25
  • @Sycorax: can you give any pointers? That's a topic a colleague is wrestling with right now. – Stephan Kolassa Jan 24 '23 at 16:26
  • There's not an enormous amount of literature on this, but a simple Google Scholar search turns up some promising results, like Mukerjee, Hari. "Monotone Nonparametric Regression." The Annals of Statistics, vol. 16, no. 2, 1988, pp. 741–750. JSTOR, http://www.jstor.org/stable/2241753; Mary C. Meyer. "Inference using shape-restricted regression splines." Ann. Appl. Stat. 2 (3), pp. 1013–1033, September 2008. https://doi.org/10.1214/08-AOAS167; and "Shape-Restricted Regression Splines with R Package splines2." – Sycorax Jan 24 '23 at 16:29
  • @StephanKolassa We also have this question: https://stats.stackexchange.com/questions/467126/monotonic-splines-in-python – Sycorax Jan 24 '23 at 16:33
  • @Sycorax: thanks, that may be helpful... I know I struggled with this a couple of years back and didn't find anything particularly useful. – Stephan Kolassa Jan 24 '23 at 16:34
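The flattening described in the first comment is easy to reproduce: with $\sigma(ax+b)$, a small slope $a$ makes the curve look nearly constant over a narrow data range, while the same curve shows the full S-shape over a wider range. A minimal numpy sketch (the values of a, b and the ranges are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

a, b = 0.05, -1.0                    # small slope "a" flattens the curve
x_narrow = np.linspace(0, 10, 50)    # a narrow range, like observed data
x_wide = np.linspace(-100, 150, 50)  # a wide range reveals the S-shape

y_narrow = sigmoid(a * x_narrow + b)
y_wide = sigmoid(a * x_wide + b)

# Over the narrow range the curve barely moves...
print(y_narrow.max() - y_narrow.min())  # ≈ 0.11
# ...while over the wide range it spans nearly the whole (0, 1) interval
print(y_wide.max() - y_wide.min())      # ≈ 1.0
```

This is exactly the "extend the axis and a typical sigmoid appears" observation, with the usual caveat about extrapolating beyond the data.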

0 Answers