I have created logistic regression model with my X_variable being income and y variable being a binary variable of whether the business accepts card or cash (link to data at the end), I used sklearn python library to do so and these are the charts I got:
However the issue is that while the chart on the right makes sense, with the chart on the left I would expect the line to have negative coefficient as the scatter plot suggest, the y values of value 0 are in the region where the net annual income is higher. Furthermore, I ran the regression using statsmodel library (very similar to R programming language) where I got a full summary and there the coefficient for cash is negative while for card positive (as I would expect it to be)
The data that I used: https://pastebin.com/ShPxuqmL
Example code:
import matplotlib as plt
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()
clf.fit(X, y)
plt.scatter(X_train, y_train)
x_range = np.linspace(X.min(), X.max(), 100)
y_range = clf.predict_proba(X.reshape(-1,1))[:,1]
plt.plot(X, y, color='red',label='logistic line')


sklearndocumentation. Perhaps this is why thesklearnandstatsmodelsoutputs are different. See: https://stats.stackexchange.com/questions/203740/logistic-regression-scikit-learn-vs-statsmodels/457606#457606 – Sycorax Feb 26 '23 at 17:00