I have the following dataframe in Python:
import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'group': ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c'],
                   'val': [1, 0, 1, 0, 1, 1, 0, 0, 1, 1]})
print(df)
   id group  val
0   1     a    1
1   2     a    0
2   3     a    1
3   4     b    0
4   5     b    1
5   6     b    1
6   7     b    0
7   8     c    0
8   9     c    1
9  10     c    1
I want to see if ANY of the levels of group has a significant effect on val.
So I do a logistic regression as follows:
import statsmodels.api as sm
dummy_variables = pd.get_dummies(df['group'], drop_first=False)
logit_model = sm.Logit(df['val'], dummy_variables)
result = logit_model.fit()
print(result.summary())
Notice the drop_first=False; I also do not add a constant to the model, because an intercept together with all three dummies would be perfectly multicollinear.
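For comparison, this is the other parameterization I am aware of, where one level is dropped and an intercept is kept instead (just a sketch on the toy df above; dtype=float is only there to avoid the boolean dummies newer pandas versions return):

import pandas as pd
import statsmodels.api as sm

# Reference-level coding: drop 'a' and keep an intercept instead.
dummies_ref = pd.get_dummies(df['group'], drop_first=True, dtype=float)
X_ref = sm.add_constant(dummies_ref)
result_ref = sm.Logit(df['val'], X_ref).fit()
print(result_ref.summary())

In that version the coefficients for b and c are contrasts against level a, whereas in my no-intercept model each coefficient is the log-odds of val=1 within its own level.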
On my real data (not this toy example) I get a p-value < 0.05 for dummy_a and p-values > 0.05 for dummy_b and dummy_c, using the no-intercept model above.
Is it right to say that a has a significant effect on val?
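Also, since what I really want is to test whether group as a whole matters, would a likelihood-ratio test against an intercept-only model be the right approach? A sketch of what I have in mind (scipy.stats.chi2 assumed for the p-value):

import numpy as np
from scipy import stats

# Full model: the three group dummies (cast to float in case get_dummies returned booleans).
full = sm.Logit(df['val'], dummy_variables.astype(float)).fit(disp=0)
# Null model: intercept only.
null = sm.Logit(df['val'], np.ones((len(df), 1))).fit(disp=0)

lr_stat = 2 * (full.llf - null.llf)
df_diff = dummy_variables.shape[1] - 1   # 3 dummy coefficients vs. 1 intercept
p_value = stats.chi2.sf(lr_stat, df_diff)
print(lr_stat, p_value)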