I have the following dataframe in Python:

import pandas as pd
df = pd.DataFrame({'id': [1,2,3,4,5,6,7,8,9,10],
                   'group': ['a', 'a', 'a', 'b','b', 'b','b','c','c','c'],
                   'val':[1,0,1,0,1,1,0,0,1,1]})

print(df)
   id group  val
0   1     a    1
1   2     a    0
2   3     a    1
3   4     b    0
4   5     b    1
5   6     b    1
6   7     b    0
7   8     c    0
8   9     c    1
9  10     c    1

I want to see if ANY of the levels of group has a significant effect on val.

So I do a logistic regression as follows:

import statsmodels.api as sm

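# dtype=float keeps the dummies numeric (recent pandas versions return booleans by default)
dummy_variables = pd.get_dummies(df['group'], drop_first=False, dtype=float)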

logit_model = sm.Logit(df['val'], dummy_variables)
result = logit_model.fit()

print(result.summary())

Note that I use drop_first=False and do not add a constant to the model: including both a constant and all three dummies would make the design matrix perfectly collinear.
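
A note on what this parameterization estimates: with all three dummies and no constant, each coefficient is the log-odds of val = 1 within that group, so its Wald p-value tests whether that group's probability differs from 0.5, not whether the groups differ from each other. The sketch below reproduces this by hand for the toy data using only the standard library. The names `counts`, `log_odds` and `wald_p` are illustrative, and the closed-form standard error is an assumption that holds for this dummy-only design.

```python
import math

# Toy data from the question: one (group, val) pair per row.
groups = ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c']
vals = [1, 0, 1, 0, 1, 1, 0, 0, 1, 1]

# Count successes (val == 1) and observations per group.
counts = {}
for g, v in zip(groups, vals):
    k, n = counts.get(g, (0, 0))
    counts[g] = (k + v, n + 1)

log_odds, wald_p = {}, {}
for g, (k, n) in counts.items():
    # MLE of the no-intercept dummy coefficient: the log-odds within the group.
    # (Undefined if a group is all 0s or all 1s; that is not the case here.)
    log_odds[g] = math.log(k / (n - k))
    # Assumed closed-form standard error of a group's log-odds.
    se = math.sqrt(1 / k + 1 / (n - k))
    z = log_odds[g] / se
    # Two-sided normal p-value for H0: log-odds = 0, i.e. P(val = 1) = 0.5.
    wald_p[g] = math.erfc(abs(z) / math.sqrt(2))

for g in sorted(counts):
    print(g, round(log_odds[g], 4), round(wald_p[g], 4))
```

If this reading is right, a small p-value for one dummy only says that group's success rate is far from 50%, which is a different statement from "this level has an effect on val".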

In my real data (not the toy example above), I get a p-value < 0.05 for dummy_a and p-values > 0.05 for dummy_b and dummy_c.

Is it correct to say that a has a significant effect on val?
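
For comparison, one common way to assess whether group as a whole matters, rather than reading the per-level p-values, is a likelihood-ratio test of the three-level model against a model with a single common probability. The sketch below does this by hand for the toy data using only the standard library. It assumes the usual chi-square approximation with 2 degrees of freedom (three levels minus one), which is rough with only 10 observations, and the helper `loglik` is a hypothetical name introduced for illustration.

```python
import math

# Toy data from the question: one (group, val) pair per row.
groups = ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c']
vals = [1, 0, 1, 0, 1, 1, 0, 0, 1, 1]


def loglik(k, n):
    """Bernoulli log-likelihood at the MLE p = k / n (k successes in n trials)."""
    p = k / n
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1 - p)
    return ll


# Successes and observations per group.
counts = {}
for g, v in zip(groups, vals):
    k, n = counts.get(g, (0, 0))
    counts[g] = (k + v, n + 1)

# Full model: one probability per group. Null model: one common probability.
ll_full = sum(loglik(k, n) for k, n in counts.values())
ll_null = loglik(sum(vals), len(vals))

lr_stat = 2 * (ll_full - ll_null)
dof = len(counts) - 1

# The chi-square survival function has the closed form exp(-x / 2) only for 2 df.
if dof != 2:
    raise ValueError("closed-form p-value below is valid only for 2 degrees of freedom")
p_value = math.exp(-lr_stat / 2)

print("LR statistic:", round(lr_stat, 4), "df:", dof, "p-value:", round(p_value, 4))
```

With statsmodels, fitting the model with a constant and drop_first=True should expose the same test through the fitted result's llr and llr_pvalue attributes.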

User1865345
quant
    Many similar questions here: https://stats.stackexchange.com/questions/424633/overall-significance-of-a-categorical-variables-in-logistic-regression, https://stats.stackexchange.com/questions/24298/can-i-ignore-coefficients-for-non-significant-levels-of-factors-in-a-linear-mode, https://stats.stackexchange.com/questions/71832/what-do-p-values-for-levels-of-a-categorical-variable-represent-in-poisson-regre – kjetil b halvorsen Nov 14 '23 at 13:12
