
There are 13 variables that have been regressed using a logit model. All of the variables are categorical. My aim now is to examine the interaction between two of the variables. When I run my logit, it runs out of degrees of freedom, and hence Stata does not deliver any result.

Example

Q= i.A i.S i.Ec i.Pr i.xx i.dd#i.ii

Can I treat my 11 other variables as continuous and add the interaction term? If yes, how do I justify doing so?

Example Q = A S Ec Pr xx i.dd#i.ii

  • "Q= i.A i.S i.Ec i.Pr i.xx i.dd#i.ii" -- what is this? Did a cat walk on the keyboard? Is it a sequence of keystrokes for something? I don't understand. – Glen_b Aug 24 '14 at 02:14
  • Sorry, this is the Stata command, i.e. Q is the dependent variable and A, S, Ec, Pr, xx, dd and ii are the independent variables.

    i.dd#i.ii is the interaction term. # means interaction in Stata @Glen_b

    – user40767 Aug 24 '14 at 02:17
  • Your question is probably better phrased initially as a general one about models, using a more typical notation to represent them, and then at the end making some reference to doing it in Stata. You should probably clarify how many levels you have in each variable, how many observations you have and which of your variables are at least ordinal. [Or is it more a pure "what Stata commands do I use?" kind of question?] – Glen_b Aug 24 '14 at 02:22
  • That was just a way to show an example, hence it looks a bit unclear.

    My issue is: when I run my logit regression, should I treat my categorical variables as continuous? When I try to fit the logit using all the categorical variables and the interaction term, Stata does not deliver a result. It starts the iteration process.

    Therefore, should I treat my other 11 variables as continuous and then add the interaction? Each variable has at least 4 levels, and all of them are categorical. @Glen_b

    – user40767 Aug 24 '14 at 02:29
  • Unless I misunderstand things here (which is possible), you just can't estimate that many parameters with your data. And no, it really isn't right to treat those categorical variables as continuous. My suggestion is to add each two-way interaction individually to the model. Maybe (if you're lucky) you'll pick up just a couple that make a difference. This is a bit like forward model selection in regression. – Russ Lenth Aug 24 '14 at 02:43
  • @RussLenth What you have said is exactly my concern. Initially I was trying to regress on just the interaction terms individually. What held me back is that regressing on the interaction terms alone won't give me the proper effect, unless, as you said, it is treated as a forward model selection. – user40767 Aug 24 '14 at 02:50
  • Just to clarify... Try models with ALL the linear terms and ONE of the two-way interactions. Something in what you say makes me think that's not what you've been doing. – Russ Lenth Aug 24 '14 at 02:55
  • @RussLenth I was trying different ways to get an answer, and this was one of them: I treated 11 of my variables as continuous and then interacted the other two. That is why I asked whether I could treat my categorical variables as continuous. – user40767 Aug 24 '14 at 02:58
  • I think we're still not on the same page. I don't think it's OK to treat those categorical predictors as continuous, unless they are at least ordinal, like a Likert scale. Even then, it's questionable. How many categories are in each, and how many observations do you have in the dataset? – Russ Lenth Aug 24 '14 at 03:17
  • The number of observations is 10000 and the number of variables is 13. All are categorical with a minimum of 4 levels.

    In my initial step I performed a logit with the categorical variables, then I wanted to see the interaction between two variables. As it's not possible to regress the entire chunk, I took two significant variables which I found after the logit. I performed the logit on them again but interacted the two variables (treating them as categorical). Is this what you meant by forward model selection? @RussLenth

    – user40767 Aug 24 '14 at 03:29

1 Answer


I don't use Stata, so I'm not quite sure what's going on here, but I think it's getting ugly fast. Assuming you have 13 categorical variables with 4 levels each, the basic logistic model has to fit an intercept and 3*13 = 39 parameters (since each variable becomes 3 dummy variables). An interaction between two such variables requires 3*3 = 9 degrees of freedom, and you have 78 possible pairs of variables, for 702 degrees of freedom in total. If Stata is trying to calculate interactions for everything, I can see why you are running out of degrees of freedom. So you don't want to go there.
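
The counts above can be checked with a short script. This is a minimal sketch in Python (not Stata), assuming, as this answer does, that each of the 13 variables has exactly 4 levels; the function name and dictionary keys are made up for illustration.

```python
from math import comb


def logit_parameter_counts(n_vars=13, levels=4):
    """Count parameters for a logit with categorical predictors.

    Assumes every predictor has the same number of levels (an
    illustrative simplification; the question only says "at least 4").
    """
    dummies = levels - 1                  # dummy variables per predictor
    main_effects = n_vars * dummies       # 3 * 13 = 39
    per_interaction = dummies * dummies   # 3 * 3 = 9 df for one pair
    n_pairs = comb(n_vars, 2)             # number of distinct variable pairs
    return {
        "main_effects": main_effects,
        "per_interaction": per_interaction,
        "n_pairs": n_pairs,
        "all_two_way": n_pairs * per_interaction,
        # intercept + all main effects + a single two-way interaction
        "main_plus_one_interaction": 1 + main_effects + per_interaction,
        # cells in a full 13-way factorial layout
        "full_factorial_cells": levels ** n_vars,
    }


counts = logit_parameter_counts()
for name, value in counts.items():
    print(f"{name}: {value}")
```

Under these assumptions, a model with all main effects plus one two-way interaction has 49 parameters, which is small next to the 10000 observations mentioned in the comments. Adding every two-way interaction brings in 702 more, and a full 13-way factorial would have 4^13 = 67,108,864 cells.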

If the categories are ordinal and in some sense equally spaced, you could replace them with continuous variables ... holding your nose. Perhaps a better approach would be a decision tree model, which would deal more gracefully with relationships between the categories.

Placidia
  • I agree, and wonder if @user40767's expression for Q generates a 13-way interaction, or just the 2-ways. If the latter, I'd think Stata could handle it, given the abundance of memory and speed in today's PCs. – Russ Lenth Aug 24 '14 at 20:23