3

Conceptual question: Suppose a banker wants to run a regression on S&P 500 companies to see whether their average daily returns are higher than last year's. He regresses returns on a dummy variable for each company. Suppose the model is $R_i=\beta_0+\beta_1 k_{1i}+\dots+\beta_{500} k_{500i}$, where $R_i$ is the daily return and there are 500 companies. Is this a bad regression because it falls into the dummy variable trap, and if so, how do I fix it?

Definition of the dummy variable trap: a situation in which two or more independent variables are perfectly collinear, i.e. one is an exact linear combination of the others. Suppose dummy variables are used for gender. If gender has 2 categories, only one dummy variable should be used; using 2 dummy variables for 2 categories (alongside an intercept) leads to the dummy variable trap.
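To make the gender example concrete, here is a small worked sketch. The symbols $m_i$ and $f_i$ (male and female dummies) and the outcome $y_i$ are my own notation for illustration, not part of the original question. Since every observation falls into exactly one category,

$$m_i + f_i = 1 \quad \text{for every } i,$$

so the two dummies add up to the intercept column. In $y_i=\beta_0+\beta_1 m_i+\beta_2 f_i+\varepsilon_i$ this means that, for any constant $c$,

$$(\beta_0 + c) + (\beta_1 - c)\, m_i + (\beta_2 - c)\, f_i = \beta_0 + \beta_1 m_i + \beta_2 f_i,$$

i.e. infinitely many coefficient vectors give identical fitted values, and least squares cannot pick one.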

So, to fix this, should I drop one dummy variable, i.e. $R_i=\beta_0+\beta_1 k_{1i}+\dots+\beta_{499} k_{499i}$, with company 500 as the baseline?

Please let me know what information to add, this is all I can think about for now. Thanks in advance.

shine
  • 133
  • 2
    What model are we talking about here? What are the dependent and independent variables? – Richard Hardy Oct 02 '22 at 06:52
  • @RichardHardy I think the model is about linear regression with multiple regressors. The dependent variable is overall returns per day and independent variables are k companies. – shine Oct 03 '22 at 02:01
  • Still not getting it... What is $k_{ji}$? Is $i$ indexing time? If so, $t$ would be the more common notation. – Richard Hardy Oct 03 '22 at 09:17
  • I'm pretty sure the answer by @Tim is the answer sought by the original, quoted question. Perhaps adding a "self-study" tag will clear up the confusion. – Sal Mangiafico Oct 03 '22 at 12:53

1 Answer

7

It is. There are $k$ companies; if you use $k$ dummy variables together with an intercept, the model suffers from perfect multicollinearity, so you need to use $k-1$ dummy variables instead.
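A minimal sketch of why this happens, in plain Python. The sizes `k = 5` and `days = 3` are made-up toy values (the question has 500 companies), and all variable names are hypothetical. It only checks the one obvious linear dependence (the dummies summing to the intercept column); it is not a full rank computation.

```python
k = 5     # hypothetical number of companies (500 in the question)
days = 3  # hypothetical number of daily observations per company

# One row per (company, day); company[r] is the company label of row r.
company = [c for c in range(k) for _ in range(days)]

# Full set of k dummies: column j is 1 when the row belongs to company j.
dummies_full = [[1 if c == j else 0 for j in range(k)] for c in company]
# Reduced set: drop the first company's dummy, leaving k - 1 columns.
dummies_reduced = [row[1:] for row in dummies_full]

# The intercept is a column of ones.
intercept = [1] * len(company)

# With k dummies, every row's dummies sum to 1, i.e. to the intercept column.
full_sums = [sum(row) for row in dummies_full]
# With k - 1 dummies, rows of the dropped (baseline) company sum to 0.
reduced_sums = [sum(row) for row in dummies_reduced]

print(full_sums == intercept)     # True: intercept is a combination of the dummies
print(reduced_sums == intercept)  # False: that dependence is gone
```

With the reduced set, the dropped company becomes the baseline: the intercept is its mean return, and each remaining coefficient is the difference from that baseline.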

Tim
  • 138,066