I've been reading more about linear regression and the assumptions it makes. What are the consequences of violating some of the assumptions involved?
For instance, you need the dataset to exhibit little or no multicollinearity. What happens if I ignore this?
For instance, suppose I have a dataset where x_3 and x_4 are just duplicates of each other (and thus perfectly correlated). The only difference I can see is that, instead of having one line of best fit, I now have infinitely many:
E.g. if the line of best fit without x_4 had Bx_3 in it, then I can just distribute B between x_3 and x_4: any split b_3 + b_4 = B gives identical predictions. So if B = 5, I could use 4x_3 + 1x_4 or 3.5x_3 + 1.5x_4, etc.
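A quick numerical sketch (on made-up data, just to illustrate what I mean) seems to confirm this: when x_4 is an exact copy of x_3, every split of the coefficient across the two columns produces exactly the same fitted values and residuals:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x3 = rng.normal(size=n)
x4 = x3.copy()                      # exact duplicate -> perfect collinearity
y = 5 * x3 + rng.normal(scale=0.1, size=n)  # "true" coefficient is 5

# Any split (b3, b4) with b3 + b4 = 5 yields the same residual sum of squares,
# so there is no unique least-squares solution:
for b3, b4 in [(5.0, 0.0), (4.0, 1.0), (3.5, 1.5)]:
    rss = np.sum((y - (b3 * x3 + b4 * x4)) ** 2)
    print(f"b3={b3}, b4={b4}, RSS={rss:.6f}")
```

(Interestingly, np.linalg.lstsq on the stacked matrix [x3, x4] returns the minimum-norm solution, which here splits the coefficient evenly as 2.5/2.5.)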
So, it seems like:
1) Maybe it's not disastrous to violate the multicollinearity assumption?
2) Is it much worse to violate the others? If so, what are the consequences?
Thanks!