I am conducting a multiple linear regression (OLS) and would like to test its assumptions. My regression involves one dependent variable and 3 independent variables. All my VIF scores are below 2, so I believe there is no multicollinearity. For the other assumptions I have the pictures below. However, I have trouble interpreting them.

  1. I believe that the pp-plot shows violation of linearity? If so, does it mean that I cannot interpret the results of the regression?

  2. Also, what does the scatterplot tell me in regards to the assumptions for OLS?

Thank you very much for your insights.

[Image: scatterplot]

[Image: P-P plot]

Sabine
  • Neither plot tells you anything about linearity. The basic problem is that your response looks like it is related to a small count--it is discrete, bounded on one side, and skewed--and would therefore benefit from applying a different model, such as a logistic regression. – whuber Aug 20 '23 at 14:31
  • The PP plot tells you that your residuals are pretty close to normal. For the first plot, see @whuber's comment.

    I also noticed that the DV is "Intention". How is that measured? From the plot, it looks like it is measured ordinally, probably on a Likert scale. And, as whuber says, it is skewed. This plot ought to look blobby.

    – Peter Flom Aug 20 '23 at 14:52
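For readers unsure how a P-P plot is read, here is a minimal sketch of how its points can be computed. The residuals are made up for illustration (they are not Sabine's data), and `pp_points` is a hypothetical helper name. Each point pairs the normal cumulative probability of a standardized residual with its empirical cumulative probability; points close to the diagonal suggest approximately normal residuals.

```python
from statistics import NormalDist, mean, stdev

def pp_points(residuals):
    """Return (theoretical, empirical) cumulative-probability pairs for a normal P-P plot."""
    m, s = mean(residuals), stdev(residuals)
    ordered = sorted(residuals)
    n = len(ordered)
    std_normal = NormalDist()
    points = []
    for i, r in enumerate(ordered, start=1):
        empirical = (i - 0.5) / n                  # plotting position of the i-th smallest residual
        theoretical = std_normal.cdf((r - m) / s)  # normal CDF of the standardized residual
        points.append((theoretical, empirical))
    return points

# Hypothetical residuals, invented for this example only
residuals = [-1.2, -0.7, -0.4, -0.1, 0.0, 0.2, 0.3, 0.6, 0.9, 1.4]
points = pp_points(residuals)

# Largest vertical distance between the points and the 45-degree line
max_gap = max(abs(t - e) for t, e in points)
print(f"largest deviation from the diagonal: {max_gap:.3f}")
```

Note that this only speaks to the normality of the residuals; it says nothing about linearity or about whether the response is suited to OLS.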
  • @whuber Thank you. – Sabine Aug 20 '23 at 14:57
  • @PeterFlom yes my dependent variable was measured on a Likert scale. Could you tell me whether any of these pictures show violation of assumptions for OLS? Thank you – Sabine Aug 20 '23 at 14:59

1 Answer

Welcome to CV.

In your comment you say that your DV is ordinal and ask if any of the pictures show violation of OLS.

When your DV is ordinal, you don't need pictures. OLS assumes that the DV is continuous.

As Whuber noted, your first plot does show this, but interpreting that plot requires experience and expertise. I also noted that that plot should be blobby. That is, it shouldn't have patterns.

But never mind that. Your DV is ordinal. Maybe a 7-point scale. OLS will make nonsensical predictions. None of the predicted values will be integers (although they might be close) and some might be below 1 or above 7.
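A small sketch of that point, using made-up data (a hypothetical predictor `x` and a hypothetical 7-point response `y`, not Sabine's data) and a hand-rolled one-predictor least-squares fit:

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up data for illustration: y takes only integer values from 1 to 7
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 1, 2, 2, 3, 5, 6, 7, 7, 7]

intercept, slope = fit_simple_ols(x, y)
fitted = [intercept + slope * xi for xi in x]

for xi, yi, fi in zip(x, y, fitted):
    print(f"x={xi}  observed={yi}  fitted={fi:.3f}")
```

In this constructed example the fitted values are not integers, and the fitted values at the two ends of `x` fall outside the 1 to 7 range of the scale. How severe this is in practice depends on the data.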

You should use ordinal logistic regression, at least as a starting point.
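To give a feel for what such a model predicts, here is a minimal sketch of the proportional-odds (cumulative logit) form of ordinal logistic regression. The cut points and the linear predictor value are hypothetical numbers chosen for illustration, not estimates from any data, and `ordinal_logit_probs` is an illustrative helper rather than a library function.

```python
import math

def ordinal_logit_probs(linear_predictor, thresholds):
    """Category probabilities under a proportional-odds (cumulative logit) model.

    P(Y <= j) = logistic(threshold_j - linear_predictor) for each cut point;
    each category probability is the difference of consecutive cumulative ones.
    """
    cumulative = [1.0 / (1.0 + math.exp(-(t - linear_predictor))) for t in thresholds]
    cumulative.append(1.0)  # P(Y <= highest category) is always 1
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs

# Hypothetical increasing cut points for a 5-category response
thresholds = [-2.0, -0.5, 0.5, 2.0]
# Hypothetical value of the linear predictor (x'beta) for one respondent
probs = ordinal_logit_probs(0.8, thresholds)

for category, p in enumerate(probs, start=1):
    print(f"P(Y = {category}) = {p:.3f}")
```

Unlike OLS, the output is a probability for each observed category, so predictions cannot fall between or outside the scale points. In practice the cut points and coefficients would be estimated by a statistics package.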

Peter Flom
  • Thank you, I am very new to statistics, so your comments are helping me a lot. Maybe I should clarify that my DV was formed from 3 questions, each measured on a Likert scale (1-5). I then averaged these 3 questions to construct my DV. Does that change your recommendation to conduct a logistic regression? (My professor advised conducting OLS, so I am confused.) Again, thank you very much – Sabine Aug 20 '23 at 15:16
  • @Sabine I think you have three DVs, one for each question. I suggest using a structural equation model, which can be combined with ordinal logistic regression. I find that this tends to work no worse, and sometimes better, than fitting three separate ordinal logistic regression models. – Galen Aug 20 '23 at 15:20
  • @Sabine For a weird flex, you could treat the three question responses as a single DV but with mixed effects ordinal logistic regression where there are random effects for the question. – Galen Aug 20 '23 at 15:25
  • I am unaware of any authority that asserts the response variable must be continuous. Maybe you meant to use a different term than "continuous"? – whuber Aug 20 '23 at 15:26
  • Yes, that does change things. If you average three five-point scales, you can get any of 13 distinct values (the sums run from 3 to 15). But I am betting you don't get all 13 possibilities.

    I'm kind of torn here. I think the right solution to your problem is probably complex. Galen's idea of an SEM is fine. I would have suggested an exploratory factor analysis. EFA would give you more different answers and get rid of the problems I stated in my answer.

    But ... Maybe you've never even heard of those.

    Your first plot, though, does show violations.

    – Peter Flom Aug 20 '23 at 15:30
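As a quick check on how many distinct values an average of three 1-5 items can take, this sketch enumerates every combination of responses. It assumes each item is scored as an integer from 1 to 5, as Sabine describes.

```python
from fractions import Fraction
from itertools import product

scale = range(1, 6)  # each item is scored 1, 2, 3, 4 or 5

# Exact averages for every possible combination of three item responses
averages = sorted({Fraction(sum(items), 3) for items in product(scale, repeat=3)})

print(len(averages), "distinct averages")
print([str(a) for a in averages])
```

The sums run from 3 to 15, so there are 13 distinct averages between 1 and 5, spaced one third apart. The averaged DV is therefore still discrete, just on a finer grid than a single item.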
  • @whuber Maybe "assumption" was the wrong term. But see e.g. Ben's answer on this thread https://stats.stackexchange.com/questions/143167/ols-with-ordinal-dependent-variable-do-the-coefficients-mean-anything – Peter Flom Aug 20 '23 at 15:34
  • I think "continuous" was not the term you intended. The underlying issue is of the measurement level of the response, which is separate from applicability of OLS. Moreover, there is at best a loose and not very direct link between "continuity" (whatever that might mean--it's an overloaded term in this context) and measurement level. – whuber Aug 20 '23 at 15:37
  • @PeterFlom The whole level of measurement thing has to do with enforcing symmetries of the actions of certain algebraic structures (usually, and classically, groups). If you don't care about those symmetries for a given analysis, then it is optional whether you limit your operators to respect that structure. The success of word embedding techniques (some of them using least squares in the parameter estimation) in natural language processing is a strong counterexample to the notion that one must respect what seems like the prima facie 'natural' structure of the data. – Galen Aug 20 '23 at 15:42
  • @Sabine All that disagreement notwithstanding, if you are doing a frequentist analysis I suggest lavaan for R and semopy for Python. Similarly, if you do a Bayesian analysis, you can use RStan for R and PyStan for Python. – Galen Aug 20 '23 at 15:56
  • @Galen OK, I even wrote a blog post about the problems with Stevens' categories, especially when they get used as a straitjacket instead of as guidelines. But, in trying to answer Sabine's question .... Well, as I said earlier, I'm torn. She is just starting! And her professor seems to be giving bad advice (I'm betting that the prof is not a statistician). – Peter Flom Aug 20 '23 at 16:02
  • @PeterFlom TBH I miss your blog. But yes, I think we agree that something using ordinal regression alongside other techniques (e.g. SEM) would be useful to Sabine. – Galen Aug 20 '23 at 16:06