2

I'm quite new to R and I am trying to do some stats. I have a set of data and want to work out whether my data comes from a normal distribution. I was told to do the test on the residuals of my data, but can't remember if shapiro.test() automatically works out and uses residuals of your data or if you have to do that yourself.

  • 4
    shapiro.test() tests the normality of whatever vector you pass to it. If you pass a vector of residuals, it tests the distribution of the residuals. If you pass your raw variable (not from a model), it tests the distribution of the raw variable. Be very careful with shapiro.test(): it may help, but its result depends heavily on sample size and can be misleading. You are on the right track looking at the distribution of the residuals, but you may also want to look at a Q-Q plot, a boxplot, and the stability of your residuals – Yacine Hajji Nov 07 '23 at 13:52

1 Answer

0

No, nor should it.

Residuals only exist once you've defined a model that makes predictions. So you first have to define such a model, probably a linear regression if you're interested in normal residuals (though there are other possibilities), and then fit that model to the data.

Once you've done that, you can calculate the model residuals and pass those residuals into the shapiro.test function, though normality testing is less useful than one might hope.
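As a minimal sketch of that workflow, assuming a simple linear regression is appropriate: the example below uses R's built-in mtcars data set, and the choice of mpg as the outcome and wt as the predictor is purely illustrative, not something taken from the question.

```r
# Illustrative assumption: mtcars (built into R), with mpg modelled on wt.
# Substitute your own data frame and model formula.
fit <- lm(mpg ~ wt, data = mtcars)

# Extract the residuals from the fitted model.
res <- residuals(fit)

# Pass the residuals, not the raw outcome, to the Shapiro-Wilk test.
sw <- shapiro.test(res)
print(sw)
```

A Q-Q plot of the same residuals, for example with qqnorm(res) followed by qqline(res), is a common visual complement to the test.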

The shapiro.test function does not calculate residuals from a vector of outcomes because the residuals depend on the particular model. You will have different residuals from a simple linear regression than you will from a linear regression that includes multiple predictor variables, and the shapiro.test function is not a function for fitting models.
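To illustrate why the model matters, here is a small sketch that fits two different models to the same outcome and tests each set of residuals. It again assumes the built-in mtcars data set, and the variables mpg, wt, and hp are arbitrary choices for illustration.

```r
# Illustrative assumption: mtcars (built into R); variable choices are arbitrary.
fit_simple <- lm(mpg ~ wt, data = mtcars)       # one predictor
fit_multi  <- lm(mpg ~ wt + hp, data = mtcars)  # two predictors

res_simple <- residuals(fit_simple)
res_multi  <- residuals(fit_multi)

# Same outcome vector, but the residuals differ between the two models.
residuals_match <- isTRUE(all.equal(res_simple, res_multi))
print(residuals_match)

# Each call tests only the vector it is given.
sw_raw    <- shapiro.test(mtcars$mpg)  # raw outcome, no model involved
sw_simple <- shapiro.test(res_simple)  # residuals of the simple model
sw_multi  <- shapiro.test(res_multi)   # residuals of the larger model
```

Because shapiro.test only sees a numeric vector, it has no way to know which of these models, if any, you had in mind.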

Dave
  • 62,186