I want to check whether one of the columns in my data frame "Activity_Sleep" is normally distributed. According to the histogram, it is normally distributed:

Minutes_active <- hist(activity_sleep$sedentary_minutes, 
     border="light blue", 
     col="blue", 
     las=1, 
     breaks=5)

[Image: histogram of activity_sleep$sedentary_minutes]

The Shapiro test however, gives the following results:

shapiro.test(activity_sleep$sedentary_minutes)

	Shapiro-Wilk normality test

data:  activity_sleep$sedentary_minutes
W = 0.95241, p-value = 5.604e-10

According to this result, the p-value is extremely small and the null hypothesis, i.e. that the distribution is normal, needs to be rejected.
Why do I get two contrary results?

  • 2
    Your distribution looks more sharply peaked than a classic normal distribution. Try visualizing it with finer bins. – jdobres Oct 26 '22 at 12:28
  • 6
    A histogram with 8 breaks might be a bad tool to assess normality. The Shapiro test is also a bad tool for assessing normality: it is too powerful on large sample sizes. Best practice is to check your data in detail with a Q-Q plot and other tools. – Yacine Hajji Oct 26 '22 at 12:28
  • 4
    The SW test is very sensitive to deviations from normality, especially as your sample size increases. You're better off using a qqnorm/qqplot. See https://notstatschat.rbind.io/2019/02/09/what-have-i-got-against-the-shapiro-wilk-test/ – MDEWITT Oct 26 '22 at 12:31
  • Check this https://stats.stackexchange.com/questions/52293/r-qqplot-how-to-see-whether-data-are-normally-distributed?rq=1 – Yacine Hajji Oct 26 '22 at 12:35
  • 1
    What is your sample size? Please tell us (as an edit to the post). Also, redo the histogram with more bins, or better, show us a qq-plot – kjetil b halvorsen Oct 26 '22 at 12:42
  • 1
    @Kjetil You can accurately estimate the sample size by reading the values on the vertical axis. – whuber Oct 26 '22 at 12:48
  • 2
    All of these are good answers, or at least components of an answer ... – Ben Bolker Oct 26 '22 at 12:56
  • @jdobres If the mean of the normal distribution was say about $700$ and the standard deviation about $120$ you might get something like that histogram shape. As you say, finer bins would help – Henry Oct 26 '22 at 13:10
  • 1
    In short, you are comparing a very loose visual test ("does it seem normal to me?") with a very demanding numerical test. – Nick Cox Oct 26 '22 at 14:14
  • Are the sedentary_minutes rounded to the nearest 30 minutes? It looks like it, judging from the histogram. – dipetkov Oct 26 '22 at 14:15
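
The checks suggested in the comments (finer bins, reading the sample size off the histogram, a Q-Q plot) could be sketched roughly as follows. This is only an illustration: the asker's data are not available, so `sedentary_minutes` here is simulated stand-in data. The mean of 700 and standard deviation of 120 are borrowed from Henry's comment, and the heavier-than-normal tails are an assumption added so that the plots have something to show.

```r
# Hypothetical stand-in for activity_sleep$sedentary_minutes (simulated, NOT the real data).
# A scaled t distribution with 5 df: roughly bell-shaped, but with heavier tails than a normal.
set.seed(42)
n <- 400
sedentary_minutes <- 700 + 120 * rt(n, df = 5) / sqrt(5 / 3)  # sqrt(5/3) rescales to unit variance

# 1. Histogram with finer bins. `breaks` is only a suggestion that hist() rounds to "pretty" values.
h <- hist(sedentary_minutes,
          breaks = 30,
          border = "light blue",
          col = "blue",
          las = 1,
          main = "Sedentary minutes (simulated)")

# 2. The sample size can be recovered from the bin counts stored in the hist object.
n_from_hist <- sum(h$counts)

# 3. Normal Q-Q plot with a reference line through the first and third quartiles.
#    Points bending away from the line at the ends indicate non-normal tails.
qqnorm(sedentary_minutes, las = 1)
qqline(sedentary_minutes, col = "red")

# 4. Shapiro-Wilk test for comparison (the function name is all lower case).
sw <- shapiro.test(sedentary_minutes)
print(sw)
```

With the real data, `sedentary_minutes` would be replaced by `activity_sleep$sedentary_minutes`. The Q-Q plot shows where and how strongly the data depart from normality, which a coarse histogram and a single p-value do not.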
