I plot Score (y variable) against Year (x variable) and want to show that Score increases as Year increases. The scatter plot does suggest this, but I would like a p-value from an appropriate hypothesis test on the slope. I believe the linear regression t-test is not valid here because Year is not normally distributed (it is discrete and uniform).
So instead I ran a simulation, and I want to know whether what I did is a valid technique.
- I found the linear regression slope of the actual data, which I called slopeActual.
- I ran a simulation (say 1000 iterations) in which, at each iteration, I permuted the y values (Score) and calculated the regression slope. I stored these slopes in a list called slopeList.
- I calculated the p-value = P(slope > slopeActual | no association), estimated as the proportion of values in slopeList greater than slopeActual.
When I did this for the data below, I got a p-value of 0.0087.
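The three steps above can be sketched in R roughly as follows. This is a minimal sketch, not the code used in the question: `ols_slope` and `perm_slope_test` are hypothetical helper names, `n_perm = 10000` and the seed are arbitrary choices, and `data1` is re-entered from the question so the block runs on its own. It assumes the usual permutation-test premise that, under no association, the Score values are exchangeable across Years. It counts permuted slopes `>=` the observed one (a common convention; with these scores it rarely differs from the strict "greater than" described above).

```r
# data1 from the question, re-entered (Year counts: 6, 7, 2, 7, 6, 3)
data1 <- data.frame(
  Year  = rep(1:6, times = c(6, 7, 2, 7, 6, 3)),
  Score = c(-5.2, -2, -1, -3, -5, 3.8, 0, -3.2, 1.2, 2.2, 11.5, -4, 10.2,
            2, 12, 6.5, 6, 6, 9.2, 4.2, 13, 0.8, 8.5, 4.5, 6, 6, 2.7, 8,
            -3.8, 6.7, 4.5)
)

# Least-squares slope of y on x; equals coef(lm(y ~ x))[2] up to rounding
ols_slope <- function(x, y) cov(x, y) / var(x)

# Hypothetical helper: one-sided permutation test of
# H0 "no association" against H1 "slope > 0"
perm_slope_test <- function(x, y, n_perm = 10000, seed = 1) {
  set.seed(seed)                                  # reproducible shuffles
  slope_actual <- ols_slope(x, y)                 # slopeActual in the question
  # Shuffling y breaks any link with x, giving draws from the null distribution
  slope_list <- replicate(n_perm, ols_slope(x, sample(y)))
  # Proportion of null slopes at least as large as the observed slope
  p_value <- mean(slope_list >= slope_actual)
  list(slope_actual = slope_actual, p_value = p_value, slope_list = slope_list)
}

res <- perm_slope_test(data1$Year, data1$Score)
print(res$slope_actual)
print(res$p_value)
```

Computing the slope as `cov(x, y) / var(x)` instead of calling `lm()` inside every iteration keeps 10,000 permutations fast while giving the same slope.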
So, the question again: Is this method valid?
data1 <-
structure(list(Year = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L,
5L, 5L, 6L, 6L, 6L), Score = c(-5.2, -2, -1, -3, -5, 3.8, 0,
-3.2, 1.2, 2.2, 11.5, -4, 10.2, 2, 12, 6.5, 6, 6, 9.2, 4.2, 13,
0.8, 8.5, 4.5, 6, 6, 2.7, 8, -3.8, 6.7, 4.5)), .Names = c("Year",
"Score"), class = "data.frame", row.names = c(NA, -31L))
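For comparison, here is a sketch of the ordinary regression t-test on the same data (again re-entering `data1` so the block is self-contained). In the standard linear model the normality assumption concerns the errors around the regression line, not the distribution of the predictor, so a discrete, uniform Year does not by itself rule the test out; a normal Q-Q plot of the residuals is one informal check. The one-sided p-value from `pt()` is an added step chosen to match the question's directional hypothesis.

```r
# data1 from the question, re-entered so this block is self-contained
data1 <- data.frame(
  Year  = rep(1:6, times = c(6, 7, 2, 7, 6, 3)),
  Score = c(-5.2, -2, -1, -3, -5, 3.8, 0, -3.2, 1.2, 2.2, 11.5, -4, 10.2,
            2, 12, 6.5, 6, 6, 9.2, 4.2, 13, 0.8, 8.5, 4.5, 6, 6, 2.7, 8,
            -3.8, 6.7, 4.5)
)

fit <- lm(Score ~ Year, data = data1)

# Estimate, Std. Error, t value and two-sided Pr(>|t|) for the slope
slope_row <- coef(summary(fit))["Year", ]
print(slope_row)

# One-sided p-value for H1: slope > 0, from the same t statistic
p_one_sided <- pt(slope_row[["t value"]], df = df.residual(fit), lower.tail = FALSE)
print(p_one_sided)

# Normality is assumed for the errors, so inspect the residuals (not Year)
qqnorm(resid(fit))
qqline(resid(fit))
```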
1. https://stats.stackexchange.com/questions/12262/what-if-residuals-are-normally-distributed-but-y-is-not
2. https://stats.stackexchange.com/questions/280189/linear-regression-and-assumptions-about-response-variable
3. https://stats.stackexchange.com/questions/148803/how-does-linear-regression-use-the-normal-distribution
4. https://stats.stackexchange.com/questions/130775/why-do-we-care-so-much-about-normally-distributed-error-terms-and-homoskedastic
...ctd
– Glen_b Aug 27 '17 at 02:03
1. https://stats.stackexchange.com/questions/16381/what-is-a-complete-list-of-the-usual-assumptions-for-linear-regression
2. https://stats.stackexchange.com/questions/32285/assumptions-of-generalised-linear-model
3. https://stats.stackexchange.com/questions/86830/transformation-to-normality-of-the-dependent-variable-in-multiple-regression
...ctd
– Glen_b Aug 27 '17 at 02:03
• Checking assumptions: https://stats.stackexchange.com/questions/45685/testing-assumptions-of-multiple-regression
• Assumptions with categorical independent variables: https://stats.stackexchange.com/questions/226584/regression-assumptions-not-required-for-categorical-dummy-variables
• illustration discussing two assumptions: https://stats.stackexchange.com/questions/96619/validity-of-regression-assumptions-on-residual-plot ...ctd
– Glen_b Aug 27 '17 at 02:04
https://stats.stackexchange.com/questions/177015/clues-that-a-problem-is-well-suited-for-linear-regression
Why are diagnostics based on residuals: https://stats.stackexchange.com/questions/76163/why-are-diagnostics-based-on-residuals
...ctd
– Glen_b Aug 27 '17 at 02:05
A more reliable source on assumptions: http://andrewgelman.com/2013/08/04/19470/
(my list would differ somewhat, as I discuss here, but it's a good list)
There's also Wikipedia, which is more or less okay on this, but since articles may change at any time, a degree of caution is sometimes needed.
– Glen_b Aug 27 '17 at 02:05