5

If one has a dataset with a single outlier, such as the one in the following graph taken from Vanni-Mercer et al. (2009), is there a statistical test that accounts for the single outlier, rather than having to throw it out or declare significance because of a single data point?

RT is reaction time. Trial rank is essentially the trial number.

[Figure: RT versus trial rank]

  • This is not my research. The authors kept the first data point in, and I was wondering why, instead of doing a post-hoc analysis using Tukey's HSD, they used some other technique that took the outlier into account. – Phillip Cloud Mar 29 '11 at 02:48

2 Answers

4

Typically in RT studies there's good reason to believe that the first trials are different qualitatively from the rest and the long RT is merely an indicator of that. Why would you want to bother keeping them?

John
  • Maybe the original poster is interested in quantifying the phenomenon. – Mike Lawrence Mar 28 '11 at 23:15
  • Your answer is useful from a conceptual point of view, but I want to know about statistical tests for phenomena like the data shown. – Phillip Cloud Mar 29 '11 at 12:57
  • Then I guess I'd have to ask what you mean by "account for" and "significant" in your question. Perhaps you should edit it to make it more clear what conclusion might be drawn from the data that you don't want and what you would like to achieve. My answer has general applications as well. If a point is really far away from everything else it might be stochastic... or it might just be something that's very different. – John Mar 30 '11 at 01:17
2

You might consider checking out the gamm4 package in R, which fits generalized additive mixed models: it finds a smooth non-linear function that fits the data while automatically penalizing complexity. I recently used it to fit a similar data set, then obtained the residuals and used them to bootstrap confidence ribbons for the fit.
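A minimal sketch of that workflow follows, under several assumptions: the data are simulated stand-ins (not the values from the paper), the column names `rt`, `trial`, and `subj` are hypothetical, and the bootstrap is a simple residual bootstrap that may differ from what I actually ran. It fits a penalized smooth of trial with a random intercept per participant, then refits on resampled residuals to get a percentile ribbon.

```r
library(gamm4)  # builds on mgcv and lme4

set.seed(1)

# Simulated stand-in data (an assumption, not the published data):
# 10 participants x 20 trials, with a long RT on the first trial.
n_subj  <- 10
n_trial <- 20
dat <- expand.grid(trial = 1:n_trial, subj = factor(1:n_subj))
subj_shift <- rnorm(n_subj, sd = 30)
dat$rt <- 500 + 300 * exp(-(dat$trial - 1)) +
  subj_shift[as.integer(dat$subj)] + rnorm(nrow(dat), sd = 40)

# Penalized smooth of trial, random intercept for each participant
fit <- gamm4(rt ~ s(trial, k = 10), random = ~ (1 | subj), data = dat)

# Population-level fitted curve on a grid of trial numbers
grid <- data.frame(trial = 1:n_trial)
fitted_curve <- as.numeric(predict(fit$gam, newdata = grid))

# Residual bootstrap: add resampled residuals to the fitted values,
# refit, and store each refitted curve (one column per replicate)
res      <- residuals(fit$mer)
fit_vals <- fitted(fit$mer)
n_boot   <- 100
boot_curves <- replicate(n_boot, {
  d <- dat
  d$rt <- fit_vals + sample(res, replace = TRUE)
  b <- gamm4(rt ~ s(trial, k = 10), random = ~ (1 | subj), data = d)
  as.numeric(predict(b$gam, newdata = grid))
})

# Pointwise 95% percentile ribbon: row 1 is lower, row 2 is upper
ribbon <- apply(boot_curves, 1, quantile, probs = c(0.025, 0.975))

plot(dat$trial, dat$rt, col = "grey", xlab = "Trial rank", ylab = "RT")
lines(grid$trial, fitted_curve, lwd = 2)
lines(grid$trial, ribbon[1, ], lty = 2)
lines(grid$trial, ribbon[2, ], lty = 2)
```

Resampling residuals over all observations ignores the participant grouping, so treat the ribbon as a rough visual guide; resampling whole participants is an alternative if that matters for your design.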

Mike Lawrence
  • My answer assumes that you're doing typical cognitive science work where you have repeated measurements within each of multiple participants and therefore need to model participants as a random effect. If you somehow don't need to specify random effects, then use gam() from the mgcv package (http://cran.r-project.org/web/packages/mgcv/index.html). – Mike Lawrence Mar 28 '11 at 23:14
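For the case without random effects, a rough sketch with `gam()` might look like the following. The data are simulated (an assumption, not the data from the question), the names `rt` and `trial` are hypothetical, and the band is an approximate one built from plus or minus two standard errors rather than a bootstrap.

```r
library(mgcv)

set.seed(2)

# Simulated stand-in data: a single series with a long first-trial RT
trial <- 1:40
rt    <- 500 + 300 * exp(-(trial - 1)) + rnorm(length(trial), sd = 40)
dat   <- data.frame(trial = trial, rt = rt)

# Penalized smooth of trial; smoothness is chosen by REML
fit <- gam(rt ~ s(trial, k = 10), data = dat, method = "REML")

# Fitted values with pointwise standard errors
pred  <- predict(fit, newdata = dat, se.fit = TRUE)
lower <- pred$fit - 2 * pred$se.fit
upper <- pred$fit + 2 * pred$se.fit

# Effective degrees of freedom used by the smooth
summary(fit)$edf

plot(dat$trial, dat$rt, col = "grey", xlab = "Trial rank", ylab = "RT")
lines(dat$trial, pred$fit, lwd = 2)
lines(dat$trial, lower, lty = 2)
lines(dat$trial, upper, lty = 2)
```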