Questions tagged [outliers]

An outlier is an observation that appears to be unusual or not well described relative to a simple characterization of a dataset. A discomfiting possibility is that these data come from a different population than the one intended to be studied.

However, outliers are not necessarily bad or wrong, nor do they necessarily need to be removed from a data set before further analysis. Rather, outliers (of which there can be more than one in any data set) indicate that some observations at least appear to differ from the bulk of the data, suggesting they should be individually examined and understood. Some statistical procedures are also sensitive to outliers: removing one or more of them could substantially change the conclusions of those procedures.

1349 questions
28
votes
4 answers

Detecting outliers using standard deviations

Following my question here, I am wondering if there are strong views for or against the use of standard deviation to detect outliers (e.g. any datapoint that is more than 2 standard deviations from the mean is an outlier). I know this is dependent on the context…
Amarald
  • 1,165
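
A minimal sketch of the rule this question describes, using only the Python standard library. The function name `flag_by_sd`, the default `k=2.0`, and the sample data are illustrative assumptions, not taken from the question.

```python
import statistics

def flag_by_sd(values, k=2.0):
    """Return the values lying more than k sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [x for x in values if abs(x - mean) > k * sd]

# Hypothetical data: nine ordinary readings and one extreme one.
data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
flagged = flag_by_sd(data)
```

Note that the extreme value itself inflates both the mean and the standard deviation used to judge it, which is one reason the rule is context dependent.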
9
votes
2 answers

Removing outliers from data - maximum number of outliers that you can remove?

I have a couple of outliers in my data and I want to exclude them to see if this changes the results. In your opinion, what is the maximum number of outliers one should restrict themselves to? Thanks!
Kristie
  • 101
8
votes
2 answers

Can I remove sample outliers using standard deviation?

I am looking to find clinical and other measurements to predict a blood metabolite with Elastic-Net Regression models. Can I remove samples with values greater than 1.96 SD from the mean as outliers? I read in a post stating if the samples are…
Molly_K
  • 203
8
votes
2 answers

Is it reasonable to delete a large number of outliers from a dataset?

I need some advice on what is a reasonable number of cases to be deleted as outliers. I have applied outlier detection methods to identify univariate and multivariate outliers from my dataset. Altogether 30% of the data was classified as…
8
votes
2 answers

Iterative process for removing extreme samples

My samples follow heavy-tailed distributions. I use a process to detect and remove "extreme" samples that goes like this: (1) Measure mean and standard deviation of samples. (2) Remove samples higher than mean plus 4 standard deviations. (3) Repeat from Step 1…
iliasfl
  • 2,554
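
The loop described in this excerpt can be sketched as follows. The excerpt is truncated before any stopping rule, so stopping once a pass removes nothing is an assumption here, as are the name `iterative_trim` and the sample data.

```python
import statistics

def iterative_trim(samples, k=4.0):
    """Repeatedly drop samples above mean + k * sd until a pass removes nothing.

    The stopping rule is an assumption; the original excerpt does not state one.
    """
    kept = list(samples)
    while len(kept) > 2:
        mean = statistics.mean(kept)
        sd = statistics.stdev(kept)
        threshold = mean + k * sd
        remaining = [x for x in kept if x <= threshold]
        if len(remaining) == len(kept):
            break  # nothing removed on this pass
        kept = remaining
    return kept

# Hypothetical data: forty ordinary samples plus two extreme ones.
samples = [1.0] * 20 + [2.0] * 20 + [50.0, 1000.0]
trimmed = iterative_trim(samples)
```

In this made-up data the value 50.0 survives the first pass because 1000.0 inflates the standard deviation, and is only removed on the second pass. With heavy-tailed data, each pass can also keep shrinking the threshold and removing legitimate tail values.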
7
votes
1 answer

When finding outliers from the interquartile range, why do I have to multiply by 1.5?

I was looking at the outlier detection formula which uses the IQR and I wonder why it should be multiplied by 1.5? Can the constant be increased, e.g. to 3 or 6, to be more "acid", and if so under what criteria?
Aureon
  • 145
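
A sketch of the IQR rule with the multiplier exposed as a parameter, so the effect of changing 1.5 to 3 or 6 can be tried directly. The name `iqr_outliers` and the data are illustrative assumptions; quartiles come from `statistics.quantiles` with its default method, and other quartile conventions give slightly different fences.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k * IQR, Q3 + k * IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]

# Hypothetical data with one large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
flagged_default = iqr_outliers(data)       # k = 1.5
flagged_wide = iqr_outliers(data, k=6.0)   # much wider fences
```

For normally distributed data, fences at k = 1.5 sit at roughly 2.7 standard deviations from the mean, so about 0.7% of points fall outside them. A larger k flags fewer points.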
7
votes
2 answers

Tests for univariate outliers: have Dixon's and Grubbs' methods been discredited?

In contrast to the many threads on this site that recommend Dixon's and Grubbs' tests, the author of one answer, at this thread, contends that "Really, these have been discredited long ago" and advocates 2 other methods. I don't feel qualified to…
rolando2
  • 12,511
6
votes
1 answer

Tukey's fences for outlier removal

I'm in a biomedical research field, and I see a lot of researchers conducting low-N studies that use Tukey's fences for outlier removal. For anyone who doesn't know, Tukey's fences work as follows: Calculate quartiles 1 and 3 of your data. Add…
6
votes
2 answers

Method to reliably determine abnormal statistical values

I'm searching for a statistical method to determine if a player is cheating in an online game. The game is a Quake 3-like game (first-person shooter). Given a number of positive points and a number of negative points per player (score) and given n players…
Quandary
  • 163
6
votes
3 answers

How to tell how extreme an outlier is?

I am analyzing some data and want to look at one particular point and see how "extreme" it is. Do I exclude this outlier from the data, calculate the dataset's standard deviation and average, then compare my outlier to THAT, or do I calculate the…
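
The two options in this question can be compared side by side. This is a minimal sketch: the function name `z_with_and_without` and the data are assumptions made for illustration.

```python
import statistics

def z_with_and_without(values, index):
    """Z-score of values[index] using all data, and using all data except that point."""
    point = values[index]
    rest = values[:index] + values[index + 1:]
    z_all = (point - statistics.mean(values)) / statistics.stdev(values)
    z_rest = (point - statistics.mean(rest)) / statistics.stdev(rest)
    return z_all, z_rest

# Hypothetical data; the last value is the point of interest.
data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
z_all, z_rest = z_with_and_without(data, len(data) - 1)
```

When the point is included, its z-score can never exceed (n - 1) / sqrt(n), about 2.85 for n = 10, no matter how extreme the value is. Excluding the point removes that ceiling, which is why the two numbers can differ so much.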
5
votes
3 answers

Good algorithm for processing positional estimates

We (my team) are building a robot which will navigate around an arena. The robot uses a camera to determine its position based on markers on the wall. We have tested this and found it can determine position well when close to the wall. However, at…
5
votes
1 answer

Popular methods for outlier detection (right skewed distribution)

What are the popular methods for outlier detection in univariate data, which do not assume normal distribution?
user27241
  • 413
5
votes
1 answer

At what value of mean and variance should I throw data away?

I have some score values that are output from a program. There are about 10 such values. The data set is a measure of the "quality" of a speech waveform received over a mobile phone and landline channel. The waveform is passed through an algorithm…
Sriram
  • 295
5
votes
1 answer

Detect outliers in very small data set

I have a data set that includes the different response times of a user that is visiting a web application. For example, a visitor enters www.test.com in the browser and navigates through this domain watching child pages like www.test.com/news,…
enne87
  • 165
4
votes
2 answers

What is a mathematical way to define a point on a scatter plot as an outlier?

I have a graph and there are two points that could be potential outliers. I'm trying to create a polynomial line of best fit with undefined order. I believe I could use a >2 standard deviations exclusion rule, but I'm not sure how exactly this is…
Chad
  • 143