
A measurement is performed on 100 nominally identical products. Some of these products contain defects; I expect only a small number of them to be defective, say about 10 in total. Now I want to create a population of roughly 30 products that captures the normal product-to-product variation but contains no defective products.

My approach is as follows: I randomly select 30 products. I then calculate the mean and standard deviation of this population thirty times, each time leaving out one product. The first value is computed from all products except the first, the second from all products except the second, and so on.

In this way, products that deviate strongly are easy to find, because removing them from the selected population of 30 causes a large change in the mean and standard deviation (see the figure below). Such a product can then be replaced by another one, after which the process is rerun.
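A minimal NumPy sketch of the leave-one-out statistics described above, assuming each product yields a single scalar measurement stored in a 1-D array. The function name `leave_one_out_stats` and the example values are hypothetical, chosen only for illustration.

```python
import numpy as np


def leave_one_out_stats(x):
    """Return the leave-one-out means and sample standard deviations.

    Element i of each result is computed from every product except product i.
    """
    x = np.asarray(x, dtype=float)
    subsets = [np.delete(x, i) for i in range(x.size)]
    means = np.array([s.mean() for s in subsets])
    stds = np.array([s.std(ddof=1) for s in subsets])
    return means, stds


# Hypothetical data: 30 products, with a defect on product 13 (index 12).
x = np.tile([9.9, 10.1], 15)
x[12] = 20.0

loo_mean, loo_std = leave_one_out_stats(x)
shift = x.mean() - loo_mean   # change in the mean when each product is removed
print(np.argmax(shift) + 1)   # → 13
```

Because the leave-one-out mean equals (sum - x_i)/(n - 1), the shift it produces is exactly (x_i - mean)/(n - 1). Ranking products by the change in the mean is therefore the same as ranking them by their distance from the full-sample mean, so it inherits the same weakness: that mean already includes the defective products.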

The problems are:

  1. A defective product shifts the mean, which masks other defective products (in the figure below, product 13 makes product 17 look less bad).

  2. I can add bounds at a certain factor above and below the population mean and remove the products that fall outside them, but this approach is primitive.
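Both problems can be reproduced with a small sketch of the bound rule from item 2, assuming the bounds are placed at mean ± k sample standard deviations. The factor k = 2, the function name `sigma_bound_flags`, and the data are hypothetical. The large defect inflates both the mean and the standard deviation, so the smaller defect only becomes visible once the large one has been removed.

```python
import numpy as np


def sigma_bound_flags(x, k=2.0):
    """Flag values outside mean +/- k * sample standard deviation."""
    x = np.asarray(x, dtype=float)
    mu, sd = x.mean(), x.std(ddof=1)
    return np.abs(x - mu) > k * sd


# Hypothetical data: 30 products alternating 9.9 / 10.1, with two defects.
x = np.tile([9.9, 10.1], 15)
x[12] = 20.0   # product 13: large defect
x[16] = 11.0   # product 17: smaller defect

first_pass = np.flatnonzero(sigma_bound_flags(x)) + 1
print(first_pass)    # → [13]  (product 17 is masked)

# Remove product 13 and test the remaining 29 products again.
keep = np.arange(x.size) != 12
second_pass = np.flatnonzero(keep)[sigma_bound_flags(x[keep])] + 1
print(second_pass)   # → [17]
```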

[Figure: Change of mean when removing certain products]

My questions: is there a more statistically sound way of determining the probability that a given product lies outside the population? And is there any literature you could recommend on identifying populations within a large group of samples?

H. Vabri
  • have a look here – user603 Sep 05 '18 at 13:50
    I believe what you're asking falls under statistical process control. – Digio Sep 05 '18 at 13:58
  • @user603, thank you for the extensive answer. I do not have enough reputation to comment on it there, so I am commenting here. I have some questions:
    1. In your first figure the outliers show a lower score. Is this not expected? The numerator in out_1 is greater than zero; however, the denominator increases faster (due to the squared term in the standard deviation), so the value will be small. Or is this effect hidden when many values deviate strongly from the leave-one-out mean?
    2. What is the name of the outlier detection function you propose? Can you recommend any literature on it?
    – H. Vabri Sep 07 '18 at 08:40
  • 1. Yes, your intuition is correct. 2. The outlier detection function based on the median and the MAD was first proposed by Gauss. As you can imagine, you can find many papers on it. Here is one recent non-technical one. Here is an older, more technical one. – user603 Sep 07 '18 at 09:04
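For comparison, here is a minimal sketch of an outlier rule based on the median and the MAD. It uses one common form, a robust z-score with the MAD scaled by 1.4826 so that it estimates the standard deviation for normally distributed data. The cutoff of 3.5, the function name `mad_outlier_flags`, and the data are assumptions, and this is not necessarily the exact function from the linked answer. Because the median and the MAD are barely moved by a few extreme values, both defects in the hypothetical data below are flagged in a single pass.

```python
import numpy as np


def mad_outlier_flags(x, cutoff=3.5):
    """Flag values whose robust z-score |x - median| / (1.4826 * MAD) exceeds cutoff."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    if mad == 0:
        raise ValueError("MAD is zero: more than half of the values equal the median")
    # 1.4826 makes the scaled MAD a consistent estimate of the standard
    # deviation when the data are normally distributed.
    robust_z = np.abs(x - med) / (1.4826 * mad)
    return robust_z > cutoff


# Hypothetical data: 30 products alternating 9.9 / 10.1, with two defects.
x = np.tile([9.9, 10.1], 15)
x[12] = 20.0   # product 13: large defect
x[16] = 11.0   # product 17: smaller defect

print(np.flatnonzero(mad_outlier_flags(x)) + 1)   # → [13 17]
```

Unlike the leave-one-out approach, this needs no replace-and-rerun loop: the center and scale are estimated robustly from all products at once.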