
I am comparing the distributions for the strength of two part designs. The test equipment maxes out at 300 units. The 300 units is an arbitrary limit of the equipment and does not represent a requirement of the design. If the sample survived 300 units of load, I recorded it as a "Pass". I don't know how to statistically show the difference in strength between the two designs. Is there a way to compare distributions if one maxes out a scale?

I don't know how to compare means when 21 of the 36 new-design samples have no recorded value. Alternatively, I could look at the pass rate, but that's not very interesting, and the pass criterion is completely arbitrary and not linked to actual functional requirements.

The non-normality of the new distribution doesn't help things either.

[Figure: Simplified distributions showing issue]

[Figure: Data in question]

AlexR

2 Answers


Distributions such as the truncated normal describe most of your data, apart from the spike. I would probably cut that off and fit pure truncated normal distributions to both datasets.

Now, what is it that you want to compare? What is your hypothesis? One hypothesis could be that the location parameter (what would be called the mean in the non-truncated normal) of your new design is below that of the old design. I can't think of a good statistic for that, so I would probably estimate the distributions of your location parameters from the data, and then compare those directly.

If this is the route you want to follow, the question becomes how to estimate the distribution of the location parameter of a truncated normal distribution from data. SciPy seems to have a fit method for that: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.truncnorm.html. That would give you a point estimate (for significance testing, etc.). Using something like NumPyro would give you a distribution estimate. Which one are you after?
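A minimal sketch of the point-estimate route, assuming the failures follow a normal distribution truncated above at 300 units and that the 300-unit passes have been dropped. The function name `fit_upper_truncated_normal` and the synthetic data are hypothetical. The likelihood is written out by hand rather than through `scipy.stats.truncnorm.fit`, because `truncnorm` expresses its bounds in standardized units that themselves depend on the location and scale.

```python
import numpy as np
from scipy import optimize, stats


def fit_upper_truncated_normal(x, upper):
    """Maximum-likelihood (loc, scale) for a normal truncated above at `upper`.

    Assumes every value in `x` is <= upper and that values above `upper`
    were never recorded at all (truncation, not censoring).
    """
    x = np.asarray(x, dtype=float)

    def nll(params):
        loc, log_scale = params
        scale = np.exp(log_scale)  # optimise log(scale) so scale stays positive
        # Log-density of a normal conditioned on X <= upper.
        logpdf = (stats.norm.logpdf(x, loc, scale)
                  - stats.norm.logcdf(upper, loc, scale))
        return -np.sum(logpdf)

    start = np.array([x.mean(), np.log(x.std(ddof=1))])
    res = optimize.minimize(nll, start, method="Nelder-Mead")
    return float(res.x[0]), float(np.exp(res.x[1]))


# Synthetic, hypothetical strengths; values above 300 are discarded.
rng = np.random.default_rng(0)
raw = rng.normal(250.0, 40.0, size=2000)
observed = raw[raw <= 300.0]

loc_hat, scale_hat = fit_upper_truncated_normal(observed, 300.0)
print(loc_hat, scale_hat)
```

The sketch only estimates the parameters of the part of the distribution below the limit; it uses no information about how many parts survived 300 units.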

Cryo
  • @Scortchi-ReinstateMonica, please provide details here. What out of the things that I suggested are incorrect? – Cryo Mar 27 '24 at 17:01
  • Can I not model a random variable with single mode, with distribution that looks like Gaussian with ends chopped off, like truncated normal? – Cryo Mar 27 '24 at 17:20
  • Sorry - I linked to the wrong question. Here: https://stats.stackexchange.com/q/144041/17230. What's described in this question is censoring, not truncation. – Scortchi - Reinstate Monica Mar 27 '24 at 18:42
  • @Scortchi-ReinstateMonica, thanks. The top rated answer in your link says that the terms are often used interchangeably, and then goes on to say that an example of truncation is equipment not measuring in a certain range. Isn't this exactly what OP referenced by saying 'equipment maxes out'? – Cryo Mar 27 '24 at 21:35
  • Tbh, the censoring part also mentions equipment. I am not too bothered about terminology. My point was that it looks like one can model this with distribution that happens to be called truncated normal distribution. Would you agree that this is so? – Cryo Mar 27 '24 at 21:39
  • No. We know that 21 out of 36 parts of the new design had not failed at 300 units, compared with only 1 (if I've read the graph right) out of 32 parts of the old design: in fact this is the most salient feature of the data, & the "spike" mustn't be "cut off". – Scortchi - Reinstate Monica Mar 27 '24 at 22:17
  • @Scortchi-ReinstateMonica, it makes sense to me now. Thanks. – Cryo Mar 27 '24 at 22:25
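The censoring view from the comments above can be sketched as a likelihood in which each failure contributes a density term and each "Pass" contributes the probability of exceeding 300 units. This is only an illustration: the normal model is an assumption (the question notes non-normality), and the function name `fit_right_censored_normal` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy import optimize, stats


def fit_right_censored_normal(failures, n_censored, limit):
    """Maximum-likelihood (loc, scale) for a normal with right-censoring.

    `failures` are the measured strengths below `limit`; `n_censored` is
    the number of parts that survived `limit` and were recorded as a pass.
    """
    failures = np.asarray(failures, dtype=float)

    def nll(params):
        loc, log_scale = params
        scale = np.exp(log_scale)  # optimise log(scale) so scale stays positive
        ll_fail = stats.norm.logpdf(failures, loc, scale).sum()
        # Each pass only tells us that the strength is at least `limit`.
        ll_pass = n_censored * stats.norm.logsf(limit, loc, scale)
        return -(ll_fail + ll_pass)

    start = np.array([failures.mean(), np.log(failures.std(ddof=1))])
    res = optimize.minimize(nll, start, method="Nelder-Mead")
    return float(res.x[0]), float(np.exp(res.x[1]))


# Synthetic, hypothetical strengths with most parts surviving the limit.
rng = np.random.default_rng(1)
true_strength = rng.normal(310.0, 40.0, size=2000)
limit = 300.0
failures = true_strength[true_strength < limit]
n_censored = int((true_strength >= limit).sum())

loc_hat, scale_hat = fit_right_censored_normal(failures, n_censored, limit)
print(loc_hat, scale_hat)
```

Unlike cutting off the spike, this keeps the count of surviving parts in the estimate, so a design with many passes is pulled toward a higher location.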

When comparing distributions, my first step is usually a quantile-quantile plot. Then you could also do parallel box plots and a Bland-Altman plot (aka a Tukey mean-difference plot).

After that, what you should do depends on what you are trying to compare. Cryo gave you some good ideas about the truncated normal -- and the non-normality seems mostly due to the truncation. But you might want to compare medians (which wouldn't be affected by the truncation) or maybe trimmed or Winsorized means, since truncation is a kind of trimming/Winsorization.
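As a small illustration of the median suggestion, here is a sketch that records every pass as the 300-unit limit and then takes the sample median. The helper `censored_median` and both data sets are hypothetical. The median is unaffected by the limit only while fewer than half of the parts pass; once the median itself sits at the limit, it is only a lower bound.

```python
import numpy as np


def censored_median(values, limit):
    """Sample median when passes are recorded as `limit`.

    Returns (median, is_lower_bound). The flag is True when the median
    sits at the limit, so the true median is only known to be >= limit.
    """
    values = np.minimum(np.asarray(values, dtype=float), limit)
    med = float(np.median(values))
    return med, med >= limit


# Hypothetical strengths; 300 stands for a part that survived the test.
old = [180, 210, 225, 240, 255, 270, 300]
new = [250, 280, 300, 300, 300, 300, 300]

print(censored_median(old, 300))  # (240.0, False)
print(censored_median(new, 300))  # (300.0, True)
```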

Peter Flom