
I'm trying to estimate the performance of different configurations. In each test, one machine generates requests to a server for x minutes.

The output is:

1. Number of attempts
2. Number of successful requests
3. Time of each request

My problem is that, for example, most of the requests take about a second or less, while a few requests take 120 seconds.

I need to produce clear and simple output graphs. So:

A. Is there a proper way (a formula) to "ignore" results that are larger than x? I can simply omit some results from the average, but I was wondering if there is a more elegant way to build it into a formula.
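One common way to formalize "ignore results larger than x" is a truncated mean (drop values above an explicit cutoff) or a trimmed mean (drop a fixed fraction of the most extreme values). A minimal sketch; the data and cut fraction here are made up for illustration:

```python
def trimmed_mean(values, proportion=0.1):
    """Mean after discarding the lowest and highest `proportion` of values."""
    data = sorted(values)
    k = int(len(data) * proportion)
    kept = data[k:len(data) - k] if k else data
    return sum(kept) / len(kept)

def truncated_mean(values, cutoff):
    """Mean of only those values that do not exceed an explicit cutoff x."""
    kept = [v for v in values if v <= cutoff]
    return sum(kept) / len(kept)

# Illustrative request times in seconds: mostly ~1 s, two 120 s stragglers.
times = [0.8, 0.9, 1.1, 0.7, 1.0, 120.0, 0.9, 1.2, 120.0, 0.8]

print(trimmed_mean(times, 0.2))    # both 120 s values fall inside the trimmed tail
print(truncated_mean(times, 10.0))
```

The trimmed mean has the advantage that it needs no hand-picked cutoff, only a trim fraction, so it behaves the same across configurations with different typical speeds.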

EDIT: Deleted the second question.

nmnir
  • You have two big and different questions bundled together. B asks for how to determine "best", but without knowing the trade-off between success and speed, there are no obvious simple answers. A would usually be regarded as the problem of how to deal with outliers. The title word "variance" is not quite right, as the problem is how to deal with some very high values. I've added a tag outliers but you should look at some of the highest voted threads on that. You don't seem to have a new or different problem there. – Nick Cox Apr 13 '18 at 10:59
  • For the A part of your question. I would bin the data (say in 30 minutes interval). Start with a coarse model for the non outlying values of 'Time of Request', like exponentially distributed (eventually allowing for a non 0 shift parameter). Have a look at how to detect outliers in that setting, for example in these answers. Hopefully, that would help you identify the extreme measurements. – user603 Apr 13 '18 at 11:40
  • I have changed the title to outline the time series as well as outliers detection aspect of your question. Have a look at detection of outliers in time series context here. Feel free to change back! – user603 Apr 13 '18 at 11:46
  • @user603 I don't see emphasis on time series here. Edits should just be minor. Naturally I agree that the full problem might entail also looking at dependence in time, but that's a matter for comment rather than rewording the title on the OP's behalf. – Nick Cox Apr 13 '18 at 11:55
  • Post-edit: Have you tried specifically the approach in this answer by Glen_b? – user603 Apr 13 '18 at 21:01
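The model-based route suggested in the comments can be sketched as follows: fit an exponential to the bulk of the times and flag anything beyond a high quantile of the fitted distribution. The rate is estimated from the median rather than the mean, since the sample mean would be dragged up by the extreme values; all numbers below are illustrative:

```python
import math

# Illustrative request times in seconds: mostly ~1 s, two 120 s stragglers.
times = [0.8, 0.9, 1.1, 0.7, 1.0, 120.0, 0.9, 1.2, 120.0, 0.8]

# For an exponential, median = ln(2) / rate, so estimate rate = ln(2) / median.
# Upper middle value used as the median; fine for a sketch.
median = sorted(times)[len(times) // 2]
rate = math.log(2) / median

# The p-quantile of Exp(rate) is -ln(1 - p) / rate; flag anything beyond it.
p = 0.999
cutoff = -math.log(1 - p) / rate
outliers = [t for t in times if t > cutoff]
print(cutoff, outliers)
```

With p = 0.999 the cutoff lands near 10 s here, so only the 120 s requests are flagged while the ordinary ~1 s variation is untouched.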

0 Answers