10

In our logfiles we store response times for the requests. What's the most efficient way to calculate the median response time, the "75/90/95% of requests were served in less than N time" numbers etc? (I guess a variation of my question is: What's the best way to calculate the median and standard deviation of a bunch stream of numbers).

The best I came up with was just reading all the numbers, ordering them and then picking out the numbers, but that seems really goofy. Isn't there a smarter way?

We use Perl, but solutions for any language might be helpful.

brian d foy
  • 124,803
  • 31
  • 200
  • 568
Ask Bjørn Hansen
  • 6,429
  • 2
  • 25
  • 39
  • Show a sample of your logfile – xxxxxxx Sep 29 '09 at 07:56
  • hi spx2 - our logs are just line-terminated JSON structures, where one of the elements is a list of various time counters (actual time, cpu time, etc). I don't think it's too interesting; we'll do a map-reduce type thing to pull out list of response times (by page type, etc). – Ask Bjørn Hansen Sep 29 '09 at 08:07
  • I would have thought with your magic that 110% of the requests were served before they even left the requestor. :) – brian d foy Sep 29 '09 at 23:57

4 Answers4

11

See the article Calculating Percentiles in Memory-bound Applications. It explains how to calculate median and other percentiles efficiently.

Also, here's an article on calculating standard deviation (variance) as you go: Accurately computing running variance.

John D. Cook
  • 28,911
  • 10
  • 65
  • 93
6

you can have look at quick select:

http://en.wikipedia.org/wiki/Selection_algorithm

Or at the Wirth algorithm: http://www.mail-archive.com/numpy-discussion@scipy.org/msg20059.html

Benchmark for the median can be found here: http://ndevilla.free.fr/median/median/index.html

LeMiz
  • 5,318
  • 5
  • 26
  • 23
4

Have a look at PDL... the Perl Data Language.

Also see these previous SO questions about mean/std dev:

/I3az/

Community
  • 1
  • 1
draegtun
  • 22,238
  • 5
  • 47
  • 70
2

There are code examples here: http://rosettacode.org/wiki/Standard_Deviation

glenn jackman
  • 223,850
  • 36
  • 205
  • 328