4

I play an online game (Heroes of Newerth) that has a large ladder of players, each with a couple of different ratings. I've manually gathered the rating at every integer percentile of players, but I'm blanking on how to turn this into a histogram and compute the mean and standard deviation. I'd also like a graph that shows the curve.

I could easily do so using my TI-83+ and I consider myself very proficient at Excel, but I don't see an obvious way to do this. Below is a link to the data, or if you'd prefer to give me instructions on how to create the distribution I'd be happy to do so myself.

https://spreadsheets.google.com/spreadsheet/ccc?key=0Atwn_lcLizk9dDZ4WW1YV3BZQnFSaGNoQVdRa0JVZ3c&hl=en_US&authkey=CKyI38QJ#gid=0

Decency
  • 173
  • Would you include some example data? – csgillespie Jun 27 '11 at 14:34
  • Edited the OP and beat you to the punch. -.- As you can see, there are two different ratings, MMR and PSR. Many people dispute a correlation between the two, so I decided to settle the discussion by doing a bit of statistical analysis. This is where I've become stuck. I'd like to compare the Mean, StdDev, and histogram graphs of the two ratings. Thanks. – Decency Jun 27 '11 at 14:36
  • 2
    @Decency To look for correlations, begin by drawing a scatterplot of the raw (per-person) (MMR, PSR) pairs. Comparing their histograms (and means and SDs) tells you nothing about correlation. – whuber Jun 27 '11 at 19:40
  • Statistically that's true, but simply looking at the curves would give a reasonable sense of whether the ratings align. I do see your point, but there are over 250,000 players and no easy way to pull the data (I manually pulled each integer percentile) aside from random polling, which I'm not willing to do, so this will have to suffice. – Decency Jun 27 '11 at 20:14
  • 1
    @Decency Any reason for reverting @mbq's edits about title sentence capitalization that was discussed on Meta (but see other questions having the word "histogram" in their title)? – chl Jun 27 '11 at 21:03
  • I don't read meta. The point of a title is to emphasize important words. Hence "title case", which I didn't even fully use. Edits should be significant, not stylistic. – Decency Jun 28 '11 at 13:13
  • @Decency You should. This is not punctiliousness on our part or the community's, and if you just want to emphasize important words, use the corresponding typographical convention, namely italics. – chl Jun 30 '11 at 20:59
  • It's an arbitrary and insignificant stylistic rule that most people don't seem to understand, seeing as by just quickly skimming in 30 seconds I found a half dozen plus topics that had been edited by mbq (which seems ample evidence that your non-punctilious claim is false). The same way I wouldn't force my own stylistic conventions into someone else's source code, I wouldn't do it to their writing, especially on such an insignificant and capricious issue. It's an unwarranted waste of mine, his, and now your time. – Decency Jul 01 '11 at 13:39
  • @Decency I think you misunderstood my comment. I'm just saying that we all agree to keep this site as "clean and uniform" as possible, which means, among other things, having some kind of normative approach to writing post titles, as they are the most visible pieces of information on SE main pages. I agree this might appear to be a waste of time, but someone has to take care of that, right? I guess @mbq and I, and other users, are happy to "waste our time" making some edits, fixing typos (even though English is not our mother tongue), adding hyperlinks, and retagging, for the good of all. – chl Jul 02 '11 at 11:04
  • @Decency This is not wasting time -- this way we make posts more reusable and hold some level of aesthetics, this way increasing traffic and quality (same how banks use marble and suits to hide the fact they don't really have the money you gave them). –  Jul 02 '11 at 13:57
  • You have no evidence for any of those claims, you're merely enforcing your stylistic opinion of what a title should look like on others. If my question is poorly phrased or I have misspelled words you're more than welcome to edit for clarity, but arbitrary things like the above are ridiculous, more akin to OCD than actual editing. I personally feel that by capitalizing important words of a title I'm more likely to draw attention to the key aspects of that problem. If you feel otherwise, you're welcome to capitalize your titles however you please. – Decency Jul 04 '11 at 03:10
  • 1
    @Decency (If you want to continue this discussion about editing, please do it on meta or on chat.) Concerning your analytical objectives, looking at the individual curves still tells you nothing about how ratings "align" or are correlated. One way to handle large datasets is with random subsampling. In your case, a scatterplot involving, say, 1000 randomly selected (MMR, PSR) pairs will be informative and accurate. – whuber Jul 04 '11 at 16:42
  • I am aware, I acknowledged that in my reply to you. The effort that it would take to gather that data is not worth the investment, and the data would also have to be pruned because many players only have one of the two ratings, not both, so this will have to suffice. Apologies for spamming your inbox. – Decency Jul 04 '11 at 16:56
  • @mbq: "How to Edit: ► always respect the original author." – Decency Jul 04 '11 at 16:59
  • @Decency I'm only improving it to meet site standards. –  Jul 04 '11 at 18:29
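
The random-subsampling idea raised in the comments above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `players` list is synthetic (the real ladder data is not reproduced here), the rating formulas and the 80% share of players holding a PSR are invented, and `pearson` is a hypothetical helper. It drops players who lack one of the two ratings, draws 1000 (MMR, PSR) pairs at random, and computes their correlation.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic stand-in for the ladder: one (MMR, PSR) tuple per player.
# None marks a player who has no PSR. All numbers here are made up.
players = []
for _ in range(250_000):
    skill = rng.gauss(0, 1)
    mmr = 1500 + 150 * skill + rng.gauss(0, 50)
    psr = 1500 + 120 * skill + rng.gauss(0, 60) if rng.random() < 0.8 else None
    players.append((mmr, psr))

# Prune players missing either rating, then subsample 1000 pairs.
complete = [(m, p) for m, p in players if m is not None and p is not None]
sample = rng.sample(complete, 1000)


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


r = pearson(sample)
print(f"correlation in the subsample: {r:.2f}")
```

The pairs in `sample` could then be handed to any plotting tool to draw the scatterplot itself.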

1 Answer

6

Your life is a bit easier because you have values at every integer percentile.

  1. To calculate the mean and standard deviation, just take the mean and standard deviation of each column.
  2. To plot histograms, calculate the difference between the $i^{th}$ and the $(i-1)^{th}$ value. You can then take histograms of these differences to get what you want. Personally, I wouldn't bother with a histogram and would just plot the differences.
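
The two steps above might look something like this in Python. It is a minimal sketch, not the answerer's own code: the column `percentile_ratings` is synthetic, generated from a made-up Normal(1500, 150) curve rather than taken from the linked spreadsheet, and all variable names are assumptions.

```python
from statistics import NormalDist, mean, pstdev

# Synthetic stand-in for one rating column: the rating at each integer
# percentile 1..99, generated from an assumed Normal(1500, 150) curve.
assumed = NormalDist(mu=1500, sigma=150)
percentile_ratings = [assumed.inv_cdf(p / 100) for p in range(1, 100)]

# Step 1: mean and standard deviation of the column. Every percentile
# stands for an equal 1% share of players, so no weighting is applied.
approx_mean = mean(percentile_ratings)
approx_sd = pstdev(percentile_ratings)

# Step 2: difference between the i-th and (i-1)-th percentile values.
gaps = [b - a for a, b in zip(percentile_ratings, percentile_ratings[1:])]

print(round(approx_mean, 1), round(approx_sd, 1))
print("widest gap:", round(max(gaps), 1), "narrowest gap:", round(min(gaps), 1))
```

One caveat: because the column stops at the 1st and 99th percentiles, the extreme tails are missing, so the standard deviation computed this way will tend to come out somewhat smaller than the true one.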

When I played about with your data, I got "U-shaped" distributions, indicating that there are lots of very good players and lots of players who start and then quit the game.

csgillespie
  • 13,029
  • Thanks for taking a look and the instructions. If I told you that players start at 1500 rating and these ladders use a modified Elo system, would that change what you said in the last sentence? – Decency Jun 27 '11 at 15:01
  • Yes. If players start at 1500, that implies that people are either good or bad, with very few people in the middle - hence the U shape. – csgillespie Jun 27 '11 at 15:07
  • Sorry it's taken so long for me to get back to you, I kept forgetting to look at the data. After playing with it and your solution, I believe I've asked the wrong question. What I'm looking for is a probability density function, or a curve that looks similar to a normal distribution. That explains your confusion at my asking for a histogram. – Decency Jul 04 '11 at 03:51
  • 1
    You have the probability from the percentiles. What's the probability of getting over the 50th value? 0.5. What's the probability of getting over the 95th value? 0.05. – csgillespie Jul 04 '11 at 08:55
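
To make the last comment concrete, here is a small sketch of reading probabilities off the percentiles and turning them into a rough density curve. It rests on assumptions: `ratings` is again synthetic (a made-up Normal(1500, 150) curve, not the real ladder), and `prob_above`, `midpoints`, and `density` are hypothetical names. The density estimate relies on each pair of adjacent percentiles enclosing 1% of players, so the density in that interval is approximately 0.01 divided by its width.

```python
from statistics import NormalDist

# Synthetic rating at each integer percentile 1..99 (assumed curve).
assumed = NormalDist(mu=1500, sigma=150)
ratings = [assumed.inv_cdf(p / 100) for p in range(1, 100)]


def prob_above(p):
    """Probability that a random player rates above the p-th percentile value."""
    return 1 - p / 100


# Adjacent percentiles enclose 1% of players, so the density over each
# interval is roughly 0.01 / (interval width), plotted at the midpoint.
midpoints = [(a + b) / 2 for a, b in zip(ratings, ratings[1:])]
density = [0.01 / (b - a) for a, b in zip(ratings, ratings[1:])]

print(prob_above(50), round(prob_above(95), 2))
peak = density.index(max(density))
print("density peaks near rating", round(midpoints[peak]))
```

Plotting `density` against `midpoints` gives the curve. Narrow gaps between percentiles correspond to high density and wide gaps to low density, which is worth keeping in mind when reading a plot of the raw differences.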