
I'm trying to check how accurate percentile calculations are. I used scipy.stats to calculate the percentile rank of a score, but it doesn't match the result from the function I coded myself.

I have a few questions. Take the 25th percentile: does it mean "equal to or below" the 25th percentile, or strictly "below" it? Please have a look at the code, and I will try to explain why I am confused.

import numpy as np
from scipy import stats

def percentile_score(arr, n):
    o = 0
    for a in arr:
        if a <= n:
            o += 1
    return np.round(o / len(arr) * 100, 2)

def percentile_test(min_val, max_val, total, seed=0):
    if seed > 0:
        np.random.seed(seed)  # seed the generator with the value passed in
    r = np.random.randint(min_val, max_val+1, total)
    p1 = percentile_score(r, 1) # calculate percentage of <= 1
    p2 = percentile_score(r, 2) # calculate percentage of <= 2
    p3 = percentile_score(r, max_val) # calculate percentage of <= max
    print("Min / Max                    : {} / {}".format(np.min(r), np.max(r)))
    print("#1 count vs percentile       : {}% vs {}%".format(p1, stats.percentileofscore(r, 1)))
    print("#2 count vs percentile       : {}% vs {}%".format(p2, stats.percentileofscore(r, 2)))
    print("Number #50 probability (<=)  : {}% vs {}".format(p3, stats.percentileofscore(r, max_val)))

percentile_test(1, 50, 1500, seed=1)

OUTPUT

Min / Max                    : 1 / 50
#1 count vs percentile       : 2.13% vs 1.1%
#2 count vs percentile       : 3.8% vs 3.0%
Number #50 probability (<=)  : 100.0% vs 99.26666666666667

If percentile means "<=", then the value 50 (the maximum) must be at 100%; if it means "<", then the value 1 (the minimum) must be at 0%. SciPy's output matches neither.
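The two readings in the previous paragraph can be checked with a minimal plain-Python sketch. The function names are hypothetical, chosen only for illustration. With "equal or below" the maximum always scores 100%; with "strictly below" the minimum always scores 0%.

```python
def percent_at_or_below(data, score):
    """Percentage of values <= score (the "equal or below" reading)."""
    return 100.0 * sum(1 for x in data if x <= score) / len(data)


def percent_below(data, score):
    """Percentage of values < score (the "strictly below" reading)."""
    return 100.0 * sum(1 for x in data if x < score) / len(data)


data = [1, 2, 2, 3, 4]
print(percent_at_or_below(data, 4))  # 100.0: the maximum is always at 100%
print(percent_below(data, 1))        # 0.0: the minimum is always at 0%
print(percent_at_or_below(data, 2), percent_below(data, 2))  # 60.0 20.0
```

The hand-coded percentile_score in the question (which tests a <= n) follows the first reading.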

Also, the values I calculated don't match the SciPy values. What am I missing here?
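One likely source of the mismatch, assuming the installed SciPy uses the documented default kind='rank' for scipy.stats.percentileofscore, is that this default counts neither "<=" nor "<": when the score occurs several times, it averages the percentage ranks of all the tied values. The sketch below re-implements, in plain Python, the four conventions SciPy names in its kind parameter ('weak', 'strict', 'mean', 'rank'). It is an illustration, not SciPy's source, and percentile_of_score is a hypothetical name. The data are constructed so that the counts match those implied by the printed output (32 ones, 25 twos and 23 fifties out of 1,500); only those counts matter.

```python
def percentile_of_score(data, score, kind="rank"):
    """Percentile rank of score in data under one of four conventions."""
    n = len(data)
    below = sum(1 for x in data if x < score)         # strictly below
    at_or_below = sum(1 for x in data if x <= score)  # equal or below
    if kind == "strict":
        return 100.0 * below / n
    if kind == "weak":
        return 100.0 * at_or_below / n
    if kind == "mean":
        # average of the strict and weak percentages
        return 50.0 * (below + at_or_below) / n
    if kind == "rank":
        # Tied values occupy ranks below+1 .. at_or_below, so their mean rank
        # is (below + 1 + at_or_below) / 2. If score is absent, this reduces
        # to the strict percentage.
        tie = 1 if at_or_below > below else 0
        return 50.0 * (below + at_or_below + tie) / n
    raise ValueError(f"unknown kind: {kind!r}")


# Hypothetical data: 32 ones, 25 twos, 1420 filler values, 23 fifties (n = 1500).
data = [1] * 32 + [2] * 25 + [25] * 1420 + [50] * 23

for kind in ("weak", "strict", "mean", "rank"):
    print(kind, round(percentile_of_score(data, 1, kind), 2))
# weak 2.13
# strict 0.0
# mean 1.07
# rank 1.1
print(percentile_of_score(data, 2), percentile_of_score(data, 50))
```

Under these assumptions, 'weak' reproduces the hand-coded 2.13%, while 'rank' reproduces SciPy's 1.1%, 3.0% and roughly 99.27%. Passing kind='weak' to stats.percentileofscore should therefore make the two columns agree, up to the rounding done in percentile_score.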

Nick Cox
Don Coder
  • This mushes together specific questions about Python code and generic questions about the definition of percentiles (quantiles). That's how it is, but it's hard to answer both at once. https://stats.stackexchange.com/questions/250046/how-to-calculate-quartiles explains, with a key reference to Hyndman and Fan (noting that Rob Hyndman is a distinguished member here), that there are several more or less defensible precise rules; once that is understood, it's usually a question of detail: which one is being used? – Nick Cox Mar 11 '20 at 10:23
  • I don't use Python, so I won't try to use, read or comment on your code. But I would never use o (lower-case letter "o") as a name for a variable (or anything else), as it is hard to distinguish from 0 meaning zero. (I know; that really doesn't answer your question at all.) – Nick Cox Mar 11 '20 at 10:29
  • I believe my question is different from that answer. I'm asking about percentiles and what confuses me. – Don Coder Mar 11 '20 at 10:34
  • Not so; despite the title of the thread, the material there covers quantiles and percentiles, which are at root different names for the same idea. Quartiles are just special cases of either. The key is to read Hyndman and Fan, or, at worst, if that is not accessible to you, to find one of many summaries of their material. For example, the answer in the thread cited uses an R implementation which allows all of their documented rules to be calculated. – Nick Cox Mar 11 '20 at 10:42
  • I don't know R, and actually they all end up with the same outcome. So why is the percentile of 50 not 100%? Also, does percentile mean "equal or below" or just "below"? – Don Coder Mar 11 '20 at 10:44
  • You don't have to use R -- I don't routinely -- but the point is that R documentation is accessible as one of several explanations of multiple existing rules. Conversely, as said, I don't use Python so I won't try to answer your questions about your code example, except to repeat that there isn't a simple single correct answer to your questions about equalities or inequalities, and there can't be. – Nick Cox Mar 11 '20 at 10:50
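To illustrate the point made in these comments that several defensible rules exist, the following plain-Python sketch implements three of the sample-quantile definitions catalogued by Hyndman and Fan (their types 1, 6 and 7) and applies them to the same four values. The function names are hypothetical. Type 7 is the default of R's quantile() and of numpy.percentile; for real use, R's quantile(x, type = ...) and, in recent NumPy versions, the method argument of numpy.percentile expose these rules. Note that these rules go from a probability to a value, the inverse direction of percentileofscore.

```python
import math


def quantile_type1(data, p):
    """Hyndman-Fan type 1: inverse of the empirical CDF (no interpolation)."""
    x = sorted(data)
    n = len(x)
    k = max(math.ceil(n * p), 1)  # 1-based index of smallest x with F(x) >= p
    return x[k - 1]


def quantile_type6(data, p):
    """Hyndman-Fan type 6: interpolate at 1-based position h = (n + 1) p."""
    x = sorted(data)
    n = len(x)
    h = (n + 1) * p
    if h <= 1:
        return x[0]
    if h >= n:
        return x[-1]
    lo = math.floor(h)
    return x[lo - 1] + (h - lo) * (x[lo] - x[lo - 1])


def quantile_type7(data, p):
    """Hyndman-Fan type 7: interpolate at 0-based position h = (n - 1) p."""
    x = sorted(data)
    n = len(x)
    h = (n - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)
    return x[lo] + (h - lo) * (x[hi] - x[lo])


data = [1, 2, 3, 4]
for rule in (quantile_type1, quantile_type6, quantile_type7):
    print(rule.__name__, rule(data, 0.25))
# quantile_type1 1
# quantile_type6 1.25
# quantile_type7 1.75
```

Three rules give three different 25th percentiles (1, 1.25 and 1.75) for the same data, which is the sense in which there is no single correct answer.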

0 Answers