I'm trying to understand how a percentile rank is calculated. I used scipy.stats.percentileofscore to calculate the percentile rank of a score, but the result doesn't match the one I coded myself.
I have a few questions. Take the 25th percentile: does it cover values equal to or below the 25th percentile, or only values strictly below it? Please have a look at the code, and I will try to explain why I am confused.
import numpy as np
from scipy import stats


def percentile_score(arr, n):
    # Percentage of values in arr that are less than or equal to n
    o = 0
    for a in arr:
        if a <= n:
            o += 1
    return np.round(o / len(arr) * 100, 2)


def percentile_test(min_val, max_val, total, seed=0):
    if seed > 0:
        np.random.seed(seed)  # seed the generator so the run is reproducible
    # randint excludes the upper bound, hence max_val + 1
    r = np.random.randint(min_val, max_val + 1, total)
    p1 = percentile_score(r, 1)  # calculate percentage of <= 1
    p2 = percentile_score(r, 2)  # calculate percentage of <= 2
    p3 = percentile_score(r, max_val)  # calculate percentage of <= max
    print("Min / Max : {} / {}".format(np.min(r), np.max(r)))
    print("#1 count vs percentile : {}% vs {}%".format(p1, stats.percentileofscore(r, 1)))
    print("#2 count vs percentile : {}% vs {}%".format(p2, stats.percentileofscore(r, 2)))
    print("Number #50 probability (<=) : {}% vs {}".format(p3, stats.percentileofscore(r, max_val)))


percentile_test(1, 50, 1500, seed=1)
OUTPUT
Min / Max : 1 / 50
#1 count vs percentile : 2.13% vs 1.1%
#2 count vs percentile : 3.8% vs 3.0%
Number #50 probability (<=) : 100.0% vs 99.26666666666667
If percentile necessarily means <=, then #50 must be 100%. If it means <, then #1 must be 0%. The scipy results match neither.
Also, the values scipy returns don't match the ones I calculated by counting. What am I missing here?
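To make the two readings concrete, here is a minimal pure-Python sketch of the conventions that scipy.stats.percentileofscore exposes through its kind argument ('rank' is the default; 'weak', 'strict' and 'mean' are the others). The function percentile_of_score is a hypothetical re-implementation based on my understanding of those options, not scipy's own code. The sample is also made up: the counts of 1s, 2s and 50s are inferred from the percentages in the output above (32, 25 and 23 out of 1500), and the value 25 is only filler for everything in between.

```python
def percentile_of_score(values, score, kind="rank"):
    """Hypothetical sketch of four percentile-of-score conventions."""
    n = len(values)
    below = sum(1 for v in values if v < score)         # strictly below
    at_or_below = sum(1 for v in values if v <= score)  # below or equal
    if kind == "strict":
        return below * 100 / n
    if kind == "weak":
        return at_or_below * 100 / n
    if kind == "mean":
        # average of the strict and weak definitions
        return (below + at_or_below) * 50 / n
    if kind == "rank":
        # average percentage rank of the values tied with score
        plus_one = 1 if at_or_below > below else 0
        return (below + at_or_below + plus_one) * 50 / n
    raise ValueError("unknown kind: {}".format(kind))


# Small example with ties at the score of interest
small = [1, 1, 2, 3, 3, 3, 4, 5, 5, 5]
print(percentile_of_score(small, 3, "strict"))  # 30.0
print(percentile_of_score(small, 3, "weak"))    # 60.0
print(percentile_of_score(small, 3, "mean"))    # 45.0
print(percentile_of_score(small, 3, "rank"))    # 50.0

# Made-up sample whose counts are inferred from the output above
sample = [1] * 32 + [2] * 25 + [25] * 1420 + [50] * 23
for score in (1, 2, 50):
    weak = percentile_of_score(sample, score, "weak")
    rank = percentile_of_score(sample, score, "rank")
    print(score, round(weak, 2), round(rank, 2))
```

If this reading is right, the hand-coded count corresponds to the 'weak' convention, while the default 'rank' averages over ties, which would explain why neither 0% nor 100% appears at the extremes.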
Avoid o (lower-case letter "o") as a name for a variable (or anything else), as it is hard to distinguish from 0 meaning zero. (I know; that really doesn't answer your question at all.) – Nick Cox Mar 11 '20 at 10:29