11

I am using `scipy.stats.gaussian_kde` to estimate a PDF for some data. The problem is that the resulting PDF takes values larger than 1. As far as I understand, this should not happen. Am I mistaken? If so, why?

whuber
  • 322,774
  • (+1 to the possible duplicate) Just to convey this quickly: probability is defined as an area under a curve. The value of a PDF at a single point gets multiplied by 0 (i.e., the width of a line) to give a probability, so if anything the probability at that point is 0. The linked thread gives excellent further elaboration on this. – usεr11852 May 29 '16 at 20:20

1 Answer

18

You are mistaken. The CDF cannot be greater than 1, but the PDF may be. Think, for example, of the PDF of a Gaussian random variable with mean zero and standard deviation $\sigma$: $$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{x^2}{2\sigma^2}\right)$$ If you make $\sigma$ very small, then at $x = 0$ the PDF is arbitrarily large: $f(0) = 1/(\sigma\sqrt{2\pi})$, which exceeds 1 whenever $\sigma < 1/\sqrt{2\pi} \approx 0.4$. What must equal 1 is the area under the PDF, not its height.
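To make this concrete, here is a minimal sketch in plain Python (standard library only). The helper names `normal_pdf` and `trapezoid_area` and the choice $\sigma = 0.1$ are mine, purely for illustration. It evaluates the zero-mean Gaussian density at its peak and then approximates the area under the curve with the trapezoid rule.

```python
import math


def normal_pdf(x, sigma):
    """Density of a zero-mean Gaussian with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))


def trapezoid_area(f, a, b, n=20000):
    """Approximate the integral of f over [a, b] with the trapezoid rule."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


sigma = 0.1  # illustrative value; anything below about 0.4 gives a peak above 1

# Height of the density at x = 0: larger than 1 for small sigma.
peak = normal_pdf(0.0, sigma)

# Area under the density over +/- 10 standard deviations: close to 1.
area = trapezoid_area(lambda x: normal_pdf(x, sigma), -10 * sigma, 10 * sigma)

print(f"f(0) = {peak:.4f}")
print(f"area = {area:.6f}")
```

The peak comes out well above 1 while the area stays at essentially 1. The same sanity check should apply to a density estimated with `scipy.stats.gaussian_kde`: rather than looking at the height of the curve, integrate it over the data range (for instance with the estimator's `integrate_box_1d` method) and confirm the result is close to 1.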

shabbychef
  • 14,814
  • 8
    Another possible source of confusion is that the pdf of a discrete random variable (also called pmf - probability mass function) indeed cannot exceed 1. – Aniko Dec 29 '10 at 20:40
  • @Aniko: This is indeed a source of confusion. I think I understand now. – Björn Pollex Dec 29 '10 at 20:48
  • 1
    This question is a duplicate of http://stats.stackexchange.com/q/4220/919 . – whuber Dec 30 '10 at 15:28