I am using scipy.stats.gaussian_kde to estimate a pdf for some data. The problem is that the resulting pdf takes values larger than 1. As far as I understand, this should not happen. Am I mistaken? If so, why?
whuber
- 322,774
Björn Pollex
- 1,383
-
(+1 to the possible duplicate) Just to convey this quickly: probability is defined as an area under a curve. To get the probability at a single point, the value of the PDF there is multiplied by 0 (i.e., the width of a line), so that probability is 0 however large the PDF value is. The linked thread gives excellent further elaboration on this. – usεr11852 May 29 '16 at 20:20
1 Answer
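You are mistaken. The CDF cannot exceed 1, but the PDF may. Think, for example, of the PDF of a Gaussian random variable with mean zero and standard deviation $\sigma$: $$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{x^2}{2\sigma^2}\right)$$ At $x = 0$ this equals $\frac{1}{\sigma\sqrt{2\pi}}$, which exceeds 1 whenever $\sigma < 1/\sqrt{2\pi} \approx 0.399$, and if you make $\sigma$ very small, the PDF at $x = 0$ becomes arbitrarily large! What must equal 1 is the area under the PDF, not its height.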
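A quick numerical check may make this concrete. The following is only a minimal sketch, not part of the original answer: the value `sigma = 0.1`, the sample size, the random seed, and the evaluation grid are all assumptions chosen for illustration. It evaluates the Gaussian PDF at zero, fits `scipy.stats.gaussian_kde` to a tightly clustered sample, and compares the maximum of the estimated density with the area under it.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

# Assumed value for illustration: any sigma below 1/sqrt(2*pi) ~ 0.399 works.
sigma = 0.1

# Height of the exact Gaussian PDF at x = 0, i.e. 1 / (sigma * sqrt(2*pi)).
peak = norm.pdf(0.0, loc=0.0, scale=sigma)

# Draw a tightly clustered sample (seed and size are arbitrary choices)
# and fit a kernel density estimate with the default bandwidth.
rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=sigma, size=1000)
kde = gaussian_kde(sample)

# Evaluate the estimated density on a grid and record its maximum.
grid = np.linspace(-1.0, 1.0, 2001)
kde_max = kde(grid).max()

# Area under the estimated density over [-1, 1], which contains
# essentially all of the mass for this sample.
kde_area = kde.integrate_box_1d(-1.0, 1.0)

print(f"exact PDF at 0: {peak:.3f}")
print(f"max of KDE:     {kde_max:.3f}")
print(f"area under KDE: {kde_area:.6f}")
```

Under these assumptions both the exact PDF and the estimated density rise well above 1 near zero, while the estimated density still integrates to approximately 1. Density values above 1 from `gaussian_kde` are therefore expected whenever the data are concentrated on a scale much smaller than 1.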
shabbychef
- 14,814
-
Another possible source of confusion is that the pdf of a discrete random variable (also called pmf - probability mass function) indeed cannot exceed 1. – Aniko Dec 29 '10 at 20:40
-
@Aniko: This is indeed a source of confusion. I think I understand now. – Björn Pollex Dec 29 '10 at 20:48
-
This question is a duplicate of http://stats.stackexchange.com/q/4220/919 . – whuber Dec 30 '10 at 15:28