6

I understand that the value of a pdf is not a probability, and that the total area under the curve must equal one. I understand that the height of the pdf is meaningless: it is not a probability but a probability density.

Can I say that the height of the pdf tells us which value of the random variable can occur more likely than the other values?

For example, the following answer came from ChatGPT.

Let's consider a numerical example related to children's weight. Suppose we have a dataset of children's weights that follows a normal distribution with a mean of 50 kg and a standard deviation of 5 kg.

For Child A, let's say their weight is 55 kg. We can calculate the probability density at this weight using the dnorm() function:

    dnorm(x = 55, mean = 50, sd = 5)

The resulting probability density would be a numeric value representing the likelihood of observing a weight of 55 kg within the given normal distribution.

For Child B, let's say their weight is 45 kg. Similarly, we can calculate the probability density at this weight:

    dnorm(x = 45, mean = 50, sd = 5)

This will give us the probability density for a weight of 45 kg within the normal distribution.

Comparing the probability densities for Child A and Child B will provide insight into the relative concentration of weights around these values within the distribution. A higher probability density suggests that weights closer to that particular value are more likely to occur.

Nick Cox
Maryam
  • 1
    "Can I say that the height of the pdf tells us which value of the random variable can occur more likely than the other values?" What would "more likely" mean to you? It's not in dispute that the PDF height is the "likelihood" in the technical sense in which statistics uses the word, so there is some merit to the ChatGPT response, but that technical definition need not align with a more colloquial definition that you might use. – Dave Jan 09 '24 at 13:55
  • 3
    The height of the pdf is not meaningless at all: it is, as said, a probability density. It's no more mysterious or meaningless than any other kind of density, just an idea that is unfamiliar until it becomes familiar. – Nick Cox Jan 09 '24 at 14:33
  • 4
    ... And a good pedagogical way to help it become familiar is to learn to construct and read true histograms, which are not mere bar charts: they use the areas rather than the heights of the bars to represent probability. Once that is learned, the move to a density (which is a limiting version of a histogram) is natural and easy. – whuber Jan 09 '24 at 15:03

3 Answers

10

This is really a FAQ. You are on the right track, but the height is not really a relative frequency, as you say in the title. Values of density functions do not represent frequencies or probabilities directly. What you need are the areas under the density function, as explained beautifully at "Intuitive explanation for density of transformed variable?"

Then, in the question body, you ask: "Can I say that the height of the pdf tells us which value of the random variable can occur more likely than the other values?"

In a way, yes (assuming sufficient continuity). Say you want to compare how likely values around $x$ are compared to values around $y$. Then you calculate the density ratio $f(x)/f(y)$. If, say, $f(x)/f(y)=2$, then the probability of values in a short interval around $x$ would be about twice the probability of values in an equally short interval around $y$.
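The reason is that, for a short interval of width $h$, $P(x - h/2 < X < x + h/2) \approx f(x)\,h$, so the width cancels in the ratio. A minimal R sketch of this check, assuming the normal distribution from the question (mean 50, sd 5); the points `x`, `y`, the width `h`, and the helper `interval_prob` are arbitrary choices made for illustration:

```r
# Assumption: weights ~ Normal(mean = 50, sd = 5), as in the question.
mu <- 50
sigma <- 5
x <- 50    # first point (illustrative choice)
y <- 55    # second point (illustrative choice)
h <- 0.1   # width of the short interval around each point (illustrative choice)

# Ratio of density heights; for these values it equals exp(0.5), about 1.65.
density_ratio <- dnorm(x, mean = mu, sd = sigma) / dnorm(y, mean = mu, sd = sigma)

# Hypothetical helper: probability of an interval of given width centred at a point.
interval_prob <- function(centre, width) {
  pnorm(centre + width / 2, mean = mu, sd = sigma) -
    pnorm(centre - width / 2, mean = mu, sd = sigma)
}

# Ratio of actual probabilities of the two short intervals.
prob_ratio <- interval_prob(x, h) / interval_prob(y, h)

print(c(density_ratio = density_ratio, prob_ratio = prob_ratio))
```

The two ratios should agree closely, and the agreement improves as `h` shrinks.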

9

For a truly continuous random variable, only intervals of length > 0 on the real line have non-zero probability; but, as you say, after seeing a value you can assess how likely it is relative to other values.

On the other hand, if you measure with rounding (e.g. weight rounded to the nearest full kg), you can integrate the density to find how much probability mass there is in the interval that gets rounded to the value in question. E.g., in your example, the probability of a weight that gets rounded to 55 kg is, using R, pnorm(q = 55.5, mean = 50, sd = 5) - pnorm(q = 54.5, mean = 50, sd = 5) $\approx 0.048$ (= 4.8%).
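The rounding calculation can be set next to the density height itself. The following is only a sketch under the question's assumptions (normal with mean 50 and sd 5, rounding to the nearest kg); the helper name `rounded_prob` is made up for illustration:

```r
# Assumption: weights ~ Normal(mean = 50, sd = 5), recorded to the nearest kg.
mu <- 50
sigma <- 5

# Hypothetical helper: probability that a weight is rounded to `kg`.
rounded_prob <- function(kg) {
  pnorm(kg + 0.5, mean = mu, sd = sigma) - pnorm(kg - 0.5, mean = mu, sd = sigma)
}

p55 <- rounded_prob(55)   # probability of recording 55 kg
p45 <- rounded_prob(45)   # probability of recording 45 kg

# Density height times interval width (1 kg) as an approximation to p55.
approx55 <- dnorm(55, mean = mu, sd = sigma) * 1

print(c(p55 = p55, p45 = p45, approx55 = approx55))
```

Because the normal density is symmetric about its mean, 45 kg and 55 kg get the same probability here, and the density height times the 1 kg width is close to the exact interval probability.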

Björn
5

Yes; take, for example, the normal distribution. Most of the mass is in the center, with less mass in the two tails, meaning that if you take a random value from that distribution, you are much more likely to end up with a value that lies in the center of that distribution than in the tails.

So, in that sense, the height of a PDF is related to how likely a number is. (This is also the idea behind likelihood.)
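As a rough illustration of the center-versus-tail statement, one can compare two intervals of equal width under an assumed normal distribution. The parameters (mean 50, sd 5) and the interval endpoints below are illustrative choices, not part of this answer:

```r
# Assumption: X ~ Normal(mean = 50, sd = 5); both intervals are 5 units wide.
mu <- 50
sigma <- 5

# Hypothetical helper: probability that X falls between lower and upper.
interval_prob <- function(lower, upper) {
  pnorm(upper, mean = mu, sd = sigma) - pnorm(lower, mean = mu, sd = sigma)
}

p_centre <- interval_prob(47.5, 52.5)  # interval centred on the mean
p_tail   <- interval_prob(60, 65)      # interval two to three sd above the mean

print(c(centre = p_centre, tail = p_tail, ratio = p_centre / p_tail))
```

The central interval, where the density is highest, carries far more probability than the tail interval of the same width.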

  • This isn't an issue for a Gaussian distribution, but what about a distribution like the one drawn around the 10:15 mark here? – Dave Jan 09 '24 at 14:02