How does ggplot2 density differ from the density function?

Question

Why do the following plots look different? Both methods appear to use Gaussian kernels.

How does ggplot2 compute a density?

library(ggplot2)
library(fueleconomy)

d <- density(vehicles$cty, n=2000)
ggplot(NULL, aes(x=d$x, y=d$y)) + geom_line() + scale_x_log10()

ggplot(vehicles, aes(x=cty)) + geom_density() + scale_x_log10()

UPDATE:

A solution to this question already appears on SO here; however, the specific parameters that ggplot2 passes to R's stats::density function remain unclear.

An alternative is to extract the density data directly from the ggplot2 plot, as shown here.
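For reference, a minimal sketch of that extraction, assuming a ggplot2 version that exports layer_data(). The built-in mtcars$mpg stands in for vehicles$cty here; that substitution is an assumption made only so the snippet runs without fueleconomy installed.

```r
library(ggplot2)

# Assumption: mtcars$mpg stands in for vehicles$cty
p <- ggplot(mtcars, aes(x = mpg)) + geom_density() + scale_x_log10()

# layer_data() returns the data frame that ggplot2 computed for layer 1.
# With scale_x_log10(), the x column is in log10 units, not original units.
dens <- layer_data(p, 1)
head(dens[, c("x", "density")])
```

The x and density columns are the points that geom_density draws, so they can be compared directly against the output of density().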

Thanks for the reference. However, the solution doesn't appear to identify the explicit parameter differences. I'm wondering how I can generate/extract the precise density data from the ggplot density. — Megatron, Apr 21 '16 at 23:16
This seems to extract the exact values geom_density plots: http://stackoverflow.com/questions/12394321/r-what-algorithm-does-geom-density-use-and-how-to-extract-points-equation-of — fanli, Apr 21 '16 at 23:18
I don't think this is to do with the density, but with how you are applying the log transform — user20650, Apr 22 '16 at 01:20
Is there an alternate log transformation that I can apply to render them identical? — Megatron, Apr 22 '16 at 01:22
consider switching to `ggalt::geom_bkde()` for better density estimates. — hrbrmstr, Apr 22 '16 at 01:49

score 3 · Accepted Answer · answered Apr 22 '16 at 02:18

In this case, it is not the density calculation that is different but how the log10 transform is applied.

First, check that the densities are similar without the transform:

library(ggplot2)
library(fueleconomy)

d <- density(vehicles$cty, from=min(vehicles$cty), to=max(vehicles$cty))
ggplot(data.frame(x=d$x, y=d$y), aes(x=x, y=y)) + geom_line() 
ggplot(vehicles, aes(x=cty)) + stat_density(geom="line")

So the issue seems to be the transform. In the stat_density call below, the log10 transform appears to be applied to the x variable before the density is calculated. To reproduce the result manually, you therefore have to transform the variable before calculating the density, e.g.:

d2 <- density(log10(vehicles$cty), from=min(log10(vehicles$cty)),
              to=max(log10(vehicles$cty)))
ggplot(data.frame(x=d2$x, y=d2$y), aes(x=x, y=y)) + geom_line() 
ggplot(vehicles, aes(x=cty)) + stat_density(geom="line") + scale_x_log10()
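A rough way to check this numerically rather than by eye is to compare the manual density with the values ggplot2 computed. This is only a sketch: mtcars$mpg stands in for vehicles$cty (an assumption, so it runs without fueleconomy), and n = 512 is used because that is the documented default of stat_density.

```r
library(ggplot2)

# Assumption: mtcars$mpg stands in for vehicles$cty
x  <- mtcars$mpg
lx <- log10(x)

# Manual density on the log10 scale, over the data range, on 512 points
d2 <- density(lx, from = min(lx), to = max(lx), n = 512)

# What ggplot2 computes once scale_x_log10() is added
p <- ggplot(data.frame(x = x), aes(x = x)) +
  stat_density(geom = "line") +
  scale_x_log10()
g <- layer_data(p, 1)

# Largest absolute difference between the two curves
max_diff <- max(abs(g$density - d2$y))
print(max_diff)

# Back-transform x so the manual curve can be drawn on the same log10 axis
p_manual <- ggplot(data.frame(x = 10^d2$x, y = d2$y), aes(x = x, y = y)) +
  geom_line() +
  scale_x_log10()
```

If the transform really is applied before the density calculation, max_diff should be essentially zero, and p_manual should show the same curve and axis as the stat_density plot.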

PS: To see how ggplot2 prepares the data for the density, you can look at the code: as.list(StatDensity) leads to StatDensity$compute_group, which in turn calls ggplot2:::compute_density.
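A short sketch of that inspection follows. Note that compute_density is an internal, unexported function (hence the :::), so its arguments may differ between ggplot2 versions; the argument names picked out below are the ones I would expect to find, not a guaranteed interface.

```r
library(ggplot2)

# Print the per-group method that stat_density()/geom_density() run
print(StatDensity$compute_group)

# Internal helper that ends up calling stats::density()
dens_args <- formals(ggplot2:::compute_density)

# Show the defaults for the arguments that are forwarded to stats::density()
str(dens_args[intersect(c("bw", "adjust", "kernel", "n"), names(dens_args))])
```

Reading the body of ggplot2:::compute_density shows which from, to, bw and n values are handed to stats::density().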
