Here:

https://www.python-graph-gallery.com/74-density-plot-of-several-variables

The page uses Python's seaborn library to illustrate how to plot multiple kernel density estimates (KDEs) on the same set of axes, using the famous iris dataset. However, the example does not seem to compute the curves with a common bandwidth, which I thought was required.

I have two datasets that I need to plot in R using density(), and I believe I should ensure that the curves use the same bandwidth.

Is it necessary to ensure this? That is, what are the statistical consequences of not ensuring this? Could someone illustrate with an example?

compbiostats
  • 1,557
  • R allows the bandwidth to be set manually, as in x <- rnorm(10^3); plot(density(x, bw = 0.234)), though whether this is better than two different default automatic bandwidths is another question – Henry Nov 23 '22 at 15:58
  • @Henry Sure, but how would one know which of the two bandwidths to use for both plots? The data would have to be plotted on the same fine grid of points, I believe. In your example bw=0.234 may not be the optimal choice. – compbiostats Nov 23 '22 at 16:03
  • I was just trying to show you can do this manually in R, and so ensure both plots use the same bandwidth. I chose 0.234 as an arbitrary example, though for that sample size and distribution it turns out to be close to R's default choice; it would be different with a larger sample size or another distribution. You could do your own calculation of what is best and then set the two bandwidths equal to that number – Henry Nov 23 '22 at 16:08
  • While not essential to answer, I'd be curious to hear why you think the bandwidths should be identical. An appropriate choice of bandwidth depends on the shape of the distribution and the sample size. If one "believes" in an automatic choice for the bandwidth, why then insist on the same bandwidth? You can get the automatic bandwidth with d1 = density(x1); d1$bw and the same for the second set. You could then take the average, the minimum, or the maximum of the two bandwidths and then run density on both data sets. – JimB Nov 23 '22 at 18:32
  • My post at https://stats.stackexchange.com/a/428083/919 shows how to study the effects of varying bandwidth on the KDE and includes R code to specify the bandwidth. The code at https://stats.stackexchange.com/a/438972/919 deals with the same problem of using the same bandwidth to compare KDEs. – whuber Nov 23 '22 at 18:57
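
The suggestions in the comments above (read each sample's automatic bandwidth from density(), pick one shared value, then rerun density() on both samples) might be sketched roughly as follows. The samples x1 and x2, the seed, the choice of the mean as the shared value, and the three-bandwidth padding of the grid are all assumptions made for illustration; they are not taken from the question, and this is not a recommendation of which shared bandwidth is best.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical samples standing in for the two real data sets
x1 <- rnorm(200, mean = 0, sd = 1)
x2 <- rnorm(500, mean = 1, sd = 2)

# Automatic (default) bandwidth chosen by density() for each sample
bw1 <- density(x1)$bw
bw2 <- density(x2)$bw

# One possible shared value: the mean of the two.
# min(bw1, bw2) or max(bw1, bw2) are the other options mentioned.
bw_common <- mean(c(bw1, bw2))

# Evaluate both estimates on the same grid so the curves line up point by point
rng <- range(c(x1, x2)) + c(-3, 3) * bw_common
d1 <- density(x1, bw = bw_common, from = rng[1], to = rng[2], n = 512)
d2 <- density(x2, bw = bw_common, from = rng[1], to = rng[2], n = 512)

# Overlay the two curves on one set of axes
plot(d1, ylim = range(0, d1$y, d2$y), main = "KDEs with a shared bandwidth")
lines(d2, lty = 2)
```

Comparing bw1 and bw2 with bw_common shows how far the shared value moves each curve from its own default; a shared bandwidth that is much larger than a sample's default will oversmooth that sample, and one that is much smaller will undersmooth it.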

0 Answers