
Suppose I have two datasets of different lengths as follows:

df1 <- data.frame(x = rnorm(1000, 0, 2))
df2 <- data.frame(y = rnorm(500, 1, 1))

I want to calculate and plot the difference between the density estimates of df1 and df2. I need the values of that difference so that I can calculate the total or mean difference between the two density curves.

akash87
vahed

1 Answer


First, calculate both densities over their combined range u, so that both are evaluated on the same grid of x values.

u <- range(c(x, y))
dx <- density(x, from=u[1], to=u[2])
dy <- density(y, from=u[1], to=u[2])

Second, subtract the estimated density values y from each other.

dd_xy <- dx$y - dy$y

The x values should be the same.

stopifnot(all.equal(dx$x, dy$x))
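The steps so far can be wrapped into one small function that works directly on the data frame columns from the question. This is only a sketch: density_diff and its argument names are hypothetical, not part of base R, and it assumes the default bandwidth selection of density() is acceptable for both samples.

```r
# Hypothetical helper: evaluates both kernel density estimates on one
# shared grid and returns the grid together with the pointwise difference.
density_diff <- function(a, b, n = 512) {
  u  <- range(c(a, b))                              # combined range of both samples
  da <- density(a, from = u[1], to = u[2], n = n)
  db <- density(b, from = u[1], to = u[2], n = n)
  stopifnot(isTRUE(all.equal(da$x, db$x)))          # grids must coincide
  data.frame(x = da$x, diff = da$y - db$y)
}

set.seed(42)
df1 <- data.frame(x = rnorm(1000, 0, 2))
df2 <- data.frame(y = rnorm(500, 1, 1))

dd <- density_diff(df1$x, df2$y)
head(dd)
```

The result has one row per grid point, so plot(dd$x, dd$diff, type = 'l') draws the difference curve on its own.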

Plot

Then plot one density and use lines to add the others.

plot(dx, col=4, ylim=c(-.25, .45), main='Density distributions', xlab='')
abline(h=0, lty=3, col=8)
lines(dy, col=3)
lines(dx$x, dd_xy, col=2, lty=2, lwd=2)  ## <---------------- difference
mtext(sprintf('N(x) = %s  Bandwidth(x) = %s', dx$n, signif(dx$bw, 3)), 1, 2)
mtext(sprintf('N(y) = %s  Bandwidth(y) = %s', dy$n, signif(dy$bw, 3)), 1, 3)
legend('topleft', legend=c('x', 'y', 'x - y'), col=4:2, 
       lty=c(1, 1, 2), lwd=c(1, 1, 2), title='density')

[Plot: density curves of x and y, with their difference x - y drawn as a dashed line]

Calculations

sapply(c('sum', 'mean', 'sd', 'min', 'max'), \(x) do.call(x, list(dd_xy))) |>
  signif(3)
#       sum      mean        sd       min       max 
# -0.049700 -0.000097  0.088000 -0.249000  0.122000 
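Note that the sum above adds up the difference at the grid points, so it depends on how many points density() evaluates. If the "total" difference is meant as an area, one option is to integrate over the shared grid. The sketch below uses a simple trapezoidal rule. The helper trapz is a hypothetical name defined here, not a base R function.

```r
set.seed(42)
x <- rnorm(1000, 0, 2)
y <- rnorm(500, 1, 1)

u  <- range(c(x, y))
dx <- density(x, from = u[1], to = u[2])
dy <- density(y, from = u[1], to = u[2])
dd_xy <- dx$y - dy$y

# Hypothetical helper: trapezoidal rule for values gy on the grid gx
trapz <- function(gx, gy) {
  sum(diff(gx) * (gy[-length(gy)] + gy[-1]) / 2)
}

# Signed area: expected to be near 0, since each density integrates to about 1
signed_area <- trapz(dx$x, dd_xy)

# Area between the two curves: lies between 0 (identical) and 2 (no overlap)
abs_area <- trapz(dx$x, abs(dd_xy))

c(signed = signed_area, absolute = abs_area)
```

The signed area says little about how different the two curves are, because positive and negative parts cancel. The absolute area is usually the more informative single number.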

Data:

set.seed(42)
x <- rnorm(1000, 0, 2)
y <- rnorm(500, 1, 1)
jay.sf
  • Thanks for the reply. I know how to plot two density plots in one graph, but I want to plot their difference as a graph and calculate their difference as a unique value. for example: plot(density(df2)-density(df1)) – vahed Sep 19 '21 at 10:57
  • @vahed Aha, please see the update. Note that I have unboxed x and y out of their data frames for the sake of simplicity. – jay.sf Sep 19 '21 at 12:33
  • Thanks Jay.sf. Great! That's it. It does what I need. – vahed Sep 21 '21 at 09:11