
I am trying to predict a density function using LOESS in R. However, the predicted values I get do not fall on the estimated LOESS line.

#Generate data
n<-10000
a1<-a2<-0.1 
a3<-a4<-0.2
a0<-0.1

u1 <-rnorm(n,0,1)    
u2 <-runif(n,0,1)
u3 <-rbinom(n,1,0.5) 
u4 <-rpois(n,5)  

lambda<-exp(a0+a1*u1+a2*u2+a3*u3+a4*u4)

###Compute the density of the variable
x<-lambda
dens<-density(x,bw=0.15,na.rm=TRUE)
gr <- data.frame(with(dens, x), as.vector(dens$y))
names(gr)<-c("xgr","ygr")

###Use loess to smooth the density function
y.loess <- loess(ygr~xgr, data=gr)

###Predict the density using the data
pred.y <- predict(y.loess, newdata=data.frame(xgr=x))


###Plot results
plot(dens,col=1)
###Add loess lines
lines(y.loess,col=2,lty=2)
###add new predictions onto the plots and they are not on the LOESS line!
points(x,pred.y,col=3)

This is really driving me crazy. Any help will be appreciated.

Nick Cox

1 Answer


I believe you have the right idea, but you are "betrayed" by the default span used by loess (span = 0.75). The following code amends your original version:

set.seed(123)
n <- 10000
a1 <- a2 <- 0.1
a3 <- a4 <- 0.2
a0 <- 0.1

u1 <-rnorm(n,0,1)
u2 <-runif(n,0,1)
u3 <-rbinom(n,1,0.5)
u4 <-rpois(n,5)    
lambda<-exp(a0+a1*u1+a2*u2+a3*u3+a4*u4)

###Compute the density of the variable
x<-(lambda)
dens<-density(x,bw=0.15,na.rm=TRUE)
gr <- data.frame(with(dens, x), as.vector(dens$y))
names(gr)<-c("xgr","ygr")

###Use loess to smooth the density function
yLoess <- loess(ygr~xgr, data=gr,span=0.1)
yLoess0 <- loess(ygr~xgr, data=gr)

###Predict the density using the data
predY <- predict(yLoess, newdata=data.frame(xgr=x))
predY0 <- predict(yLoess0, newdata=data.frame(xgr=x))
###Plot results
plot(dens,col=1)
###Add loess lines
lines(yLoess,col=2,lty=2)
###add new predictions onto the plots 
points(x,predY0,col='green') # We clearly oversmooth 
points(x,predY,col='blue')   # Much better :) 

[Figure: funWithSmoother]
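A caveat when overlaying a loess fit on a plot: as far as I can tell there is no lines() method for loess objects, so calling lines() directly on the fit falls back to the x and y components stored in the object, which are the input data rather than the smoothed values. A minimal sketch that draws the fitted curves explicitly, on data simulated the same way as above (the names fitNarrow and fitDefault are illustrative, not from the original code):

```r
set.seed(123)
n  <- 10000
u1 <- rnorm(n, 0, 1)
u2 <- runif(n, 0, 1)
u3 <- rbinom(n, 1, 0.5)
u4 <- rpois(n, 5)
x  <- exp(0.1 + 0.1 * u1 + 0.1 * u2 + 0.2 * u3 + 0.2 * u4)

# Kernel density estimate evaluated on a regular grid
dens <- density(x, bw = 0.15)
gr   <- data.frame(xgr = dens$x, ygr = dens$y)

fitNarrow  <- loess(ygr ~ xgr, data = gr, span = 0.1)
fitDefault <- loess(ygr ~ xgr, data = gr)  # default span = 0.75

plot(dens, col = 1)
# fitted() returns the smoothed values at the grid points, which are
# already sorted in increasing xgr, so lines() can connect them directly
lines(gr$xgr, fitted(fitNarrow),  col = "blue",  lty = 2)
lines(gr$xgr, fitted(fitDefault), col = "green", lty = 2)
```

Drawn this way, predictions at new points should lie on the curve of the fit that produced them, whichever span is used.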

In general, when smoothed estimates show some kind of unwanted inertia (in your case, density estimates that repeatedly come out negative), it is a hint that the smoothing bandwidth/window/span is larger than you would like.
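To make the effect of the span concrete, here is a rough sketch that refits the smoother for a few span values and reports, for each, the largest deviation from the kernel density estimate and the number of grid points with a negative fitted value. The span values and the helper summariseSpan are illustrative choices, not recommendations:

```r
set.seed(123)
n  <- 10000
u1 <- rnorm(n, 0, 1)
u2 <- runif(n, 0, 1)
u3 <- rbinom(n, 1, 0.5)
u4 <- rpois(n, 5)
x  <- exp(0.1 + 0.1 * u1 + 0.1 * u2 + 0.2 * u3 + 0.2 * u4)

dens <- density(x, bw = 0.15)
gr   <- data.frame(xgr = dens$x, ygr = dens$y)

# Hypothetical helper: fit loess with a given span and summarise the fit
summariseSpan <- function(span) {
  fit  <- loess(ygr ~ xgr, data = gr, span = span)
  yhat <- fitted(fit)
  c(span        = span,
    maxAbsError = max(abs(yhat - gr$ygr)),  # worst departure from the density
    nNegative   = sum(yhat < 0))            # impossible values for a density
}

spans <- c(0.05, 0.1, 0.25, 0.75)
res   <- t(sapply(spans, summariseSpan))
print(res)
```

A span whose fit strays far from the density, or dips below zero, is a candidate for being too wide for this purpose.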

(By the way, try to avoid dots (.) in variable names; the convention is a relic from the past. In R a dot can mean nothing (data.frame) or it can signal S3 method dispatch (print.lm is the print method for class lm), which can confuse someone reading your code. In addition, if you ever write C/C++ (the standard low-level language for interfacing with R), you will find that (1) dots are not allowed in names and (2) the dot operator is used to access individual members of a structure, so you would have to change your naming convention anyway. Keep it simple and use the same naming convention everywhere. :) )
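A small sketch of the two meanings of the dot. The generic area and the class circle are made up for illustration and do not come from any package:

```r
# A generic function: the call is dispatched on the class of 'shape'
area <- function(shape, ...) UseMethod("area")

# Here the dot separates generic and class: this is the method for class "circle"
area.circle <- function(shape, ...) pi * shape$r^2

# Fallback for any class without its own method
area.default <- function(shape, ...) stop("no area method for this class")

unitCircle <- structure(list(r = 1), class = "circle")
circleArea <- area(unitCircle)  # dispatches to area.circle
print(circleArea)

# Here the dot is just part of the name: data.frame is an ordinary function,
# not a method of a generic called 'data' for a class called 'frame'
print(is.function(data.frame))
```

A name such as y.loess therefore looks, at first glance, like a method of a generic called y for class loess, which is one reason camelCase names such as yLoess are easier to read.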

Nick Cox
usεr11852