Disclaimer
I know that diagnostics with mean residual life plots are extremely liable to subjective interpretation. I was just curious about a consistent behavior I saw running several POT analyses and wanted to know what the community had to say about it.
Question
I'm working on a set of weekly rain maxima collected on 4 different rain stations. I want to conduct a peak-over-threshold analysis. I drew some mean residual life plots, and they are like other MRLs I've seen in the past.
We know is that the expectation of the Generalized Pareto distribution excesses is $$E(X-u|X>u) = \frac{\sigma_u}{1-\xi} \quad \text{for} \quad \xi>0$$
Now, for $v>u$, this expectation becomes
$$E(X-v|X>v) = \frac{\sigma_u + \xi v}{1-\xi}$$
This is clearly linear in $v$, and can be seen as a line with intercept $\frac{\sigma_u}{1-\xi}$ and slope $\frac{\xi}{1-\xi}$.
I would chose the smallest threshold such that all sample means in correspondance with higher thresholds are consistently on a straight line with the above said characteristics.
Now I've done some of these diagnostics for models fitted for all the stations. I notice a regular pattern. For all thresholds, the slope of the produced lines is correct, but the intercept is always greatly overshooting the path of the plot. That is, I get something like this
This is a consistent behavior with other stations and for many thresholds. In the above plot, I have used 25 as threshold (the dashed line). The following is the code and output of the model.
install.packages("ismev")
library(ismev)
daily <- read.table(url("http://m.uploadedit.com/bbtc/1519290151432.txt"), sep = ",", head = T)
daily <- daily$x
thresholds <- seq(0,300, by = 0.5) #determining thresholds
mrl.S1 <- NULL
#creating a mean excess vector
for(i in 1:length(thresholds)){
mrl.S1[i] <- mean(daily[daily>thresholds[i]] - thresholds[i])
}
#running the model
gpd.S1 <- gpd.fit(daily, 25)
plot(thresholds, mrl.S1 , type = "l", xlim = c(0,270))
abline(v = 25, lty = 2)
abline(gpd.S1$mle[1]/(1-gpd.S1$mle[2]), gpd.S1$mle[2]/(1-gpd.S1$mle[2]))
> gpd.S1 <- gpd.fit(daily$S1, 25)
$threshold
[1] 25
$nexc
[1] 224
$conv
[1] 0
$nllh
[1] 987.8364
$mle
[1] 25.0109875 0.1902708
$rate
[1] 0.03066813
$se
[1] 2.76593228 0.08905251
Standard errors are acceptable and the rate of exceedances is average for this type of work. Has anybody seen this sort of behavior? Why could this happen? Inaccurate estimates of the shape parameter? Would this indicate that the proposed threshold is not appropriate?
