2

Disclaimer

I know that diagnostics with mean residual life plots are extremely liable to subjective interpretation. I was just curious about a consistent behavior I saw running several POT analyses and wanted to know what the community had to say about it.

Question

I'm working on a set of weekly rain maxima collected on 4 different rain stations. I want to conduct a peak-over-threshold analysis. I drew some mean residual life plots, and they are like other MRLs I've seen in the past.

We know is that the expectation of the Generalized Pareto distribution excesses is $$E(X-u|X>u) = \frac{\sigma_u}{1-\xi} \quad \text{for} \quad \xi>0$$

Now, for $v>u$, this expectation becomes

$$E(X-v|X>v) = \frac{\sigma_u + \xi v}{1-\xi}$$

This is clearly linear in $v$, and can be seen as a line with intercept $\frac{\sigma_u}{1-\xi}$ and slope $\frac{\xi}{1-\xi}$.

I would chose the smallest threshold such that all sample means in correspondance with higher thresholds are consistently on a straight line with the above said characteristics.

Now I've done some of these diagnostics for models fitted for all the stations. I notice a regular pattern. For all thresholds, the slope of the produced lines is correct, but the intercept is always greatly overshooting the path of the plot. That is, I get something like this

enter image description here

This is a consistent behavior with other stations and for many thresholds. In the above plot, I have used 25 as threshold (the dashed line). The following is the code and output of the model.

install.packages("ismev")
library(ismev)

daily <- read.table(url("http://m.uploadedit.com/bbtc/1519290151432.txt"), sep = ",", head = T)
daily <- daily$x

thresholds <- seq(0,300, by = 0.5) #determining thresholds
mrl.S1 <- NULL


#creating a mean excess vector

for(i in 1:length(thresholds)){
    mrl.S1[i] <- mean(daily[daily>thresholds[i]] - thresholds[i])
}

#running the model
gpd.S1 <- gpd.fit(daily, 25)


plot(thresholds, mrl.S1 , type = "l", xlim = c(0,270))
abline(v = 25, lty = 2)
abline(gpd.S1$mle[1]/(1-gpd.S1$mle[2]), gpd.S1$mle[2]/(1-gpd.S1$mle[2]))

> gpd.S1 <- gpd.fit(daily$S1, 25)
$threshold
[1] 25

$nexc
[1] 224

$conv
[1] 0

$nllh
[1] 987.8364

$mle
[1] 25.0109875  0.1902708

$rate
[1] 0.03066813

$se
[1] 2.76593228 0.08905251

Standard errors are acceptable and the rate of exceedances is average for this type of work. Has anybody seen this sort of behavior? Why could this happen? Inaccurate estimates of the shape parameter? Would this indicate that the proposed threshold is not appropriate?

1 Answers1

1

There is a problem in your second displayed formula, because if you let $v \to u$ for $v > u$, you should find the value of the first formula, which is not the case unless $u=0$.

Up to a slight change of notation, your second formula looks like (4.9) in my edition of the famous 'ISMEV' book by Stuart Coles. But (4.16) in the book would translate into $\sigma_v = \sigma_u + \xi [v -u]$, which I guess is the right formula to use here. Note that the two formulas (4.9) and (4.16) in the book are in conflict.

Here is a slightly changed part of your code

## running the model
u <- 25
gpd.S1 <- gpd.fit(daily, threshold = u)

co <- gpd.S1$mle
names(co) <- c("sigma", "xi")

plot(thresholds, mrl.S1 , type = "l", xlim = c(0,270))
abline(v = 25, lty = 2)
abline(a = (co["sigma"] - co["xi"] * u) / (1 - co["xi"]),
       b = co["xi"] / (1 - co["xi"]))
Yves
  • 5,358
  • Oh my, great resource! Checked out the book in the library and indeed the picture is more complete now. Thank you very much @Yves – Easymode44 Mar 01 '18 at 08:34