The best way to compute the PRESS statistic

Question

I would like to forecast the return volatility in a financial market. I am using symbolic regression/genetic programming to generate models with a good in-sample fit. I would like to compute the predictive R-squared for each model so that I can select which model to use for an out-of-sample forecast.

The brute force method of computing PRESS for a model involves removing one observation from the data set, finding the values of the model's parameters that minimize the sum of squared residuals, and then making a forecast using the resulting model for the observation removed earlier. Then we repeat this for each observation in the data set.

I am aware of a shortcut that ought to generate the same value of PRESS as the procedure described above. The shortcut is described in "PRESS statistic for ridge regression" and in the answer to "How can one compute the PRESS diagnostic?"

All of the sources that describe that shortcut mention that it is valid for "ordinary least squares".

My question is: can the models that I am working with be described as ordinary least squares?

The models generated by my symbolic regression algorithm are of the form Y = a + b\*f(A,B,C,D,E,G) + c\*g(A,B,C,D,E,G) + d\*h(A,B,C,D,E,G) + error term, where the functions f(), g(), and h() are nonlinear products like A\*(C^2)\*D\*E\*(G^3).

For the models of the form above, would the PRESS statistic computed using the full method be the same as the PRESS statistic computed using that shortcut?

Thank you for your kind help!

@Richard Hardy May I please ask you to take a look at this question? Thank you! — BillB, Jul 13 '20 at 18:00
@Tim May I please ask you to take a look at this question? Thank you! — BillB, Jul 13 '20 at 18:00
@james65 May I please ask you to take a look at this question? Thank you! — BillB, Jul 13 '20 at 18:01
PRESS is based on leave-one-out cross validation (LOOCV). LOOCV works in a simple way for cross sectional data but not for time series. Time series cross validation is more nuanced. I think there have been some posts about it here on Cross Validated. Also, Rob J. Hyndman has some useful blog posts and at least one paper on the topic. — Richard Hardy, Jul 18 '20 at 13:51
@Richard Hardy Thank you for mentioning Dr. Hyndman's work. I found the article on time series cross validation in his text https://otexts.com/fpp2/accuracy.html to be extremely helpful. — BillB, Jul 20 '20 at 00:56

score 1 · Accepted Answer · edited Jul 18 '20 at 14:40

In other words, your model is:

$$Y=\beta_0+\beta_1x_1+\beta_2x_2+\beta_3x_3+\epsilon$$

Sure, each $x_i$ is constructed by some nonlinear function of observations, but by the time the model sees those values, the model is forming a linear combination of numbers. The model does not care about the source.

That model is linear in its parameters and can be fitted with the usual OLS estimator $\hat{\beta}=(X^TX)^{-1}X^T Y$.
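As a rough sketch of what that looks like in practice: the nonlinear terms are computed once, stacked as columns of a design matrix, and then handed to an ordinary least-squares solver. Everything specific here is an assumption made for illustration (the simulated inputs, the particular products standing in for f, g, and h, and the coefficient values); only the overall structure comes from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Hypothetical raw inputs A..G, simulated only for illustration.
A, B, C, D, E, G = rng.uniform(0.5, 1.5, size=(6, n))

# Hypothetical nonlinear products standing in for f(), g(), h().
x1 = A * C**2 * D * E * G**3
x2 = B * D**2
x3 = A * B * E

# Design matrix: intercept column plus the three precomputed terms.
X = np.column_stack([np.ones(n), x1, x2, x3])

# Simulated response with made-up coefficients and noise.
y = 1.0 + 2.0 * x1 - 0.5 * x2 + 0.3 * x3 + rng.normal(scale=0.1, size=n)

# Ordinary least squares on the transformed columns.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```

The solver never sees A through G, only the columns of `X`, which is why the fit is ordinary least squares despite the nonlinear construction.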

You then can calculate your PRESS statistic the usual way.
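For reference, the standard shortcut for OLS can be written as follows. The symbols $e_i$ (the ordinary residual of observation $i$ from the full fit), $h_{ii}$ (its leverage), and $n$ (the number of observations) are notation introduced here, not taken from the question:

$$\text{PRESS}=\sum_{i=1}^{n}\left(\frac{e_i}{1-h_{ii}}\right)^2,\qquad e_i=y_i-x_i^T\hat{\beta},\qquad h_{ii}=\left[X(X^TX)^{-1}X^T\right]_{ii}$$

The predictive R-squared is then usually defined relative to the total sum of squares:

$$R^2_{\text{pred}}=1-\frac{\text{PRESS}}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$$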

You’re using the nonlinear basis functions that Jeff Miller talks about in this video: https://youtube.com/watch?v=rVviNyIR-fI. Thus, your regression is linear in the parameters.

Also, I would encourage you to make a small example where you calculate the PRESS statistic as well as the results from LOOCV to confirm and convince yourself that the methods are the same.
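One possible version of such a small example is sketched below. It computes PRESS twice on simulated data: once by refitting with each observation left out, and once from a single fit using the leverages. The data, the function names `press_loocv` and `press_shortcut`, and the particular nonlinear columns are all hypothetical choices for illustration, and the sketch assumes the design matrix has full column rank.

```python
import numpy as np

def press_loocv(X, y):
    """Brute-force PRESS: refit OLS n times, each time leaving one row out."""
    n = len(y)
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += (y[i] - X[i] @ beta) ** 2
    return total

def press_shortcut(X, y):
    """PRESS from one fit: sum of (residual / (1 - leverage))^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Leverages are the diagonal of the hat matrix; with a reduced QR
    # factorization X = QR they equal the row sums of Q squared.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)
    return np.sum((resid / (1.0 - h)) ** 2)

# Simulated data with hypothetical nonlinear columns.
rng = np.random.default_rng(1)
n = 40
A, C, G = rng.uniform(0.5, 1.5, size=(3, n))
X = np.column_stack([np.ones(n), A * C**2, A * G**3, C * G])
y = X @ np.array([1.0, 2.0, -0.5, 0.3]) + rng.normal(scale=0.2, size=n)

press_a = press_loocv(X, y)
press_b = press_shortcut(X, y)
r2_pred = 1.0 - press_b / np.sum((y - y.mean()) ** 2)

print(press_a, press_b, r2_pred)
```

The two PRESS values should agree up to floating-point error. Note that this only checks the leave-one-out identity; it says nothing about whether leave-one-out is appropriate for serially dependent data.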

Since you’re working with financial data, you might have issues with time series, but that is for a separate question (perhaps one that will get more attention on quant.SE but would be totally on-topic here).

Thank you so much for answering my question, @Dave. Now I feel a lot better about the validity of my project. — BillB, Jul 20 '20 at 01:55
