
I don't have any colleagues who I can ask about this so I must turn to my colleagues on Cross Validated.

I am fitting a stacked adaptive elastic net regression and am having some trouble understanding some output. The model performs k-fold cross validation across 100 candidate values of lambda, the L1 regularisation parameter (I set the L2 parameter at a constant 1 so it doesn't vary).

The function in the package I am using spits out a nice, neat output of the model with the lowest out-of-sample prediction error. However, based on this post and this post, the model I should report is the simplest model within 1 standard error of the minimum out-of-sample error.
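To make the "within 1 standard error" rule concrete, here is a minimal sketch in plain Python. The function name `one_se_index` and all the numbers are made up for illustration, and it assumes that a larger lambda means a simpler model (as with lasso-type penalties). It is not what `cv.saenet` does internally, just the selection rule as I understand it.

```python
def one_se_index(lambdas, cv_mean, cv_se):
    """Index of the largest lambda whose mean CV error is within one
    standard error of the minimum mean CV error (hypothetical helper)."""
    # Position of the model with the lowest mean CV error
    best = min(range(len(cv_mean)), key=lambda i: cv_mean[i])
    # Upper bound: minimum error plus that model's own standard error
    threshold = cv_mean[best] + cv_se[best]
    # All models whose mean error does not exceed the bound
    candidates = [i for i in range(len(lambdas)) if cv_mean[i] <= threshold]
    # Assumption: larger lambda = heavier penalty = simpler model
    return max(candidates, key=lambda i: lambdas[i])


# Made-up values, one entry per candidate lambda
lambdas = [1.0, 0.5, 0.25, 0.125, 0.0625]
cv_mean = [2.10, 1.45, 1.22, 1.20, 1.24]
cv_se = [0.20, 0.12, 0.05, 0.06, 0.07]

chosen = one_se_index(lambdas, cv_mean, cv_se)
print(lambdas[chosen])
```

With these numbers the minimum error is at lambda = 0.125, but the rule picks lambda = 0.25 because its mean error still falls under the one-standard-error bound.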

This brings me to the source of my confusion. The function also gives me the average cross-validation error, or rather 100 of them, one for each candidate value of lambda. I gather this is simply the prediction error for each of the series of 'left-out' values in the k-fold procedure, summed and then divided by k. This I understand (I think).

What I don't understand is that the package also gives me the 'standard error of the average cross validation error'. I thought this would be a single value, i.e. average cross validation error for each value of lambda, divided by the number of lambdas. But there are also 100 of these values. So what are these 100 values? Is it the standard deviation of prediction errors across the k folds for each model/lambda? That is the only thing I can think of.

I appreciate my question is a little vague and may require follow-up questions but I am struggling to find a place to start the process.

p.s. the function is cv.saenet from the miselect package in R. I didn't want to make this a software post.

p.p.s. Happy to supply more details as needed

llewmills
  • Have a look at the glmnet documentation. Basically, for each lambda you have an average error (over the folds) and a standard error of the average (over the folds), i.e. the standard deviation of the fold average error/number_of_folds. What miselect does I can't tell you. – seanv507 Sep 06 '23 at 14:44
  • Thanks @seanv507. So it sounds like I was on the right track: standard error of prediction error is the average deviation of each fold's prediction error from the mean prediction error across the folds. In that case it makes sense that there would be 100 of them in my output since there would be five folds for each lambda and hence a se prediction error for each lambda. Super helpful actually. Thanks again. – llewmills Sep 06 '23 at 15:08
  • just to be clear it's measuring the accuracy of the average (see https://en.wikipedia.org/wiki/Standard_error), so it's the standard deviation of the prediction errors over the folds divided by the sqrt of the number of folds – seanv507 Sep 06 '23 at 15:24
  • Ok thanks, an important distinction. That is very helpful again. So $se_{predError} = \frac{sd_{predErrorAcrossFolds}}{\sqrt{k}}$. If you put in an answer I will accept it. Otherwise thank you so much for your help. I was having stress dreams about this. – llewmills Sep 06 '23 at 22:20
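
The formula from the comments can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `cv_summary` is a hypothetical helper, the fold errors are made-up numbers, and the sample standard deviation (n - 1 denominator) is assumed. Whether miselect computes its values in exactly this way is not confirmed here.

```python
import math
import statistics


def cv_summary(fold_errors):
    """fold_errors[j] holds the k per-fold prediction errors for lambda j.
    Returns one mean and one standard error per lambda (hypothetical helper)."""
    means = [statistics.mean(row) for row in fold_errors]
    # Standard error of the mean: SD across folds divided by sqrt(k).
    # Assumption: sample SD (n - 1 denominator); a package may differ.
    ses = [statistics.stdev(row) / math.sqrt(len(row)) for row in fold_errors]
    return means, ses


# Made-up errors: 3 candidate lambdas, k = 5 folds each
fold_errors = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 2.0, 2.0, 2.0, 2.0],
    [0.5, 1.5, 0.5, 1.5, 1.0],
]

means, ses = cv_summary(fold_errors)
for m, s in zip(means, ses):
    print(f"mean CV error = {m:.3f}, standard error = {s:.3f}")
```

Each lambda gets its own mean and its own standard error, which is why 100 candidate lambdas give 100 standard errors rather than a single value.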

0 Answers