How is the R-squared calculated for an elastic net? How about LASSO? Should it be calculated differently than for OLS, or not?

Edit: The main problem is as follows:

We have all kinds of fruits $f_1, f_2, \ldots, f_n$, for which we have $5$ different properties $c_1, \ldots, c_5$.

We know 10 of the fruits well, namely $f_1, \ldots, f_{10}$. For each of these known fruits, we want to find which of the remaining fruits $f_i$, $10 < i \le n$, best explain its features.

$$f_j = \beta_i f_i + \epsilon, \qquad 1 \le j \le 10, \quad 10 < i \le n$$

To this end, I used an elastic net because of the properties of my data.

I used cross-validation to fit my models, and at the end I have 10 models, one for each fruit $f_j$, $1 \le j \le 10$.

It is very important for me to see which of these models fit the data well. Then I can rank the models from $1$ to $10$ and use this ranking in further analysis.

In my problem, a fruit $f_k$, $1 \le k \le 10$, might not be well explained by any of the other fruits! That's why I'm looking for a statistic with which I can compare the goodness of fit of each obtained model.

I now want to know whether PRESS or R-squared is a good measure for this.
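The fitting step described above (one cross-validated elastic net per known fruit) can be sketched with scikit-learn's `ElasticNetCV`. The data here is random and the sizes are illustrative only; in the actual problem the rows would be the 5 measured properties and the columns the candidate fruits:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)
n_props, n_candidates, n_known = 5, 20, 10  # illustrative sizes

# Design matrix: rows = the 5 properties, columns = candidate fruits.
# Note n_props << n_candidates, the n << p setting discussed below.
X = rng.normal(size=(n_props, n_candidates))

models = []
for j in range(n_known):
    y = rng.normal(size=n_props)  # stand-in for the properties of known fruit f_j
    # cv must not exceed the 5 observations available per fit
    model = ElasticNetCV(l1_ratio=0.5, cv=3).fit(X, y)
    models.append(model)
```

Each fitted model's nonzero coefficients then indicate which candidate fruits contribute to explaining that known fruit.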

Thank you very much. N.

sbmm

1 Answer


Generally speaking, if you are using a regularized method, you typically have more variables than samples ($n \ll p$). As such, using R-squared is not recommended, because a model containing all the variables will always have the largest R-squared. It is recommended to do some form of cross-validation instead.

If your dependent variable is continuous, you would likely use the Predicted Residual Sum of Squares (PRESS) statistic:

$$PRESS=\sum_{i=1}^{N} (y_{i}-\hat{y}_{i,-i})^2$$

If you are really attached to the idea of an 'R-squared', you can convert the PRESS into an $R^2$ by dividing it by the total sum of squares (SS) and subtracting from 1:

$$SS=\sum_{i=1}^{N} (y_{i}-\bar{y})^2$$ $$R^2=1-\frac{PRESS}{SS}$$
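A minimal sketch of these two formulas, using leave-one-out refits of an ordinary least-squares model as a stand-in for whatever model is being evaluated (the function name is illustrative):

```python
import numpy as np

def press_and_r2(X, y):
    """Leave-one-out PRESS and the predictive R^2 = 1 - PRESS/SS.

    Each y_i is predicted from a model fit on all samples except i,
    matching the \hat{y}_{i,-i} notation in the PRESS formula."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])  # add an intercept column
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(Xd[mask], y[mask], rcond=None)
        y_hat_minus_i = Xd[i] @ beta       # prediction for the held-out point
        press += (y[i] - y_hat_minus_i) ** 2
    ss = np.sum((y - y.mean()) ** 2)
    return press, 1.0 - press / ss

# Exactly linear toy data: every held-out point is predicted perfectly,
# so PRESS is ~0 and the predictive R^2 is ~1.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
press, r2 = press_and_r2(X, y)
```

For a penalized model such as the elastic net, the refit inside the loop would use the penalized estimator instead of `lstsq`, but the PRESS and $R^2$ arithmetic is unchanged.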

If your dependent variable is categorical (e.g. Control vs. Disease), then you would use a different metric such as accuracy, the Area Under the Receiver Operating Characteristic curve (AUROC), or Cohen's kappa.
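As a small illustration of the classification case, AUROC can be computed directly from the Mann-Whitney U statistic: the fraction of (positive, negative) pairs that the model's scores rank correctly (ties counting half):

```python
import numpy as np

def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs
    ranked correctly by the scores; ties count as half."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # -> 0.75
```

Here 3 of the 4 positive-vs-negative score pairs are ordered correctly, giving 0.75.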

cdeterman
  • Thanks a lot for your clear response. I have a stupid question: what do you mean by $\hat{y}_{i,-i}$? I don't get the $(i,-i)$; would you please briefly comment on it? – sbmm Nov 12 '14 at 08:32
  • @sbmm, it is simply the notation in the formal definition of PRESS. It signifies that $\hat{y}_{i}$ (i.e. the current sample) is predicted from the remaining samples, with the current sample left out, hence the ${-i}$ – cdeterman Nov 12 '14 at 13:07
  • Imagine I have an apple on one side and the other fruits (everything except the apple) on the other side. I used elastic net to see which fruits can better explain the features of the apple. I have 5 features for my fruits. Imagine my $y(\text{apple})=(1,1,2,1,3)$ and a matrix $X$ of size $20 \times 5$, meaning 20 fruits with 5 features. I found a model per fruit via elastic net, and I found that, e.g., the apple can be explained by pear and quince. Next, I take another fruit and continue the same way. I want to see which of the fruits are better modeled by the others. Would PRESS be a good statistic to compare my models? – sbmm Nov 12 '14 at 14:37
  • If I want to calculate the goodness of fit for each fruit, which statistic should I apply? If I know it, then I can compare the models obtained for each fruit. – sbmm Nov 12 '14 at 14:41
  • Have you done some analysis? Do you have a dataset? I would likely be better able to help you if I can 'see' the dataset and how you have done your analysis. From what I understand, you are wanting to compare between classes (i.e. fruits) instead of classifying them from their respective features (which is what elastic-net and LASSO are for). In which case, you should be using a different method. You may wish to start another question to more directly address this. – cdeterman Nov 12 '14 at 15:20
  • Actually my question is the same... I need a statistic to compare my models and find the best model. – sbmm Nov 12 '14 at 15:32
  • Unfortunately I cannot show my data... but it is exactly like the example I gave above. – sbmm Nov 12 '14 at 15:33
  • Apple: $f_1 = (c_{11}, c_{12}, \ldots, c_{15})$; $f_2 = (c_{21}, c_{22}, \ldots, c_{25})$; ...; $f_n = (c_{n1}, c_{n2}, \ldots, c_{n5})$ – sbmm Nov 12 '14 at 15:34
  • the model should estimate the betas: $f_1 = \beta_2 f_2 + \cdots + \beta_n f_n + \epsilon$; $f_2 = \beta_1 f_1 + \beta_3 f_3 + \cdots + \beta_n f_n + \epsilon$; ...; $f_n = \beta_1 f_1 + \cdots + \beta_{n-1} f_{n-1} + \epsilon$ – sbmm Nov 12 '14 at 15:37
  • I want to rank the fitted models. I thought R-squared would be good, but it was not. – sbmm Nov 12 '14 at 15:38
  • Now my question is that, how to compare these models? – sbmm Nov 12 '14 at 15:39
  • Is my question clear? or so stupid, or? – sbmm Nov 12 '14 at 15:40
  • First, relax and try to be a little more concise with your comments; the extensive notifications don't help you. Second, to address your question: are you comparing between models for each fruit? For example, model1 for Apple vs. model2 for Apple, or are you trying to compare the model for Apple vs. the model for Pear? If the former, you can use $R^2$ as defined in my answer. If the latter, it is not appropriate to compare models addressing different questions. You could only say that your Pear model performs better for pears than your Apple model does for apples. – cdeterman Nov 12 '14 at 15:44
  • :) thanks a lot... I will come back in a few minutes and tell you my concern with this problem. – sbmm Nov 12 '14 at 15:54
  • I updated my question.... – sbmm Nov 12 '14 at 17:28
  • Short answer: yes, since you are comparing models for each fruit, you can use PRESS or $R^2$ to compare between the 10 models and rank accordingly. – cdeterman Nov 12 '14 at 17:32
  • Thank you very much. You mean I separately calculate PRESS for each of the 10 models and then rank them accordingly? – sbmm Nov 12 '14 at 17:35
  • Yes, that is correct – cdeterman Nov 12 '14 at 17:37
  • Isn't PRESS the same as MSE?! (Shouldn't there be a $1/n$ next to the summation?) Similar to http://stats.stackexchange.com/questions/111324/how-to-find-adjusted-r2-or-r2-from-lasso-and-ridge-regression-model – Areza Nov 17 '16 at 23:13
  • @user4581 no, MSE is the mean of the RSS, hence the $1/n$. PRESS is the predictive form of the RSS. It is usually reported as a raw value, not an average. – cdeterman Nov 21 '16 at 14:42