Let's say I have two independent samples (DS and VS). My data is also high-dimensional (one dependent variable, 300+ potential predictors, N < 300). To reduce the dimensionality of this problem, I decided to aggregate the effects of my 300+ potential predictors into a single index. To do so, I ran a separate regression of my dependent variable on each predictor, which gives me an effect score (beta value) for each predictor. By multiplying each predictor by its beta value and summing the products, I get the accumulated net effect (risk index) of all 300+ predictors for a particular observation.
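To make the construction concrete, here is a minimal sketch of the index as described above, assuming simple least-squares regressions with an intercept, one per predictor. The function names `univariate_betas` and `risk_index` are hypothetical and chosen only for illustration; the actual analysis may differ (e.g. standardized predictors or covariates).

```python
import numpy as np

def univariate_betas(X, y):
    """Slope from regressing y on each column of X separately (with intercept).

    X: array of shape (n_observations, n_predictors)
    y: array of shape (n_observations,)
    """
    Xc = X - X.mean(axis=0)   # center each predictor
    yc = y - y.mean()         # center the dependent variable
    # Simple-regression slope: cov(x, y) / var(x), computed per column
    return (Xc * yc[:, None]).sum(axis=0) / (Xc ** 2).sum(axis=0)

def risk_index(X, betas):
    """Sum of beta-weighted predictors, one value per observation."""
    return X @ betas

# Tiny illustrative data set (assumed values, not real data)
X_demo = np.array([[0.0, 1.0],
                   [1.0, 0.0],
                   [2.0, 1.0],
                   [3.0, 0.0]])
y_demo = 2.0 * X_demo[:, 0] + 1.0

betas_demo = univariate_betas(X_demo, y_demo)
index_demo = risk_index(X_demo, betas_demo)
```

Note that each beta comes from its own regression, so correlated predictors are not adjusted for one another when they are summed into the index.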
Finally, I tested whether this risk index is associated with my dependent variable in the Validation Sample (VS). I expected the beta value for the risk index to be either 0 (no association) or positive (predictive power). However, to my surprise, I consistently got negative beta values for some sets of predictor variables, especially those that I didn't expect to be good predictors.
Could regression to the mean be a sufficient explanation for this?
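The validation step described above can be sketched on simulated data. Everything here is an assumption for illustration: the sample size, the number of predictors, the independent standard-normal predictors, the equal effect size `true_effect`, and the function name `validation_slope`. It shows the procedure only (betas estimated in one sample, index tested in the other) and is not meant to reproduce or explain the negative values reported.

```python
import numpy as np

def univariate_betas(X, y):
    """Slope from regressing y on each column of X separately (with intercept)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc * yc[:, None]).sum(axis=0) / (Xc ** 2).sum(axis=0)

def validation_slope(true_effect, n=250, p=20, seed=0):
    """Estimate betas in a simulated DS, then regress y on the index in a simulated VS."""
    rng = np.random.default_rng(seed)
    # Two independent samples drawn from the same (assumed) data-generating model
    X_ds = rng.standard_normal((n, p))
    X_vs = rng.standard_normal((n, p))
    y_ds = true_effect * X_ds.sum(axis=1) + rng.standard_normal(n)
    y_vs = true_effect * X_vs.sum(axis=1) + rng.standard_normal(n)

    betas = univariate_betas(X_ds, y_ds)   # effect scores from DS only
    index_vs = X_vs @ betas                # risk index for VS observations
    # np.polyfit with degree 1 returns [slope, intercept]
    slope, _intercept = np.polyfit(index_vs, y_vs, 1)
    return slope

slope_with_signal = validation_slope(true_effect=0.5)
slope_without_signal = validation_slope(true_effect=0.0)
```

Setting `true_effect=0.0` gives a set of predictors with no real association, which is the case where the sign of the validation slope is determined by noise alone; repeating the call over many seeds would show how that slope is distributed under these assumptions.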
Would be great if you could point me to some psychology research that achieved the same through LASSO.
– aciM Jun 12 '13 at 19:26