10

For the selection of predictors in multiple linear regression with $p$ candidate predictors, what methods are available to find an 'optimal' subset of the predictors without explicitly testing all $2^p$ subsets? In 'Applied Survival Analysis', Hosmer & Lemeshow make reference to Kuk's method, but I cannot find the original paper. Can anyone describe this method, or, even better, point to a more modern technique? One may assume normally distributed errors.

shabbychef
  • 14,814
  • 1
    Are you referring to the following paper? Kuk, A.Y.C. (1984) All subsets regression in a proportional hazards model. Biometrika, 71, 587-592 – chl Aug 25 '10 at 19:22
  • yes indeed. I guess I'll have to dig up that paper somehow. It seems old, however. – shabbychef Aug 25 '10 at 22:56
  • 2
    In the meantime, have a look at this article: 'The lasso method for variable selection in the Cox model', Tibshirani (Stat. Med. 1997, 16: 385-395), http://j.mp/bw0mB9. HTH – chl Sep 04 '10 at 21:30
  • 1
    and this more recent one (closely linked to the penalized R package), http://j.mp/cooIT3. Maybe this one too, http://j.mp/bkDQUj. Cheers – chl Sep 05 '10 at 08:29

3 Answers

13

I've never heard of Kuk's method, but the hot topic these days is L1 minimisation. The rationale is that if you penalise the sum of the absolute values of the regression coefficients, the coefficients of the unimportant predictors should be shrunk to zero.

These techniques have some funny names: Lasso, LARS, Dantzig selector. You can read the papers, but a good place to start is with Elements of Statistical Learning, Chapter 3.
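For a concrete illustration (a minimal sketch added here, not something from the original answer), the lasso can be fit in R with the glmnet package; the simulated data, the variable names and the use of cross-validation to pick the penalty weight are all just illustrative choices:

```r
## Minimal sketch: lasso (L1-penalised) regression with glmnet on simulated data
## with normal errors. Assumes glmnet is installed; n, p and beta are made up.
library(glmnet)

set.seed(1)
n <- 100; p <- 20
X <- matrix(rnorm(n * p), n, p)
beta <- c(3, -2, 1.5, rep(0, p - 3))   # only the first three predictors matter
y <- as.vector(X %*% beta + rnorm(n))

## Choose the penalty weight lambda by cross-validation, then inspect which
## coefficients are shrunk exactly to zero at the selected lambda.
cvfit <- cv.glmnet(X, y, alpha = 1)    # alpha = 1 is the lasso penalty
coef(cvfit, s = "lambda.min")          # non-zero rows are the selected predictors
```

The non-zero coefficients at the cross-validated lambda give a selected subset without any explicit search over the $2^p$ candidate models.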

Simon Byrne
  • 3,486
  • 19
  • 31
  • 2
    BTW, the penalized R package (http://j.mp/bdQ0Rp) includes l1/l2 penalized estimation for Generalized Linear and Cox models. – chl Aug 25 '10 at 20:07
  • stuck in matlab land, implementing it myself... – shabbychef Aug 25 '10 at 23:23
  • LARS is great, BTW. very cool stuff. not sure how I can jam it into the framework of Cox Proportional Hazards model, tho... – shabbychef Aug 26 '10 at 16:56
  • 2
    The glmnet software has a lasso'd Cox PH model: http://cran.r-project.org/web/packages/glmnet/index.html; there is also a MATLAB version (not sure if it does a Cox model though): http://www-stat.stanford.edu/~tibs/glmnet-matlab/ – Simon Byrne Aug 27 '10 at 09:29
4

This is a huge topic. As previously mentioned, Hastie, Tibshirani and Friedman give a good intro in Ch3 of Elements of Statistical Learning.

A few points. 1) What do you mean by "best" or "optimal"? What is best in one sense may not be best in another. Two common criteria are predictive accuracy (predicting the outcome variable) and producing unbiased estimators of the coefficients. Some methods, such as the lasso and ridge regression, inevitably produce biased coefficient estimators.

2) The phrase "best subsets" itself can be used in two separate senses. Generally it refers to the best subset among all predictors, the one that optimises some model-building criterion. More specifically, it can refer to Furnival and Wilson's efficient algorithm for finding that subset among moderate numbers (~50) of linear predictors ("Regressions by Leaps and Bounds", Technometrics, Vol. 16, No. 4 (Nov., 1974), pp. 499-511).

http://www.jstor.org/stable/1267601
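For what it's worth (my addition, not part of the original reference), Furnival and Wilson's leaps-and-bounds search is available in R via the leaps package; a minimal sketch on simulated data, using BIC as the model-building criterion (the data and names are purely illustrative):

```r
## Minimal sketch: best-subset search using the leaps-and-bounds algorithm
## (Furnival & Wilson) as implemented in the leaps package. Assumes leaps is installed.
library(leaps)

set.seed(1)
n <- 100; p <- 10
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
y <- X[, 1] - 2 * X[, 2] + rnorm(n)
dat <- data.frame(y, X)

## regsubsets() finds the best model of each size without enumerating all 2^p subsets.
fit  <- regsubsets(y ~ ., data = dat, nvmax = p)
sm   <- summary(fit)
best <- which.min(sm$bic)      # subset size minimising BIC
sm$which[best, ]               # TRUE marks the predictors in the chosen subset
```

Which subset comes out as "best" still depends on the criterion (BIC here), which ties back to point 1 above.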

Thylacoleo
  • 5,399
  • 5
  • 26
  • 33
  • yes, the question is somewhat ambiguous; there are, as you mention, many definitions of 'optimal': via information criterion, cross validation, etc. Most of the heuristic approaches I have seen to the problem proceed by stepwise predictor addition/removal: single pass forward addition or subtraction, etc. However, Hosmer & Lemeshow make reference to this method (a variant of work by Lawless & Singhal), which somehow 'magically' selects predictors by a single computation of an MLR (modulo some other stuff). I am very curious about this method... – shabbychef Aug 26 '10 at 16:55