
Suppose we have

$$y = b_1x_1 + b_2x_2 + b_3x_3 + e$$

as our regression model.

Imposing a linear restriction, say $b_1 + b_2 + b_3 = 0$, allows us to rewrite the model as

$$y = b_1(x_1 - x_3) + b_2(x_2 - x_3) + e,$$

since the restriction implies $b_3 = -(b_1 + b_2)$.

Given that such restrictions often improve the out-of-sample performance of the estimates (even when the restrictions are wrong), I was wondering whether this might be connected to Stein's Paradox. Intuitively, it seems to me that $b_3$ is essentially 'shrunk' to zero by the restriction, and that this somehow improves the estimates.

I was wondering if anyone could give a more theoretically robust explanation, or, if I am wrong here, point out where I am wrong.
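To make the comparison concrete, here is a minimal simulation sketch of the idea, assuming an illustrative data-generating process in which the restriction is only approximately true; all variable names and coefficient values below are hypothetical, not taken from any cited source. It fits the unrestricted model by OLS and the restricted model via the substitution above, then compares out-of-sample mean squared error.

```python
# Minimal sketch: unrestricted OLS vs. OLS with b1 + b2 + b3 = 0 imposed by
# substituting b3 = -(b1 + b2), i.e. regressing y on (x1 - x3) and (x2 - x3).
# The data-generating coefficients are illustrative assumptions chosen so that
# the restriction is only approximately true.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_rep = 30, 1000, 2000
b_true = np.array([0.5, -0.3, -0.1])   # b1 + b2 + b3 = 0.1, so the restriction is slightly wrong

def ols(X, y):
    """Ordinary least squares coefficients via numpy's least-squares solver."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

mse_unres, mse_res = [], []
for _ in range(n_rep):
    X = rng.normal(size=(n_train + n_test, 3))
    y = X @ b_true + rng.normal(size=n_train + n_test)
    Xtr, Xte, ytr, yte = X[:n_train], X[n_train:], y[:n_train], y[n_train:]

    # Unrestricted OLS: three free coefficients.
    b_hat = ols(Xtr, ytr)

    # Restricted OLS: impose b1 + b2 + b3 = 0 by substitution, two free coefficients.
    Ztr = np.column_stack([Xtr[:, 0] - Xtr[:, 2], Xtr[:, 1] - Xtr[:, 2]])
    g_hat = ols(Ztr, ytr)
    b_res = np.array([g_hat[0], g_hat[1], -g_hat.sum()])  # recover (b1, b2, b3)

    mse_unres.append(np.mean((yte - Xte @ b_hat) ** 2))
    mse_res.append(np.mean((yte - Xte @ b_res) ** 2))

print("out-of-sample MSE, unrestricted OLS:", np.mean(mse_unres))
print("out-of-sample MSE, restricted OLS:  ", np.mean(mse_res))
```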

– shellsnail
  • It would be nice if somebody could relate this to Stein's Paradox even in the absence of a restriction, when there are 3 or more coefficients to be estimated. – Cagdas Ozgenc Mar 02 '15 at 13:21
  • Restrictions such as $b_1 + b_2 + b_3 = 0$ often improve out-of-sample performance? Is that really true? I never encountered such a regularization strategy. Instead, what is very often used is a penalty term such as e.g. $\lambda\sum b_i^2$ that needs to be minimized. This particular term leads to "ridge regression", which does indeed shrink all $b_i$ towards zero. The relation of ridge regression to Stein's paradox was extensively discussed in this thread. – amoeba Mar 02 '15 at 13:40
  • @amoeba Here's a recent paper showing how restrictions helped improve out-of-sample predictions for the stock market.

    There are many other such instances in the econometric literature.

    Thanks for the link. I had looked at that before posting this, but I thought it was a bit different. Intuitively they seem to be about the same thing, but I am wondering if there is a more solid proof or demonstration.

    – shellsnail Mar 02 '15 at 13:41
  • Thanks for the link, but I don't have time to read this paper now, and I anyway know nothing about stock markets. However, I remain quite skeptical about the word "often". Classical texts in machine learning, such as The Elements of Statistical Learning by Hastie et al., don't mention such a regularization approach, if I remember correctly. Moreover, it does not even fit the standard scheme of introducing a penalty term into the cost function: a linear constraint is not a "soft" penalty term, but a hard constraint... It looks weird to me. – amoeba Mar 02 '15 at 13:50
  • Also, why do you think it will "shrink" $b_3$ to zero? If $b_3 \approx 0$, then this constraint implies $b_1 \approx -b_2$. Why would this be true? – amoeba Mar 02 '15 at 13:52
  • @amoeba Hmm. I am not really sure. Perhaps there's a problem with my reasoning above. Maybe it does not shrink $b_3$ to zero, but shrinks the combined sum to 0, and hence produces an effect on the estimates like in the ridge regression scenario?

    But this finding does exist in the literature. What is commonly observed is that a random/false restriction will improve out-of-sample predictions slightly, and a theory-driven restriction tends to improve them even further. I recall a paper showing this; I just have to find it...

    – shellsnail Mar 02 '15 at 13:58
  • But it doesn't even "shrink the combined sum to 0"; it requires the combined sum to be 0! Anyway, for the relationship between ridge regression and Stein's paradox see the thread I gave the link to. If there is any relation between your "linear constraint" method and ridge regression, this would automatically establish a link between "linear constraint" method and Stein's paradox. Perhaps somebody else will comment on such a relation. – amoeba Mar 02 '15 at 14:05
  • From: http://www.dictionaryofeconomics.com/article?id=pde2008_S000487

    (i) if the restrictions are correct and $\delta = 0$, the pretest estimator has a smaller risk than the ML estimator $\hat\beta$ at the origin, $\delta = 0$, and the risk depends on the level of significance $\alpha$ and correspondingly the critical value of the test $c$;

    (ii) as the hypothesis error $\delta$ or $\lambda$ grows, the risk of the pretest estimator $\hat\beta_0$ increases, obtains a maximum after exceeding the risk of the MLE, $\hat\beta$, and then monotonically decreases to approach $\sigma^2 K$, the risk of the MLE.

    – shellsnail Mar 03 '15 at 11:28
  • In other words, @amoeba, it seems that even when the restrictions are wrong, and just imposed without testing, for some range of values (not sure exactly what range, maybe someone can comment on this) they would have lower risk than the MLE (see the simulation sketch after these comments). – shellsnail Mar 03 '15 at 11:30
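A rough simulation sketch of the risk pattern quoted above, using the three-coefficient regression from the question with the restriction $b_1 + b_2 + b_3 = 0$ imposed after a pretest (an F-test at level $\alpha$). The grid of hypothesis errors $\delta$, the sample size, and $\alpha$ are illustrative assumptions, not values from the cited entry.

```python
# Rough sketch of pretest-estimator risk vs. MLE risk: for a grid of
# "hypothesis errors" delta, the true coefficients satisfy b1 + b2 + b3 = delta.
# The pretest estimator uses the restricted fit when an F-test does not reject
# the restriction at level alpha, and the unrestricted OLS (the MLE) otherwise.
# All names, the delta grid, sample size, and alpha are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, k, alpha, n_rep = 30, 3, 0.05, 2000
deltas = np.linspace(0.0, 2.0, 9)

def fit(X, y):
    """Least-squares coefficients and residual sum of squares."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = np.sum((y - X @ b) ** 2)
    return b, rss

f_crit = stats.f.ppf(1 - alpha, 1, n - k)              # critical value for 1 restriction
for delta in deltas:
    b_true = np.array([0.5, -0.3, -0.2 + delta])       # b1 + b2 + b3 = delta
    risk_mle, risk_pre = 0.0, 0.0
    for _ in range(n_rep):
        X = rng.normal(size=(n, k))
        y = X @ b_true + rng.normal(size=n)

        b_ols, rss_u = fit(X, y)                       # unrestricted OLS (the MLE)
        Z = np.column_stack([X[:, 0] - X[:, 2], X[:, 1] - X[:, 2]])
        g, rss_r = fit(Z, y)                           # restricted fit, b1 + b2 + b3 = 0 imposed
        b_res = np.array([g[0], g[1], -g.sum()])

        F = (rss_r - rss_u) / (rss_u / (n - k))        # F-statistic for the single restriction
        b_pre = b_ols if F > f_crit else b_res         # pretest estimator

        risk_mle += np.sum((b_ols - b_true) ** 2)
        risk_pre += np.sum((b_pre - b_true) ** 2)
    print(f"delta={delta:4.2f}  risk(MLE)={risk_mle/n_rep:6.3f}  risk(pretest)={risk_pre/n_rep:6.3f}")
```

Under these assumptions the pretest risk should sit below the MLE risk near $\delta = 0$, rise above it for intermediate $\delta$, and approach the MLE risk again as $\delta$ grows, matching the quoted description.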

0 Answers