
I think what I need may be called reverse regression. Usually, linear regression means fitting lm(y ~ x1 + ... + xn) to find the estimated coefficient of each variable, so that we can write the fitted formula y = constant + coef1*x1 + coef2*x2 + ... + coefn*xn. But I want to go the reverse way: fix some of the coef*variable terms at chosen values and then get the remaining coefficients.

The reverse regression is that we know the values of y and the values of something like coef1*x1; however, we still need to find the coefficients of the remaining variables.

The way I know is to pick a new coefficient for x1 (or x2), then fit lm((y - new_coef*x1) ~ x2 + x3) to estimate the remaining coefficients with that value of the x1 coefficient held fixed.
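Here is a minimal sketch of that approach in R, with simulated data (everything here is made up, since there is no true dataset):

```r
# Simulate a toy dataset -- all names and numbers are illustrative.
set.seed(1)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 2*x1 + 3*x2 - 1*x3 + rnorm(n)

# Impose a chosen coefficient on x1, then re-fit the remaining terms.
new_coef <- 2.5
fit <- lm(I(y - new_coef * x1) ~ x2 + x3)
coef(fit)  # intercept and coefficients of x2, x3 given the fixed x1 term
```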

Are there any other ways?

The example above is made up, so there is no true dataset. I am just curious whether there is any way to modify coefficients like this.

  • First of all, did you mean "which is 0.19"? Other than that, why do you think that the coefficient is too large? Is this coefficient statistically significant? – M. Chris Jul 20 '22 at 10:28
  • Can you show us a pairs plot of y and x's? – user2974951 Jul 20 '22 at 10:31
  • You could omit x3 from your model, driving its coefficient to 0. More seriously, are you comparing like with like, as each coefficient has units (units of y) / (units of this x)? A particular coefficient can appear large or small merely because it doesn't have the same units as others. – Nick Cox Jul 20 '22 at 13:38
  • Hi @NickCox, I have considered the situation that different units will have an impact on the model coefficients. The reason I asked this question is that when I was constructing this model in a business setting (not academically strict), some coefficients should not be that small. Therefore, I may need to fix a coefficient for one or two predictors, and then get the new coefficients for the rest of the variables. – nobodyishere Jul 20 '22 at 13:49
  • As @mkt answers, that sounds somewhere between dubious and indefensible without a rationale. – Nick Cox Jul 20 '22 at 15:12
  • You can set any of the coefficients to be anything you like. This can be justified by (a) explaining why you chose the value(s) that you did and (b) comparing the quality of your model with the original one. Although this is relatively rare, I provided an example (in an admittedly artificial situation) at https://stats.stackexchange.com/a/10520/919. – whuber Jul 20 '22 at 15:23
  • But if you have good grounds for thinking that the coefficient should be, say, k, then regressing y - k x3 against x1 and x2 might be indicated. Some software allows that to be done by constraining a coefficient. – Nick Cox Jul 20 '22 at 16:50
  • @NickCox I saw a constraint option in Stata, but it seems to allow us to set constraints like x1 = -x2, not x1 = some number. – nobodyishere Jul 21 '22 at 02:29
  • @whuber Thanks, I am reading it now. – nobodyishere Jul 21 '22 at 02:29
  • I've never used constraint in Stata that I can recall, but the documentation seems to include this case. – Nick Cox Jul 21 '22 at 07:23
  • I found out that I may need to do a reverse regression. I will clarify it in my new edit. – nobodyishere Jul 21 '22 at 14:11
  • Something is missing from your edited question after "however, we". – whuber Jul 22 '22 at 16:11

2 Answers

It is possible to fix a coefficient at any value, though that would be dubious without a strong justification (such as a known physical constant). Omitting the variable would be the same as fixing its coefficient at 0.
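In R, for instance, a coefficient can be held at a known value k with offset(); a minimal sketch, assuming made-up data and an illustrative k:

```r
# Made-up data for illustration only.
set.seed(1); n <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 2*x1 + 3*x2 - 1*x3 + rnorm(n)

k   <- 2.5                               # the value we choose to impose on x1
fit <- lm(y ~ x2 + x3 + offset(k * x1))  # offset() terms enter with coefficient fixed at 1
summary(fit)                             # x2 and x3 re-estimated with x1's coefficient at k
```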

A more reasonable approach is to use an informative Bayesian prior. This will affect the final coefficient value but will also take into account patterns in the data. Note that this is only true if one defines the prior so that it captures the distribution of possible values well. It is possible to define a prior that constrains the coefficient so narrowly that the data doesn't really affect the final coefficient estimate, and this would be just as dubious as fixing the coefficient. So you'll need to think carefully to define a reasonable prior.
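A sketch of what that could look like with the rstanarm package (one option among several; the data and all prior values are illustrative assumptions):

```r
library(rstanarm)

# Made-up data for illustration only.
set.seed(1); n <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
d  <- data.frame(x1, x2, x3)
d$y <- 1 + 2*d$x1 + 3*d$x2 - 1*d$x3 + rnorm(n)

# Informative prior on x1 (centered at 2.5), weakly informative priors on
# x2 and x3. Shrinking the x1 scale toward 0 would pin its coefficient down
# almost as rigidly as fixing it outright -- the dubious case described above.
fit <- stan_glm(y ~ x1 + x2 + x3, data = d,
                prior = normal(location = c(2.5, 0, 0),
                               scale    = c(0.5, 2.5, 2.5)))
summary(fit)
```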

mkt

Processes that reduce the magnitude of coefficients are called "regularization". Normally, though, you decide what regularization to do before performing the regression, rather than looking at your regression first and then tailoring the regularization to a particular goal for a coefficient.
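As a concrete sketch (assuming the glmnet package and made-up data; ridge regression is one common form of regularization):

```r
library(glmnet)

# Made-up data for illustration only.
set.seed(1); n <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 2*x1 + 3*x2 - 1*x3 + rnorm(n)
X  <- cbind(x1, x2, x3)

# Ridge regression (alpha = 0); the penalty strength lambda is chosen up front
# by cross-validation, not tuned afterwards to push one coefficient around.
cv <- cv.glmnet(X, y, alpha = 0)
coef(cv, s = "lambda.min")  # all coefficients shrunk toward 0
```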

Regularization is also usually applied equally to all the coefficients, but it doesn't have to be (and if you don't standardize your variables, then your choice of units can make a nominally symmetric regularization have an asymmetric effect). If you have reason to think that one of your variables is less likely to be significant, it is valid to use a different regularization hyperparameter for that variable.
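One way to do this is glmnet's penalty.factor argument; a sketch, where the factor of 5 on x3 is an illustrative assumption (note that glmnet standardizes variables by default):

```r
library(glmnet)

# Same made-up data as in the previous sketch.
set.seed(1); n <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 2*x1 + 3*x2 - 1*x3 + rnorm(n)
X  <- cbind(x1, x2, x3)

# Penalize x3 five times as heavily as x1 and x2.
fit <- glmnet(X, y, alpha = 0, penalty.factor = c(1, 1, 5))
coef(fit, s = 0.1)  # coefficients at an illustrative lambda of 0.1
```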

Regularization is equivalent to using a Bayesian prior (the solution mkt brought up), but you'll probably get more useful results by looking up how to do regularization than by looking up Bayesian priors.