
I'm trying to fit a regression to a continuous variable, but the setup is as follows:

$Y = B_0 + B_1X_1 + B_2X_2$

with something like

$$Y = \begin{cases} B_1 \left( 1.2 X_1 - (2 X_2 + 0.5 X_3) \right) & \text{if } 1.2 X_1 > 2 X_2 + 0.5 X_3 + \text{Constant} \\ \text{some constant} & \text{otherwise} \end{cases}$$

where $Y, X_1, X_2, X_3$ are the columns of my data.

I have multiple IFs like the above, where $Y$ depends on the difference between combinations of independent variables. On top of that, the response may be non-linear (the difference may need to exceed some threshold before it kicks in).
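To make the rule concrete, here is a minimal sketch of one such IF as a Python function. The names `piecewise_y`, `b1`, `constant` and `fallback` are placeholders I made up for illustration, and the coefficients 1.2, 2 and 0.5 are just the example values from the rule above, not fitted quantities.

```python
import numpy as np

def piecewise_y(x1, x2, x3, b1, constant, fallback):
    """Evaluate one IF/ELSE rule for arrays of inputs (illustrative only)."""
    x1, x2, x3 = (np.asarray(a, dtype=float) for a in (x1, x2, x3))
    lhs = 1.2 * x1                    # left-hand combination
    rhs = 2.0 * x2 + 0.5 * x3         # right-hand combination
    active = lhs > rhs + constant     # the IF condition
    # Active rows respond to the difference; the rest get the fallback constant.
    return np.where(active, b1 * (lhs - rhs), fallback)

# Two made-up rows: the first satisfies the condition (12 > 3 + 1),
# the second does not (1.2 > 3 + 1 is false).
demo = piecewise_y([10, 1], [1, 1], [2, 2], b1=2.0, constant=1.0, fallback=0.0)
```

With several IFs I would have one such function per rule, each with its own combination of variables and its own threshold.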

I've been racking my brain on how to proceed with this, and nothing feels right. Should I go ahead and create the differences and use those as inputs rather than the "raw" data feeds? Interaction variables don't seem like they would do the job, so I'm a bit stuck. I've looked into Threshold Regression and maybe MARS. Should I explore that rabbit hole?
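For what it's worth, this is roughly what I mean by "create the differences and use those as inputs". It is only a sketch on synthetic data: it assumes the weights inside the difference (1.2, 2, 0.5) are known, and it searches a grid of candidate thresholds, fitting the two-regime rule by ordinary least squares at each one. The function name `fit_threshold_rule` and all the numbers are made up for illustration. It is a crude stand-in for a proper threshold regression, not an implementation of any particular package.

```python
import numpy as np

def fit_threshold_rule(d, y, candidates):
    """For each candidate threshold c, fit y = b1 * d if d > c else constant
    by least squares, and keep the c with the smallest sum of squared errors."""
    d = np.asarray(d, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for c in candidates:
        active = (d > c).astype(float)
        # Column 0 estimates the fallback constant, column 1 estimates b1.
        design = np.column_stack([1.0 - active, active * d])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((y - design @ coef) ** 2))
        if best is None or sse < best["sse"]:
            best = {"threshold": float(c), "constant": float(coef[0]),
                    "b1": float(coef[1]), "sse": sse}
    return best

# Synthetic data with a known rule: threshold 2.0, b1 = 1.5, constant 0.5.
rng = np.random.default_rng(0)
x1, x2, x3 = rng.uniform(0, 10, size=(3, 500))
d = 1.2 * x1 - (2.0 * x2 + 0.5 * x3)          # the engineered difference
y = np.where(d > 2.0, 1.5 * d, 0.5) + rng.normal(0, 0.1, size=500)

fit = fit_threshold_rule(d, y, np.linspace(-5, 8, 131))
```

This only handles one IF with fixed weights. If the weights inside the difference also have to be estimated, the problem stops being linear in the parameters, which is part of why I'm unsure whether this is the right approach.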

Hoping someone here with more experience and brains can help solve this problem or point me in the right direction.

A real-world example of such a situation would be a heating furnace that can be fed with either fuel oil or natural gas. Your consumption of fuel oil would depend on the temperature (say, temp < 0) AND on B1 * (heating oil price - natural gas price), once that price difference is above some threshold.

  • Using a quantitative model informed by physics, economics, or some other underlying quantitative theory can be powerful and convincing. In an answer at https://stats.stackexchange.com/a/148166/919 I provided a detailed example of how one might proceed in that fashion. – whuber Mar 02 '20 at 15:11
  • Ah, thank you @whuber, I will review your answer as it seems like the way to go. It looks like a quadratic will work to an extent, which is close to what my results said too, but ideally a more comprehensive understanding of the model should be in place beforehand. – Michael Rockland Mar 03 '20 at 01:18
