How to combine two linear models?

Question

I am trying to predict housing prices using as few variables as possible. The approach that has yielded the best results so far is to split the data into two datasets ($houses < 100m^2$ and $houses \ge 100m^2$) and then fit two linear models, one per dataset.

The problem: the resulting prediction is not continuous; it has a jump at $100m^2$ (which can go either up or down, depending on some dummy variables).

Are there any standard solutions for combining two statistical models (which use the same variables with different effect parameters) such that there is no jump at $100m^2$?

Take a look at our [tag:piecewise-linear] tag. This thread asks specifically about an R implementation. That said, continuity will still entail a jump at $100m^2$, just not in the response, but in the first derivative to the left vs. to the right of your threshold. I would much rather consider splines, potentially in interaction with other predictors. — Stephan Kolassa, Nov 27 '23 at 10:01

score 3 · Answer 1 · answered Nov 27 '23 at 09:38

Sure. Let $s$ be the feature that represents the square-footage of the house, and $v$ a feature vector containing all other features. Then I claim that every statistical model that has no jumps at $s=100$ (and consists of two linear submodels as you specify) must have the form

$$f(s,v) = \begin{cases} \alpha \cdot v + bs + c &\text{if } s<100\\ \alpha \cdot v + b's + c' &\text{if } s\ge 100, \end{cases}$$

where $\alpha,b,c,b'$ are arbitrary and $c' = c + 100(b-b')$.

In particular, the coefficients associated with every feature other than square-footage must be the same for both linear submodels.
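The claim can be checked directly. Suppose, more generally, that the upper submodel is $\alpha' \cdot v + b's + c'$, and assume that the possible values of $v$ do not all lie in a single affine hyperplane (this holds, for instance, when each dummy can independently be 0 or 1). No jump at $s=100$ means that, for every possible $v$,

$$
\begin{aligned}
\alpha\cdot v + 100b + c &= \alpha'\cdot v + 100b' + c' \\
\iff\quad (\alpha-\alpha')\cdot v &= 100(b'-b) + (c'-c).
\end{aligned}
$$

The right-hand side does not depend on $v$, so if $\alpha \ne \alpha'$ every possible $v$ would have to lie on the hyperplane $\{v : (\alpha-\alpha')\cdot v = 100(b'-b)+(c'-c)\}$, contradicting the assumption. Hence $\alpha = \alpha'$, the right-hand side equals $0$, and $c' = c + 100(b-b')$.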

Consequently, we can use optimization to solve for $\alpha,b,c,b'$ that minimize the total loss over the dataset. Here the loss for a data point $(s,v,y)$ is $(f(s,v)-y)^2$, and the total loss is the sum of the losses of each point in the dataset. The total loss is a quadratic function of $\alpha,b,c,b'$. Now minimize this loss using Newton's method or some other optimization algorithm to find the optimal values for $\alpha,b,c,b'$. If you need an initialization for $\alpha,b,c,b'$, start by using linear regression to fit a single linear model to the entire dataset, then copy the coefficients into both submodels so both submodels are initially identical, and let the optimizer improve from there.
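Because $f$ is linear in $\alpha,b,c,b'$ once $c'$ is eliminated, this quadratic loss also has a closed-form minimizer (Newton's method reaches it in a single step). Below is a minimal pure-Python sketch, assuming one threshold at 100 and noise-free toy data; the helper names `solve` and `fit_continuous_two_piece` are hypothetical. It rewrites the $s \ge 100$ branch as $bs + c + (b'-b)\max(s-100, 0)$ and solves the normal equations.

```python
def solve(a, rhs):
    """Solve a small dense system a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_continuous_two_piece(s, v, y, knot=100.0):
    """Least-squares fit of
         alpha.v + b*s  + c   if s <  knot
         alpha.v + b'*s + c'  if s >= knot,   with c' = c + knot*(b - b').
    Returns (alpha, b, c, b', c')."""
    # For s >= knot, b'*s + c' = b*s + c + (b' - b)*(s - knot), so f is linear in
    # (alpha, b, c, d) with d = b' - b and the extra feature max(s - knot, 0).
    rows = [list(vi) + [si, 1.0, max(si - knot, 0.0)] for si, vi in zip(s, v)]
    p = len(rows[0])
    # Normal equations X^T X beta = X^T y give the exact minimizer of the squared loss.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    k = len(v[0])
    alpha, b, c, d = beta[:k], beta[k], beta[k + 1], beta[k + 2]
    b_prime = b + d
    return alpha, b, c, b_prime, c + knot * (b - b_prime)


# Noise-free toy data from alpha = [2], b = 3, c = 10, b' = 1 (so c' = 210).
s = [60, 70, 80, 90, 110, 120, 130, 150]
v = [[1], [0], [2], [1], [3], [0], [1], [2]]
y = [2 * vi[0] + (3 * si + 10 if si < 100 else si + 210) for si, vi in zip(s, v)]
print(fit_continuous_two_piece(s, v, y))
```

On this toy data the fit should recover the generating values up to floating-point error; on real, noisy data it returns the least-squares compromise. In practice a library routine such as `numpy.linalg.lstsq` would replace the hand-written solver.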

Hopefully this also illustrates that if you want to insist that there are no jumps at $s=100$, using two linear submodels doesn't give you much additional expressiveness over a single linear model.

I suspect that the questioner also has different contributions of the other explanatory variables in the model for houses smaller and larger than 100sqm. This solution only seems to deal with a discontinuity in the size variable itself. — Christian Hennig, Nov 27 '23 at 11:47
@ChristianHennig, no, I don't believe that is a valid criticism. I think you have misunderstood my answer. I'm sure they do have different contributions in their current two submodels, but one can prove that if any of the coefficients other than for $s$ in the two linear submodels differ, then there will be jumps at $s=100$. So the requirement to not have jumps implies you cannot have two submodels with different contributions of the other explanatory variables. In other words, you can't have your cake (no jumps) and eat it too (different coefficients for other explanatory variables). — D.W., Nov 28 '23 at 04:48
This is fair enough and maybe the questioner asks for something impossible; it's still my impression that this is what they have in mind, but of course we can't know for sure if they don't tell us. — Christian Hennig, Nov 28 '23 at 11:17

BenP · Answer 2 · answered Nov 27 '23 at 10:55

Here is a way to estimate the two models as one. It is similar to D.W.'s suggestion, but parametrized differently.

Consider using:

y = b0 + b1*s1 + b2*s2 + b3*s100 + b4*v

where s1 is the square footage and s2 is the square footage above 100, i.e. s2 = max(s1 - 100, 0). E.g. if the footage is 105, then s1=105 and s2=5; if the footage is 80, then s1=80 and s2=0. Further, s100 is a dummy variable, with s100=0 for footage <= 100 and s100=1 for footage > 100. Finally, b0 is the intercept, v is whatever set of further predictive features you have, and b4 is the corresponding vector of coefficients.
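The feature construction can be written as a small helper. This is a minimal sketch, assuming footage is a plain number and the threshold is 100; the name `spline_features` is hypothetical.

```python
def spline_features(footage, knot=100.0):
    """Return (s1, s2, s100) for one house:
    s1   = the footage itself,
    s2   = footage above the knot, max(footage - knot, 0),
    s100 = dummy that is 1 if footage > knot and 0 otherwise."""
    s1 = footage
    s2 = max(footage - knot, 0.0)
    s100 = 1 if footage > knot else 0
    return s1, s2, s100


for footage in (80, 100, 105):
    print(footage, spline_features(footage))
```

For 105 it gives (105, 5.0, 1) and for 80 it gives (80, 0.0, 0), matching the examples in the text; at exactly 100 the dummy is 0, following the footage <= 100 convention.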

If footage <= 100, the equation becomes:

y = b0 + b1*s1 + b2*0 + b3*0 + b4*v = b0 + b1*s1 + b4*v

If footage > 100, the equation becomes:

y = b0 + b1*s1 + b2*s2 + b3*1 + b4*v = b0 + b1*s1 + b2*s2 + b3 + b4*v

So b2 is the change in the slope with respect to footage above 100: the slope is b1 below 100 and b1 + b2 above it.

Interestingly, b3 captures a possible jump in house price at 100: upward if b3 > 0, downward if b3 < 0. If you believe that such a sudden jump does not exist (or you do not want to model it), just leave s100 out of the equation.
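To make the roles of b2 and b3 concrete, here is a minimal sketch that evaluates the combined equation with made-up coefficients (b0=10, b1=3, b2=-2, b4=[2]); the helper name `predict` is hypothetical. It checks the slope on each side of 100 and the size of the jump at 100 for b3 = 0 and b3 = 25.

```python
def predict(footage, v, b0, b1, b2, b3, b4, knot=100.0):
    """Evaluate y = b0 + b1*s1 + b2*s2 + b3*s100 + b4.v for a single house."""
    s1 = footage
    s2 = max(footage - knot, 0.0)       # footage above the knot
    s100 = 1 if footage > knot else 0   # jump dummy
    return b0 + b1 * s1 + b2 * s2 + b3 * s100 + sum(bj * vj for bj, vj in zip(b4, v))


# Made-up coefficients: slope 3 below 100 and 3 + (-2) = 1 above it.
coef = dict(b0=10.0, b1=3.0, b2=-2.0, b4=[2.0])
v = [1.0]

slope_below = (predict(90.0, v, b3=0.0, **coef) - predict(80.0, v, b3=0.0, **coef)) / 10
slope_above = (predict(120.0, v, b3=0.0, **coef) - predict(110.0, v, b3=0.0, **coef)) / 10
print("slope below 100:", slope_below, "slope above 100:", slope_above)

eps = 1e-9
for b3 in (0.0, 25.0):
    # Compare the prediction at exactly 100 (s100 = 0) with one just above it (s100 = 1).
    jump = predict(100.0 + eps, v, b3=b3, **coef) - predict(100.0, v, b3=b3, **coef)
    print(f"b3 = {b3}: jump at 100 is about {jump:.6f}")
```

With b3 = 0 the prediction is continuous at 100 and only the slope changes (by b2); with b3 = 25 it jumps by 25.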

This model can simply be estimated by least squares (or whatever criterion you prefer). Without s100 in the equation, it is called a "linear spline model". See e.g. https://www.youtube.com/watch?app=desktop&v=EKDsH1uQing
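A minimal sketch of that least-squares estimate, assuming a knot at 100 and made-up, noise-free data; `solve` and `fit_linear_spline` are hypothetical helper names, and the normal equations are solved by hand only to keep the example dependency-free. Setting `with_jump=False` drops s100 and gives the linear spline model.

```python
def solve(a, rhs):
    """Solve a small dense system a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_linear_spline(footage, v, y, knot=100.0, with_jump=False):
    """Least-squares fit of y = b0 + b1*s1 + b2*s2 [+ b3*s100] + b4.v."""
    rows = []
    for f, vi in zip(footage, v):
        row = [1.0, f, max(f - knot, 0.0)]        # intercept, s1, s2
        if with_jump:
            row.append(1.0 if f > knot else 0.0)  # s100 dummy
        rows.append(row + list(vi))               # further features v
    p = len(rows[0])
    # Normal equations X^T X beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    names = ["b0", "b1", "b2"] + (["b3"] if with_jump else [])
    coef = dict(zip(names, beta[:len(names)]))
    coef["b4"] = beta[len(names):]
    return coef


footage = [60, 75, 90, 100, 110, 125, 140, 160]
v = [[0], [1], [1], [0], [2], [1], [0], [1]]
# Noise-free prices from made-up values b0=5, b1=2, b2=1.5, b3=30, b4=[4].
y = [5 + 2 * f + 1.5 * max(f - 100, 0) + 30 * (f > 100) + 4 * vi[0]
     for f, vi in zip(footage, v)]
print(fit_linear_spline(footage, v, y, with_jump=True))   # model with the jump term
print(fit_linear_spline(footage, v, y, with_jump=False))  # continuous linear spline
```

With the jump term the fit recovers the generating coefficients; without it, the continuous spline absorbs the jump as best it can in the least-squares sense. A library routine such as `numpy.linalg.lstsq` would normally replace the hand-written solver.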

I suspect that the questioner also has different contributions of the other explanatory variables in the model for houses smaller and larger than 100sqm. This solution only seems to deal with a discontinuity in the size variable itself. (One could probably introduce interaction terms to deal with this and treat them in the same way, but that may be too complex.) — Christian Hennig, Nov 27 '23 at 11:47
