2

I have two dimensional data, to which I'd like to fit a spline function. A spline would be defined as this:

  • y_0 for x<=x_0
  • y_0 + a * (x-x_0) for x_0<=x<=x_1
  • y_1 = y_0 + a * (x_1-x_0) for x>=x_1

So, constant line -> linear function -> constant line.

I want to find out values of x_0, x_1, y_0 and a minimising RMSE (or otherwise fitting well). How can I do this efficiently in Python?

whuber
  • 322,774
Jean Broc
  • 33
  • 3
  • 2
    Because this is a variation of the question at https://stats.stackexchange.com/questions/291436, you will likely find at least one of the many answers there to be helpful. The principal complication concerns estimating $x_0,$ which does not appear within the model in a linear fashion: this requires a nonlinear fitting method, such as profile likelihood (or its least squares equivalent). As you might surmise from the referenced thread, creating a fully general and correct program for this changepoint problem is not a small matter: it would be best to find existing Python code. – whuber Dec 30 '22 at 14:32
  • 1
    @whuber that's true in general, but in this case where there are only two changepoints to estimate we may be in luck if the OP's data is not overwhelmingly large: we can simply exhaustively try a fine grid of values $\underset{j}{\min}(\mathbf{x}_j)\leq x_0 \leq x_1 \leq \underset{j}{\max}(\mathbf{x}_j)$, implemented via a nested for loop, taking advantage of profile least squares as you advocate. Whether or not this qualifies as "efficient" as the OP requests depends; I simply note that this approach is embarrassingly parallel. – John Madden Dec 30 '22 at 16:49
  • 2
    @John That would be easy to implement and parallel, as you say. In terms of total computation, deploying a nonlinear optimizer would likely be the most efficient. – whuber Dec 30 '22 at 18:41

0 Answers0