Consider a predictor $X$ that predicts a non-linear response $y$. A natural cubic spline with $K$ knots can be fit to $y$.
Several different linear basis expansions in $X$ can be used to fit the natural cubic spline, such as the following $K$ basis functions—from Hastie, Tibshirani & Friedman (2017):
\begin{align} h_1(X) &= 1,\\ h_2(X) &= X,\\ h_{k+2}(X) &= d_k(X) - d_{K-1}(X), \quad k = 1, \dots, K-2, \end{align}
where
$$d_k(X) = \frac{(X-\xi_k)_+^3-(X-\xi_K)_+^3}{\xi_K-\xi_k},$$ $\xi_k \,$ is the $k$-th knot location, and $\, t_+ \,$ indicates that only the positive part is kept.
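For concreteness, these formulas can be transcribed directly into NumPy (a sketch based only on the equations above; the function name `natural_spline_basis` and the example knots are mine, not from any package):

```python
import numpy as np

def d(X, k, knots):
    """d_k(X) from the definition above; k is 1-based, knots sorted ascending."""
    K = len(knots)
    num = np.maximum(X - knots[k - 1], 0) ** 3 - np.maximum(X - knots[K - 1], 0) ** 3
    return num / (knots[K - 1] - knots[k - 1])

def natural_spline_basis(X, knots):
    """Columns h_1(X), ..., h_K(X) for a natural cubic spline with the given knots."""
    K = len(knots)
    cols = [np.ones_like(X, dtype=float), np.asarray(X, dtype=float)]
    for k in range(1, K - 1):  # h_{k+2} = d_k - d_{K-1}
        cols.append(d(X, k, knots) - d(X, K - 1, knots))
    return np.column_stack(cols)
```

With $K$ knots this gives exactly $K$ columns, and the difference $d_k - d_{K-1}$ cancels the quadratic and cubic terms beyond the last knot, so each basis function is linear outside the boundary knots.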
The natural cubic spline model can then be expressed as
$$f(X) = \sum_{k=1}^{K} \beta_{k} h_{k}(X),$$
and the $\beta$ coefficients can be estimated using least squares
$$\hat{\beta} = \left( h(X)^T \, h(X) \right)^{-1} h(X)^T \, y,$$ where $h(X)$ denotes the matrix whose columns are the basis functions evaluated at the observed $X$.
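With the truncated-power basis written out, the least-squares step is just an ordinary linear regression on the basis columns. Here is a minimal sketch (this is not statsmodels' B-spline basis; the knot values are made up and `np.linalg.lstsq` stands in for whatever solver a given package uses):

```python
import numpy as np

def natural_spline_design(X, knots):
    """Design matrix [h_1(X), ..., h_K(X)] for the truncated-power basis above."""
    X = np.asarray(X, dtype=float)
    K = len(knots)
    # d(k) is d_{k+1} in the 1-based notation of the question (0-based index here)
    d = lambda k: (np.maximum(X - knots[k], 0) ** 3
                   - np.maximum(X - knots[K - 1], 0) ** 3) / (knots[K - 1] - knots[k])
    cols = [np.ones_like(X), X] + [d(k) - d(K - 2) for k in range(K - 2)]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
X = np.linspace(0.0, 10.0, 200)
knots = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
y = np.sin(X) + rng.normal(scale=0.1, size=X.size)

H = natural_spline_design(X, knots)
beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # beta-hat = (H^T H)^{-1} H^T y
f_hat = H @ beta                              # reconstructed spline f(X)
```

Once `beta` is known, the single algebraic equation for the spline is just `H @ beta`, i.e. the basis formulas with the fitted coefficients plugged in; the open issue in the question is that the coefficients statsmodels returns are for its own B-spline basis, not this one.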
Question: How can one reconstruct the cubic spline in a single algebraic equation using only the fitted coefficients $\hat{\beta}$, the knot locations $\xi$, and the raw predictor $X$? I would appreciate a general expression for $N$ predictors.
Note: I am using the Python package statsmodels to do the natural cubic spline fitting (specifically the classes statsmodels.gam.smooth_basis.BSplines and statsmodels.gam.generalized_additive_model.GLMGam). I don't know what basis functions BSplines uses; I've included the functions above just as an example.
Here is a simple reproducible example where I try to fit a non-linear response:
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.gam.smooth_basis import BSplines
from statsmodels.gam.generalized_additive_model import GLMGam

n_pts = 500
# scaling factor for noise in data
noise_scl = 0.2
X = np.arange(n_pts)
# create non-linear response
y = np.sin(np.arange(n_pts) / 150) * np.cos(np.arange(n_pts) / 75) + np.random.normal(
    loc=0, scale=noise_scl, size=n_pts
)
deg_free = 8
degree = 3
# get the basis functions
bs = BSplines(
    X,
    df=deg_free,
    degree=degree,
    include_intercept=True,
)
# get generalized additive model and apply basis functions
base_model = GLMGam(y, X, smoother=bs)
# fit spline regression
res = base_model.fit()
# this equation is wrong but is my best attempt
y_eq = (
    res.params[0]
    + res.params[1] * X
    + res.params[2] * np.maximum(X - bs.smoothers[0].knots[2], 0) ** 3
    + res.params[3] * np.maximum(X - bs.smoothers[0].knots[3], 0) ** 3
    + res.params[4] * np.maximum(X - bs.smoothers[0].knots[4], 0) ** 3
    + res.params[5] * np.maximum(X - bs.smoothers[0].knots[5], 0) ** 3
    + res.params[6] * np.maximum(X - bs.smoothers[0].knots[6], 0) ** 3
    + res.params[7] * np.maximum(X - bs.smoothers[0].knots[7], 0) ** 3
    + res.params[8] * np.maximum(X - bs.smoothers[0].knots[8], 0) ** 3
)
fig, ax = plt.subplots()
ax.scatter(X, y, color="blue")
ax.plot(X, res.fittedvalues, color="red", label="fittedvalues")
ax.plot(X, y_eq, color="orange", label="equation")
fig.legend()
plt.show()
Comments:

- […] `rcspline.restate()` function in his `Hmisc` package implements that directly from a set of knot positions and coefficients, so you could double check against the code for that function. – EdM Jun 06 '23 at 20:44
- […] R, so I'm having a little trouble interpreting your code. But it does look like your answer over there is relevant to my question. – Florent H Jun 06 '23 at 21:10
- […] `ns` version of basis functions in the R `splines` package, and the page includes a brief answer from Frank Harrell about use of the truncated power basis that he prefers and is noted in my prior comment. If there's still something unclear, please edit the question to state the outstanding issues, as comments are easy to overlook and can be deleted. – EdM Jun 06 '23 at 21:16
- […] `mgcv` package which has better documentation than the corresponding parts in statsmodels/patsy. – Josef Jun 07 '23 at 15:48
- […] statsmodels ($d_{statsmodels} = d_{source} - 1$). – Florent H Jun 08 '23 at 07:41