In a comment to this question, user @whuber cited the possibility of using a periodic version of splines to fit periodic data. I would like to know more about this method, in particular the equations defining the splines, and how to implement them in practice (I'm mostly an R user, but I can make do with MATLAB or Python, if the need arises). Also, as a "nice to have", it would be great to know about the possible advantages/disadvantages with respect to trigonometric polynomial fitting, which is how I usually deal with this kind of data (unless the response is not very smooth, in which case I switch to a Gaussian process with a periodic kernel).
-
Check the answer to another of my questions: http://stats.stackexchange.com/questions/225729/what-are-periodic-version-of-splines – Haitao Du Feb 03 '17 at 19:07
-
@hxd1011 thanks, I appreciate the tip. In the end I decided to just duplicate the data twice, thus having three consecutive sets of identical data, and fit the spline to the central third. The answer you refer to also indicates this as an alternative solution. – DeltaIV Feb 03 '17 at 19:54
-
@DeltaIV if you can convert your comment to an answer, and provide some more detail, I think it is a fine answer and a good question to have some resolution. – AdamO Dec 15 '17 at 18:12
-
@AdamO thanks for the suggestion, but during this time of the year I'm a bit swamped :-) I will try, though. I should first of all retrieve that code... – DeltaIV Dec 16 '17 at 15:22
2 Answers
Splines are used in regression modeling to capture possibly complex, non-linear functional forms. A spline-smoothed trend consists of piecewise continuous polynomials whose leading coefficient changes at each breakpoint, or knot. The spline is specified in terms of the polynomial degree of the trend and the breakpoints. A spline representation of a covariate extends a single vector of observed values into a matrix whose column dimension is the polynomial degree plus the number of knots.
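As a quick sketch of that dimension count (using R's built-in splines package; not part of the original answer):

```r
# A cubic spline basis with 2 interior knots expands x into
# degree + number of knots = 3 + 2 = 5 columns
library(splines)
x <- seq(0, 10, by = 0.1)
B <- bs(x, degree = 3, knots = c(3, 7))
ncol(B)  # 5
```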
A periodic version of splines is merely a periodic version of any regression: the data are cut into replicates of the length of the period. For instance, modeling a diurnal trend in a multiday experiment on rats would require recoding the experiment time into 24-hour increments, so the 154th hour becomes hour 10 (154 mod 24 = 10, since 154 = 6*24 + 10). If you fit a linear regression on the cut data, it estimates a sawtooth waveform for the trend. If you fit a step function somewhere in the period, it estimates a square waveform. A spline is capable of expressing a much more sophisticated periodic waveform. For what it's worth, the splines package has a function periodicSpline which does exactly this.
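The recoding step is just the modulo operator; a minimal sketch:

```r
# Fold hours of a multiday experiment onto a 24-hour cycle
hours <- c(10, 154, 26)
hours %% 24  # 10 10 2
```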
I don't find R's default spline implementation, bs, useful for interpretation, so I wrote my own script below. For a spline of degree $p$ with $n_k$ knots, this representation makes the first $p$ columns the standard polynomial representation; the $(p+i)$-th columns ($i \le n_k$) are the truncated power terms $S_{p+i} = (X - k_i)^p\,\mathcal{I}(X > k_i)$, where $k$ is the sorted vector of knots.
myspline <- function(x, degree, knots) {
  knots <- sort(knots)
  # truncated power terms: x^degree and (x - k)_+^degree for each knot
  val <- cbind(x, outer(x, knots, `-`))
  val[val < 0] <- 0
  val <- val^degree
  # prepend the lower-order polynomial terms x^1, ..., x^(degree - 1)
  if (degree > 1)
    val <- cbind(outer(x, seq_len(degree - 1), `^`), val)
  colnames(val) <- c(
    if (degree > 1) paste0('spline', seq_len(degree - 1), '.1'),
    paste0('spline', degree, '.', seq_len(length(knots) + 1))
  )
  val
}
For a little case study, interpolate a sinusoidal trend on the domain of 0 to $2\pi$ (or $\tau$) like so:
x <- seq(0, 2*pi, by=pi/2^8)
y <- sin(x)
plot(x,y, type='l')
s <- myspline(x, 2, pi)
fit <- lm(y ~ s)
yhat <- predict(fit)
lines(x,yhat)
You'll see they're quite concordant. Further, the naming convention enables interpretation. In the regression output you see:
> summary(fit)
Call:
lm(formula = y ~ s)
Residuals:
Min 1Q Median 3Q Max
-0.04564 -0.02050 0.00000 0.02050 0.04564
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.033116 0.003978 -8.326 7.78e-16 ***
sspline1.1 1.268812 0.004456 284.721 < 2e-16 ***
sspline2.1 -0.400520 0.001031 -388.463 < 2e-16 ***
sspline2.2 0.801040 0.001931 414.878 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.02422 on 509 degrees of freedom
Multiple R-squared: 0.9988, Adjusted R-squared: 0.9988
F-statistic: 1.453e+05 on 3 and 509 DF, p-value: < 2.2e-16
The spline1.1 term is the linear component of the polynomial trend on the domain before the first breakpoint. Its coefficient approximates the slope of the tangent at the origin, X = 0. This is near 1, as we would expect from the derivative of the sinusoid (cos(0) = 1), but we must bear in mind that these are approximations, and extrapolating the quadratic trend over a span of $\pi/2$ is prone to error. The negative quadratic term, spline2.1, indicates a concave shape on the first segment. The spline2.2 term is the change in the quadratic coefficient at the knot: adding it to spline2.1 gives a positive leading coefficient of about 0.4 after the knot, indicating a convex, upward-curving shape. So we now have interpretation available for spline output and can judge the inference and estimates accordingly.
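To make that last point concrete, here is a sketch that rebuilds the same degree-2 truncated power basis inline and checks that the quadratic coefficient to the right of the knot is the sum of the spline2.1 and spline2.2 estimates:

```r
x <- seq(0, 2*pi, by = pi/2^8)
y <- sin(x)
# degree-2 truncated power basis with a single knot at pi
s <- cbind(spline1.1 = x,
           spline2.1 = x^2,
           spline2.2 = pmax(x - pi, 0)^2)
fit <- lm(y ~ s)
# curvature after the knot = baseline quadratic term + change at the knot
coef(fit)["sspline2.1"] + coef(fit)["sspline2.2"]  # about +0.4
```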
I'm going to assume that you know the periodicity of the data at hand. If the data lack a growth or moving average component, you may transform a long time series into replicates of a short series of a duration of 1 period. You now have replicates and can use data analysis to estimate the recurrent trend.
Suppose I generate the following rather noisy, very long time series:
x <- seq(1, 100, by=0.01)
y <- sin(x) + rnorm(length(x), 0, 10)
xp <- x %% (2*pi)  # fold the long series onto a single period
s <- myspline(xp, degree=2, knots=pi)
fit <- lm(y ~ s)
The resulting output shows that the periodic trend is recovered reasonably well despite the heavy noise (the low R-squared reflects the noise level, not a poor trend estimate).
> summary(fit)
Call:
lm(formula = y ~ s)
Residuals:
Min 1Q Median 3Q Max
-39.585 -6.736 0.013 6.750 37.389
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.48266 0.38155 -1.265 0.205894
sspline1.1 1.52798 0.42237 3.618 0.000299 ***
sspline2.1 -0.44380 0.09725 -4.564 5.09e-06 ***
sspline2.2 0.76553 0.18198 4.207 2.61e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 9.949 on 9897 degrees of freedom
Multiple R-squared: 0.006406, Adjusted R-squared: 0.006105
F-statistic: 21.27 on 3 and 9897 DF, p-value: 9.959e-14
I was looking for an answer to this question recently and found the following solution, using the recent package splines2. It provides a function to compute periodic M-splines (M-splines are normalized B-splines). Usage is very similar to that of the bs function. Let's say we have a noisy stationary signal with a 24-hour period, measured at fixed intervals over 2 days:
library(ggplot2)
library(splines2)
t <- seq(0, 48, length.out = 500)
y <- sin(2 * pi * t / 24) + rnorm(500, sd = 0.5)  # 24-hour period
df <- data.frame(t = t, y = y)
ggplot(df, aes(x = t, y = y)) + geom_point() + theme_minimal()
Now we can fit a periodic spline on this data and create predictions for our regular intervals:
# (boundary knots determine the period)
pspline_fit <- lm(y ~ mSpline(x = t,
df = 4,
periodic = TRUE,
Boundary.knots = c(0, 24)), data = df)
df <- cbind(df, as.data.frame(predict(pspline_fit, interval = "prediction")))
pred_plot <-
ggplot(df, aes(x = t, y = y)) +
geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.4) +
geom_line(aes(y = fit), size = 1, colour = "blue") +
geom_point() +
theme_minimal()
pred_plot
And what's nice about the periodic spline is that there is no discontinuity at the 24h mark, which you can visualise using polar coordinates:
pred_plot + xlim(0, 24) + coord_polar()
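You can also verify the periodicity numerically: the basis takes identical values at both ends of the period, which is what removes the discontinuity. A sketch using the same mSpline call as above:

```r
library(splines2)
tt <- seq(0, 24, length.out = 101)
basis <- mSpline(tt, df = 4, periodic = TRUE, Boundary.knots = c(0, 24))
# rows 1 and 101 evaluate the basis at t = 0 and t = 24
all.equal(basis[1, ], basis[101, ])
```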


