In a comment to this question, user @whuber cited the possibility of using a periodic version of splines to fit periodic data. I would like to know more about this method, in particular the equations defining the splines, and how to implement them in practice (I'm mostly an R user, but I can make do with MATLAB or Python, if the need arises). Also, as a "nice to have", it would be great to know about the possible advantages/disadvantages with respect to trigonometric polynomial fitting, which is how I usually deal with this kind of data (unless the response is not very smooth, in which case I switch to a Gaussian process with a periodic kernel).
-
Check the answer to another of my questions: http://stats.stackexchange.com/questions/225729/what-are-periodic-version-of-splines – Haitao Du Feb 03 '17 at 19:07
-
@hxd1011 thanks, I appreciate the tip. In the end I decided to just duplicate the data twice, thus having three consecutive sets of identical data, and fit the spline to the central third. The answer you refer to also indicates this as an alternative solution. – DeltaIV Feb 03 '17 at 19:54
-
@DeltaIV if you can convert your comment to an answer, and provide some more detail, I think it is a fine answer and a good question to have some resolution. – AdamO Dec 15 '17 at 18:12
-
@AdamO thanks for the suggestion, but during this time of the year I'm a bit swamped :-) I will try, though. I should first of all retrieve that code... – DeltaIV Dec 16 '17 at 15:22
2 Answers
Splines are used in regression modeling to capture possibly complex, non-linear functional forms. A spline-smoothed trend consists of piecewise continuous polynomials whose leading coefficient changes at each breakpoint, or knot. The spline is specified in terms of the polynomial degree of the trend and the breakpoints. A spline representation of a covariate extends a single vector of observed values into a matrix whose column dimension is the polynomial degree plus the number of knots.
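As a quick sketch of that dimension count (using R's built-in splines package; not part of the original answer):

```r
# A cubic spline basis with 2 interior knots expands x into
# degree + number of knots = 3 + 2 = 5 columns
library(splines)
x <- seq(0, 10, by = 0.1)
B <- bs(x, degree = 3, knots = c(3, 7))
ncol(B)  # 5
```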
A periodic version of splines is merely a periodic version of any regression: the data are cut into replicates of the length of the period. For instance, modeling a diurnal trend in a multiday experiment on rats would require recoding the experiment time into 24-hour increments, so the 154th hour becomes hour 10 (154 mod 24 = 10, since 154 = 6*24 + 10). If you fit a linear regression on the cut data, it estimates a sawtooth waveform for the trend. If you fit a step function somewhere in the period, it estimates a square waveform. A spline is capable of expressing a much more sophisticated periodic waveform. For what it's worth, the splines package has a function periodicSpline which does exactly this.
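The recoding step is just the modulo operator; a minimal sketch:

```r
# Fold hours of a multiday experiment onto a 24-hour cycle
hours <- c(10, 154, 26)
hours %% 24  # 10 10 2
```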
I don't find R's default spline implementation, bs, useful for interpretation, so I wrote my own script below. For a spline of degree $p$ with $n_k$ knots, this representation makes the first $p$ columns the standard polynomial representation; the $(p+i)$-th columns ($i \le n_k$) are the truncated power terms $S_{p+i} = (X - k_i)^p\,\mathcal{I}(X > k_i)$, where $k$ is the sorted vector of knots.
myspline <- function(x, degree, knots) {
  knots <- sort(knots)
  # truncated power terms: x^degree and (x - k)_+^degree for each knot
  val <- cbind(x, outer(x, knots, `-`))
  val[val < 0] <- 0
  val <- val^degree
  # prepend the lower-order polynomial terms x^1, ..., x^(degree - 1)
  if (degree > 1)
    val <- cbind(outer(x, seq_len(degree - 1), `^`), val)
  colnames(val) <- c(
    if (degree > 1) paste0('spline', seq_len(degree - 1), '.1'),
    paste0('spline', degree, '.', seq_len(length(knots) + 1))
  )
  val
}
For a little case study, interpolate a sinusoidal trend on the domain of 0 to $2\pi$ (or $\tau$) like so:
x <- seq(0, 2*pi, by=pi/2^8)
y <- sin(x)
plot(x,y, type='l')
s <- myspline(x, 2, pi)
fit <- lm(y ~ s)
yhat <- predict(fit)
lines(x,yhat)
You'll see they're quite concordant. Further, the naming convention enables interpretation. In the regression output you see:
> summary(fit)
Call:
lm(formula = y ~ s)
Residuals:
Min 1Q Median 3Q Max
-0.04564 -0.02050 0.00000 0.02050 0.04564
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.033116 0.003978 -8.326 7.78e-16 ***
sspline1.1 1.268812 0.004456 284.721 < 2e-16 ***
sspline2.1 -0.400520 0.001031 -388.463 < 2e-16 ***
sspline2.2 0.801040 0.001931 414.878 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.02422 on 509 degrees of freedom
Multiple R-squared: 0.9988, Adjusted R-squared: 0.9988
F-statistic: 1.453e+05 on 3 and 509 DF, p-value: < 2.2e-16
The spline1.1 term is the linear component of the polynomial trend on the domain before the first breakpoint. Its coefficient approximates the slope of the tangent at the origin, X = 0. This is near 1, as we would expect from the derivative of the sinusoid (cos(0) = 1), but we must bear in mind that these are approximations, and extrapolating the quadratic trend over a span of $\pi/2$ is prone to error. The negative quadratic term, spline2.1, indicates a concave shape on the first segment. The spline2.2 term is the change in the quadratic coefficient at the knot: adding it to spline2.1 gives a positive leading coefficient of about 0.4 after the knot, indicating a convex, upward-curving shape. So we now have interpretation available for spline output and can judge the inference and estimates accordingly.
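To make that last point concrete, here is a sketch that rebuilds the same degree-2 truncated power basis inline and checks that the quadratic coefficient to the right of the knot is the sum of the spline2.1 and spline2.2 estimates:

```r
x <- seq(0, 2*pi, by = pi/2^8)
y <- sin(x)
# degree-2 truncated power basis with a single knot at pi
s <- cbind(spline1.1 = x,
           spline2.1 = x^2,
           spline2.2 = pmax(x - pi, 0)^2)
fit <- lm(y ~ s)
# curvature after the knot = baseline quadratic term + change at the knot
coef(fit)["sspline2.1"] + coef(fit)["sspline2.2"]  # about +0.4
```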
I'm going to assume that you know the periodicity of the data at hand. If the data lack a growth or moving average component, you may transform a long time series into replicates of a short series of a duration of 1 period. You now have replicates and can use data analysis to estimate the recurrent trend.
Suppose I generate the following rather noisy, very long time series:
x <- seq(1, 100, by=0.01)
y <- sin(x) + rnorm(length(x), 0, 10)
xp <- x %% (2*pi)  # fold the long series onto a single period
s <- myspline(xp, degree=2, knots=pi)
fit <- lm(y ~ s)
The resulting output shows that the periodic trend is recovered reasonably well despite the heavy noise (the low R-squared reflects the noise level, not a poor trend estimate).
> summary(fit)
Call:
lm(formula = y ~ s)
Residuals:
Min 1Q Median 3Q Max
-39.585 -6.736 0.013 6.750 37.389
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.48266 0.38155 -1.265 0.205894
sspline1.1 1.52798 0.42237 3.618 0.000299 ***
sspline2.1 -0.44380 0.09725 -4.564 5.09e-06 ***
sspline2.2 0.76553 0.18198 4.207 2.61e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 9.949 on 9897 degrees of freedom
Multiple R-squared: 0.006406, Adjusted R-squared: 0.006105
F-statistic: 21.27 on 3 and 9897 DF, p-value: 9.959e-14
I was looking for an answer to this question recently and found the following solution, using the recent package splines2. It provides a function to compute periodic M-splines (M-splines are normalized B-splines). Usage is very similar to that of the bs function. Let's say we have a noisy stationary signal with a 24-hour period, measured at fixed intervals over 2 days:
library(ggplot2)
library(splines2)
t <- seq(0, 48, length.out = 500)
y <- sin(2 * pi * t / 24) + rnorm(500, sd = 0.5)  # 24-hour period
df <- data.frame(t = t, y = y)
ggplot(df, aes(x = t, y = y)) + geom_point() + theme_minimal()
Now we can fit a periodic spline on this data and create predictions for our regular intervals:
# (boundary knots determine the period)
pspline_fit <- lm(y ~ mSpline(x = t,
df = 4,
periodic = TRUE,
Boundary.knots = c(0, 24)), data = df)
df <- cbind(df, as.data.frame(predict(pspline_fit, interval = "prediction")))
pred_plot <-
ggplot(df, aes(x = t, y = y)) +
geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.4) +
geom_line(aes(y = fit), size = 1, colour = "blue") +
geom_point() +
theme_minimal()
pred_plot
And what's nice about the periodic spline is that there is no discontinuity at the 24h mark, which you can visualise using polar coordinates:
pred_plot + xlim(0, 24) + coord_polar()
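You can also verify the periodicity numerically: the basis takes identical values at both ends of the period, which is what removes the discontinuity. A sketch using the same mSpline call as above:

```r
library(splines2)
tt <- seq(0, 24, length.out = 101)
basis <- mSpline(tt, df = 4, periodic = TRUE, Boundary.knots = c(0, 24))
# rows 1 and 101 evaluate the basis at t = 0 and t = 24
all.equal(basis[1, ], basis[101, ])
```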


