
I have a trend stationary data series that does not have a unit root.

The data are hourly with about five years of data.

I have controlled for the apparent trend in the data using a series of binary variables for each year exclusive of one year to avoid singularity.

A reviewer has essentially suggested that I replace the binary variables with a yearly trend term. Question: Won't this give rise to a unit root issue given that a linear trend term has a unit root?

Roger V.
    To be honest, I would be more concerned with the fact that dummy indicators per year model that your response changes drastically every January 1st, then stays on a constant level before changing again drastically the next January 1st, with successive step changes completely unrelated to each other. This makes sense in a regulatory framework (e.g., the tax code changing at the beginning of the year), but most other things change much more smoothly. Compare this. So I would agree with your reviewer. Perhaps look at a local smoother, too. – Stephan Kolassa Feb 08 '23 at 13:44

1 Answer


A linear trend term has no unit root.

A unit root arises in models like $y_t=y_{t-1}+\epsilon_t$, where the characteristic polynomial $1-z=0$ has solution 1. A linear trend model corresponds to something like $$ y_t=\delta t+\epsilon_t $$ In any case, whether there is a unit root is a property of the underlying process, not of your fitting procedure. If your process had a unit root, neither fitting a linear trend model nor using dummies would be the recommended way to go; differencing the series would be.
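To spell out the distinction, here is a short sketch using a generic AR(1) with coefficient $\phi$ and lag operator $L$ (both symbols are introduced here for illustration and are not part of the question):

$$
y_t=\phi y_{t-1}+\epsilon_t \quad\Longleftrightarrow\quad (1-\phi L)\,y_t=\epsilon_t ,
$$

so the characteristic polynomial $1-\phi z=0$ has root $z=1/\phi$, which equals 1 exactly when $\phi=1$, i.e. the random walk. For the trend model,

$$
y_t-\delta t=\epsilon_t ,
$$

the deviations from the deterministic trend are white noise. There is no autoregressive lag polynomial on the left-hand side, hence no root at 1. Differencing such a series anyway would give

$$
\Delta y_t=\delta+\epsilon_t-\epsilon_{t-1},
$$

which puts a unit root into the moving-average part instead, which is one reason differencing is reserved for processes that actually have a unit root.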

I also agree with @Stephan's comment, as what you do seems to amount to what I sketch below - the yearwise mean would (given a positive trend) overstate things in the beginning of the year and understate towards the end of the year.


    n <- 24*365*5     # five years of hourly observations
    delta <- .002     # slope of the linear trend
    y <- delta*(1:n) + rnorm(n, sd=2)
    plot(1:n, y, type="l", lwd=.01)

    abline(v=(1:4)*24*365, lty=2)       # year boundaries
    year <- rep(1:5, each=24*365)
    doyend <- (1:5)*24*365              # last hour of each year
    doystart <- c(1, doyend[1:4]+1)     # first hour of each year
    means <- sapply(1:5, function(i) mean(y[year==i]))   # yearwise means
    segments(doystart, means, doyend, means, lty=1, lwd=4, col="red")

    > means
    [1]  8.783945 26.244264 43.800528 61.259110 78.873887
    > summary(lm(y~factor(year))) # regression-based

    Call:
    lm(formula = y ~ factor(year))

    Residuals:
         Min       1Q   Median       3Q      Max
    -15.9170  -4.4296   0.0005   4.3930  14.8603

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept)    8.78395    0.05804   151.3   <2e-16 ***
    factor(year)2 17.46032    0.08209   212.7   <2e-16 ***
    factor(year)3 35.01658    0.08209   426.6   <2e-16 ***
    factor(year)4 52.47516    0.08209   639.3   <2e-16 ***
    factor(year)5 70.08994    0.08209   853.9   <2e-16 ***
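For comparison, here is a minimal sketch of the reviewer's suggestion on the same kind of simulated data: a single linear trend term in place of the yearly dummies. The seed and the names `hour`, `fit_dummies` and `fit_trend` are assumptions made for this illustration, so the numbers will not match the output above exactly.

```r
set.seed(1)                        # arbitrary seed, assumed here for reproducibility
n     <- 24 * 365 * 5              # five years of hourly observations
delta <- 0.002                     # true slope of the simulated trend
hour  <- 1:n                       # time index (hypothetical name)
y     <- delta * hour + rnorm(n, sd = 2)
year  <- rep(1:5, each = 24 * 365)

fit_dummies <- lm(y ~ factor(year))   # one level per year (step function)
fit_trend   <- lm(y ~ hour)           # single linear trend term

# Residual standard errors of the two fits: the dummy fit leaves the
# within-year part of the trend in its residuals
c(dummies = sigma(fit_dummies), trend = sigma(fit_trend))

# The estimated slope can be compared with the true delta
coef(fit_trend)["hour"]
```

In this simulation the trend fit should recover a residual standard error close to the noise level `sd = 2`, while the dummy fit cannot, because each yearly mean ignores the drift within the year. With real data the trend need not be linear, so this is an illustration of the mechanism and not a recommendation for a specific functional form.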