Is there any way to test whether a series should be logged or transformed in another way?
I have code that I run many different data sets through to produce forecasts. Some of the series definitely need transforming, but others don't. The code is written to be fully automatic and will be used by non-statisticians within the company, so they will have no idea whether they should change it to transform a given series. I therefore need tests that check this for them and apply the transformation accordingly.
Here is an example data set that you can use:
M <- matrix(c("08Q1", "08Q2", "08Q3", "08Q4", "09Q1", "09Q2", "09Q3", "09Q4", "10Q1", "10Q2", "10Q3", "10Q4", "11Q1", "11Q2", "11Q3", "11Q4", "12Q1", "12Q2", "12Q3", "12Q4", "13Q1", "13Q2", "13Q3", "13Q4", "14Q1", "14Q2", "14Q3", 5403.676, 6773.505, 7231.117, 7835.552, 5236.710, 5526.619, 6555.782, 11464.727, 7210.069, 7501.610, 8670.903, 10872.935, 8209.023, 8153.393, 10196.448, 13244.502, 8356.733, 10188.442, 10601.322, 12617.821, 11786.526, 10044.987, 11006.005, 15101.946, 10992.273, 11421.189, 10731.312),ncol=2,byrow=FALSE)
Nu <- as.numeric(M[, ncol(M)])  # last column holds the values; M is a character matrix, so convert
I have found that boxcoxfit() from the geoR package estimates the lambda for a Box-Cox transformation. Does anyone know how reliable this is for transforming the data?
library(geoR)  # provides boxcoxfit()
ml <- boxcoxfit(Nu)
Fitted parameters:
lambda beta sigmasq
0.59 375.43 3649.39
N <- (Nu^ml$lambda - 1) / ml$lambda
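The transformation on the line above can be wrapped so that it also handles the log case (lambda = 0) and can be undone after forecasting. This is only a minimal base R sketch: the function names `box_cox` and `inv_box_cox` are hypothetical and are not part of geoR.

```r
# Box-Cox transform; lambda = 0 is the log case (the limit of the power form).
box_cox <- function(x, lambda, tol = 1e-8) {
  stopifnot(is.numeric(x), all(x > 0))  # Box-Cox needs strictly positive data
  if (abs(lambda) < tol) log(x) else (x^lambda - 1) / lambda
}

# Inverse transform, for putting values back on the original scale.
inv_box_cox <- function(z, lambda, tol = 1e-8) {
  if (abs(lambda) < tol) exp(z) else (lambda * z + 1)^(1 / lambda)
}

x <- c(5403.676, 6773.505, 7231.117, 7835.552)  # first four values of the series
z <- box_cox(x, 0.59)          # 0.59 is the lambda reported by boxcoxfit above
back <- inv_box_cox(z, 0.59)   # should recover x up to rounding error
```

With lambda = 1 the transform only shifts the data by one, which is why a lambda close to 1 means that no transformation is needed.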
The data are 27 quarterly observations starting at 2008 Q1 [plot of the series]. The fitted model yielded an ACF of the error process suggesting model sufficiency [ACF plot]. The model includes an identified intervention at period 21 (2013 Q1) of the 27 observations. A plot of the actual and the cleansed series highlights the anomaly.
[Figures: actual/fit/forecast graph; forecasts]
In summary, there was no need for any variance-stabilizing transformation for this data set. The optimal Box-Cox coefficient depends on the model, and in this case it is 1.0. If you don't specify a model, as is possible with boxcoxfit, then in the absence of a good ARIMA structure and the identified anomaly at period 21 you might get a lambda like 0.52, which is probably the result of an incorrect model.
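To see how the estimated lambda can depend on the model, one rough check is to profile the Box-Cox log-likelihood twice: once with a constant mean only, and once with a mean that includes structure and the period-21 pulse. The sketch below uses base R only and is an assumption-laden stand-in: it replaces the ARIMA-plus-intervention model described above with an ordinary regression on a linear trend, quarterly dummies and a pulse dummy, and the helper names (`profile_loglik`, `best_lambda`) are hypothetical.

```r
y <- c(5403.676, 6773.505, 7231.117, 7835.552, 5236.710, 5526.619, 6555.782,
       11464.727, 7210.069, 7501.610, 8670.903, 10872.935, 8209.023, 8153.393,
       10196.448, 13244.502, 8356.733, 10188.442, 10601.322, 12617.821,
       11786.526, 10044.987, 11006.005, 15101.946, 10992.273, 11421.189,
       10731.312)
n <- length(y)
t_idx   <- seq_len(n)
quarter <- factor((t_idx - 1) %% 4 + 1)  # the series starts in Q1
pulse21 <- as.numeric(t_idx == 21)       # one-off dummy for 2013 Q1

# Profile log-likelihood of lambda (up to a constant) for a linear model
# with the given design matrix.
profile_loglik <- function(lambda, y, design) {
  z <- if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
  rss <- sum(lm.fit(design, z)$residuals^2)
  -length(y) / 2 * log(rss / length(y)) + (lambda - 1) * sum(log(y))
}

# Grid search for the lambda that maximises the profile log-likelihood.
best_lambda <- function(y, design, grid = seq(-2, 2, by = 0.01)) {
  ll <- vapply(grid, profile_loglik, numeric(1), y = y, design = design)
  grid[which.max(ll)]
}

design_null <- matrix(1, nrow = n, ncol = 1)               # constant mean only
design_full <- model.matrix(~ t_idx + quarter + pulse21)   # assumed stand-in model

lambda_null <- best_lambda(y, design_null)
lambda_full <- best_lambda(y, design_full)
print(c(no_model = lambda_null, with_model = lambda_full))
```

If the two estimates differ noticeably, that supports the point that a lambda estimated without a model partly reflects unmodelled structure and anomalies rather than a genuine need for variance stabilization. The regression here is only illustrative and will not necessarily reproduce the values quoted above.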