I'm really new to ARIMA methods and am trying to forecast electricity load. I've integrated: electricity load, temperature, weekday (dummy), public holidays, and school holidays. My model tries to perform a non seasonal ARIMA with linear regression for each hour of the day.
Here is my code for an example of one of the 24 hours (6 AM):
# ElecLoad contains hourly loads and other data for 2005 and 2006 (=2*365*24 entries):
# 1. Electricity load in MW
# 2. Day of week: sunday=0, monday=1, etc
# 3. Hour of the day: 0 -> 23
# 4. Public Holiday: 1 if Public Holiday, 0 otherwise
# 5. School vacation: 1 if no scool
# 6. Temperature in °F
# Create the week matrix = dumy variables for the weekdays
weekmatrix<-model.matrix(~as.factor(ElecLoad[,2]))
#Remove intercept
weekmatrix<-weakmatrix[,-1]
#Generate FullTable
FullTable<-cbind(load=ElecLoad[,1], weekmatrix, ElecLoad[,4],
ElecLoad[,3],ElecLoad[,5],ElecLoad[,5]^2, ElecLoad[,6])
colnames(FullTable)<-c("Load","mon","tue","wed","thu","fri","sat",
"SchoolHol","PubHol","Temp","Temp2","Hour")
#Create the xreg = substed for a specific hour of the day (column 12 = Hour)
xreg<-subset(FullTable[,2:11], FullTable[,12] == 7)
#Create the Load time serie, also a subset of the full table
LoadTs<-ts(subset(FullTable[,1], FullTable[,12] == 7),start=1,frequency=1)
#Launch of auto.arima
ArimaLoad<-auto.arima(LoadTs, xreg=xreg, lambda=0)
When I try to forecast with the same 2 years data as xreg, here is my output
plot(forecast(ArimaLoad,xreg=xreg), include=0)

While when I try to plot the fitted it looks identical to my original Load
plot(fitted(ArimaLoad))

I don't understand why the prediction() is so much different than the fitted() with the same xreg matrix. Is this a normal behaviour, how can I improve my model to better fit with the real situation?
Thank you so much for your support.
I'm not sure I understood everything from what you propose.
You mean that I should build a first model to forecast the daily average load (I prefer the average than the sum because due to DST, some days don't have 24 hours...). This model would be deterministic, but I don't see what kind of model you're thinking off? Is a multilinear regression ok? I prefer to consider the log(load) to make the different parameters multiplicative which I think is better fit to the reality. Then I should have 24 hourly models, taking the daily average then split with a sort seasonal effect? Should I use somewhere an ARIMA model? I'm not convince of considering the month as having an effect, in my opinion there is no reason that consumption is more important in January than August except if we consider the Temperature and Holidays effects. The hour of the day is related to the activity that's the reason why I'm considering the specific model for each hour. The same way each day of the weak is different.
I've tried a multilinear regression for the same hour (7:00 AM) and the result looks not so bad.
#Create the frame.data
Load<-subset(FullTable[,1], FullTable[,12] == 7)
FullData<-cbind(LogLoad=log(Load), xreg)
FrameData<-data.frame(FullData)
# multilinear regression
mlin<-lm(LogLoad ~ mon+tue+wed+thu+fri+sat+SchoolHol+PubHol+Temp+`Temp2`, FrameData)
plot(exp(mlin$model$LogLoad), type="l",col="blue")
lines(exp(fitted(mlin)), col="red")

fitted() in red which is now exactly the same as predict() if I re-use the same data entry (2005-2006) and looks not so far from the original load in blue (no so bad for a simple model). I still don't fully understand why it did not work with ARIMA as it also takes into consideration multilinear regression.
Now my "simple" model already takes into account several parameters, like the temperature, the holiday, the school vacations the day of the week and the hour of the day (local time, not UCT). How can I improve my model further more? How can I make sure that the parameters are invariant? Is there a specific method?