
Hello anyone and everyone,

I have a data set of traffic flow data, specifically intensity data. I have traffic counts per minute as the base data, which I then aggregate into 3- and 5-minute intervals. My question is: is it plausible to aggregate the data into overlapping intervals, i.e. to calculate the sum over the past 3 or 5 minutes for every single minute? The reasons are to enlarge the data set and to avoid missing any intermediate values. For example, a traffic breakdown may occur at any minute, with the preceding X minutes having contributed to it; if one aggregates without overlap, one may lose the X-minute interval immediately before the breakdown, i.e. the interval that caused it.
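A minimal sketch of the two aggregations described above, assuming hypothetical per-minute counts (the function names `block_sums` and `rolling_sums` are illustrative, not from any library). Each overlapping value is the trailing sum of the past k minutes, updated incrementally as the window slides by one minute:

```python
from typing import List

def block_sums(counts: List[int], k: int) -> List[int]:
    """Non-overlapping k-minute totals (the usual aggregation).
    A trailing partial block is dropped."""
    return [sum(counts[i:i + k]) for i in range(0, len(counts) - k + 1, k)]

def rolling_sums(counts: List[int], k: int) -> List[int]:
    """Overlapping k-minute totals: for every minute t >= k-1 (0-based),
    the sum of minutes t-k+1 .. t, i.e. a trailing moving sum."""
    if len(counts) < k:
        return []
    window = sum(counts[:k])
    out = [window]
    for t in range(k, len(counts)):
        # add the newest minute, drop the one that left the window
        window += counts[t] - counts[t - k]
        out.append(window)
    return out

# Hypothetical per-minute vehicle counts for 10 minutes
counts = [12, 15, 11, 20, 25, 30, 28, 14, 10, 9]
print(block_sums(counts, 3))    # [38, 75, 52]
print(rolling_sums(counts, 3))  # [38, 46, 56, 75, 83, 72, 52, 33]
```

With these made-up numbers, the busiest 3-minute window (83 vehicles, minutes 5 to 7 counting from 1) straddles two non-overlapping blocks, so the block aggregation never sees it; this is exactly the kind of pre-breakdown interval the question worries about losing.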

Obviously the overlapping aggregate intervals would be correlated, but I don't think that should be too much of an issue for my purpose with 3- or 5-minute interval lengths (I personally wouldn't go longer than that, though). Of course, one would have to keep this in mind when interpreting the results and using them for further purposes.
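How strong is that correlation? Under a simplifying assumption that is not stated in the question (per-minute counts independent with equal variance σ²), two k-minute sums whose windows start h minutes apart share k - h minutes, so their covariance is (k - h)σ², each sum has variance kσ², and the correlation is (k - h)/k. Real traffic counts are usually autocorrelated themselves, so the actual correlation is likely to be higher than this sketch suggests:

```python
def overlap_correlation(k: int, h: int) -> float:
    """Correlation between two k-minute sums whose windows start h minutes apart.

    ASSUMPTION: per-minute counts are independent with equal variance, so the
    covariance is (shared minutes) * sigma^2 and each sum has variance k * sigma^2.
    """
    shared = max(k - h, 0)  # minutes the two windows have in common
    return shared / k

for k in (3, 5):
    print(k, [round(overlap_correlation(k, h), 2) for h in range(1, k + 1)])
# 3 [0.67, 0.33, 0.0]
# 5 [0.8, 0.6, 0.4, 0.2, 0.0]
```

Even in this idealized case, consecutive 5-minute sums are correlated at 0.8, so longer windows make the overlap problem worse, not better.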

Are my assumptions about using this approach to maximize the data set valid? Could it cause any issues, or is there anything I should worry about? I don't remember seeing such an approach anywhere, but from my point of view (as a newbie traffic engineer) it seems like a nice trick to enlarge the data set in certain cases or applications, such as the above-mentioned capacity estimation, where one could otherwise miss some important values: with the usual non-overlapping approach, one captures only 1/3 or 1/5 of all the interval combinations that actually occurred in the real world.
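The 1/3 and 1/5 figures follow from simple counting: N minutes of data give floor(N/k) non-overlapping k-minute intervals but N - k + 1 overlapping ones. A small sketch for one day of data (N = 1440 is just an example length; `window_counts` is a hypothetical helper):

```python
from typing import Tuple

def window_counts(n_minutes: int, k: int) -> Tuple[int, int]:
    """Return (non-overlapping, overlapping) numbers of k-minute intervals."""
    non_overlapping = n_minutes // k         # drops a trailing partial block
    overlapping = max(n_minutes - k + 1, 0)  # one trailing sum per minute from minute k on
    return non_overlapping, overlapping

for k in (3, 5):
    print(k, window_counts(1440, k))  # one day of 1-minute counts
# 3 (480, 1438)
# 5 (288, 1436)
```

The extra windows are built from the same minutes, so a larger count of windows is not the same as a larger count of independent observations.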

Thank you for any feedback.

Igor M.

1 Answer


I think that this article is relevant to your problem. It deals with long-term return predictability. Due to the nature of this problem, the number of non-overlapping 5-, 10- and 20-year periods is very limited, which has led researchers to use overlapping returns. However, this has severe consequences for inference, and the actual number of additional observations one gains is very small.
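The point that overlapping windows add very few effective observations can be checked with a quick calculation. Assuming, as a simplification, independent per-minute counts with equal variance σ², the sketch below computes how many independent k-minute sums the mean of all overlapping sums is worth. Minute i appears in w_i windows, so the variance of that mean is σ²·Σw_i²/M² for M windows, while a single k-minute sum has variance kσ². `effective_n_for_mean` is a hypothetical helper written for this illustration:

```python
from typing import List

def effective_n_for_mean(n_minutes: int, k: int) -> float:
    """Effective number of independent k-minute sums carried by the MEAN of all
    overlapping k-minute sums.

    ASSUMPTION: per-minute counts are independent with equal variance sigma^2.
    Var(mean of overlapping sums) = sigma^2 * sum(w_i^2) / M^2, where w_i is the
    number of windows containing minute i; a single sum has variance k * sigma^2.
    """
    m = n_minutes - k + 1  # number of overlapping windows
    weights: List[int] = [0] * n_minutes
    for start in range(m):
        for i in range(start, start + k):
            weights[i] += 1
    sum_sq = sum(w * w for w in weights)
    return k * m * m / sum_sq

n, k = 1440, 3
print(n // k)                                   # 480 non-overlapping sums
print(round(effective_n_for_mean(n, k), 1))     # 479.6 from 1438 overlapping sums
```

Under these assumptions, the 1438 overlapping 3-minute sums from one day carry about as much information about the mean as the 480 non-overlapping ones. The calculation covers the mean only; for other statistics, such as tail quantiles, the gain may differ, but the dependence still has to be accounted for in any uncertainty estimate.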

  • Johan, thank you for the article. It was an interesting read and certainly gave me some information on the topic. However, it seems that I misused the term "time series", as it makes people think that I am doing regression and prediction with the data. Sadly, as far as I could find, there is no way to modify the title or the question itself to clarify this. In fact, I am basically interested in the distribution of the data, for which it seems to me that it should be OK to use overlapping data. If I understood the paper correctly, the main issue is the correlation of the dependent variable. – Igor M. Jul 25 '18 at 10:32