Hello anyone and everyone,
I have a set of traffic flow data, specifically intensity data. The base data are traffic counts per minute, which I aggregate into 3- and 5-minute intervals. My question: is it valid to aggregate the data into overlapping intervals, i.e. to calculate the sum of the past 3 or 5 minutes for every single minute? The reasons are to enlarge the data set and to avoid missing any intermediate values. For example, a traffic breakdown may occur at any minute, with the preceding X minutes having contributed to it. If one aggregates without overlaps, the X-minute interval immediately before the breakdown will usually straddle two fixed intervals, so the information about the flow that led to the breakdown is lost.
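To make the idea concrete, here is a minimal sketch of the two aggregation schemes in plain Python. The counts are made up for illustration, and the function names `block_sums` and `rolling_sums` are just my own labels, not from any library.

```python
def block_sums(counts, window):
    """Non-overlapping sums: one value per full block of `window` minutes."""
    n_blocks = len(counts) // window
    return [sum(counts[i * window:(i + 1) * window]) for i in range(n_blocks)]


def rolling_sums(counts, window):
    """Overlapping sums: for each minute, the total over the past `window` minutes."""
    return [sum(counts[i - window + 1:i + 1]) for i in range(window - 1, len(counts))]


# Made-up per-minute counts (vehicles/min), for illustration only.
counts = [12, 15, 11, 18, 20, 22, 9, 7, 14]

blocks_3 = block_sums(counts, 3)     # [38, 60, 30]
rolling_3 = rolling_sums(counts, 3)  # [38, 44, 49, 60, 51, 38, 30]
```

With 9 minutes of data, the fixed 3-minute blocks give 3 values, while the overlapping scheme gives 7 (one per minute once a full window is available). The block sums are simply every third rolling sum, so the overlapping series contains the usual aggregation plus the intermediate windows.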
Obviously, the overlapping aggregate intervals would be correlated, since consecutive 3-minute sums share two of their three minutes, but I don't think that should be too much of an issue for my purpose with a 3- or 5-minute interval length (personally, I wouldn't go longer than that, though). Of course, one would have to keep that in mind when interpreting the results and using them further.
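As a rough check of how strong that correlation is, here is a small simulation sketch. It assumes, purely for illustration, that the per-minute counts are independent random integers, which real traffic counts are not. Under that assumption, two window sums k minutes apart share (w - k) of their w minutes, so their correlation should be about 1 - k/w for k < w and about zero from k = w onward. The helper names are hypothetical.

```python
import random


def rolling_sums(counts, window):
    """Overlapping sums: for each minute, the total over the past `window` minutes."""
    return [sum(counts[i - window + 1:i + 1]) for i in range(window - 1, len(counts))]


def lag_correlation(values, lag):
    """Pearson correlation between the series and itself shifted by `lag` (lag >= 1)."""
    x, y = values[:-lag], values[lag:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Simulated, independent per-minute counts (an assumption, not real data).
rng = random.Random(0)
counts = [rng.randint(0, 30) for _ in range(20000)]

window = 3
sums = rolling_sums(counts, window)
corr_by_lag = {lag: lag_correlation(sums, lag) for lag in (1, 2, 3)}
# Expected under independence: roughly 2/3 at lag 1, 1/3 at lag 2, 0 at lag 3.
```

So the overlapping values are far from independent observations: the effective sample size is much smaller than the number of rows, which matters for any standard errors or significance tests computed from them. Real counts are autocorrelated on their own, so the correlation in practice would be higher still.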
Are my assumptions about using this approach to maximize the data set valid? Could it cause any issues, or is there anything I should worry about? I don't remember seeing such an approach anywhere, but from my point of view (as a newbie traffic engineer) it seems like a nice trick to enlarge the data set in certain cases or applications, such as capacity estimation around breakdowns like the one described above, where one could otherwise miss some important values: the usual approach captures only 1/3 or 1/5 of all the 3- or 5-minute windows that actually occurred.
Thank you for any feedback.