I have a data set that looks like this:
[(time1, a, b),
(time2, c, d),
(time3, e, f),
(time4, g, h),
...]
Now I want to calculate some statistics over fixed time intervals, i.e. the mean/covariance of certain variables for each specified time interval (say 1 min). So my question is: how do I break the data into batches/groups based on the time difference between observations?
For example, if the specified time interval is 30s, I would start at time1 and scan forward until I reach a point where time - time1 >= 30s. All observations between time1 and this point would form one group for further calculation.
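breaks = []
i, k = 0, 0
# advance k while observation k is still within time_interval of observation i
while k < len(data) and data[k][0] - data[i][0] < time_interval:
    k += 1
breaks.append((i, k))  # first group is data[i:k]; how do I move i on to the next group?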
As the pseudocode above shows, I am stuck on how to manage the indices so that I can locate the beginning and end of each group. Right now I can identify the first group, but after that I don't know how to advance the index to the beginning of the next group.
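To make the index question concrete, here is a minimal sketch of one way the start index could be advanced. It assumes the times are plain numbers (seconds) sorted in ascending order; the function name `group_bounds` and the sample values are made up for illustration.

```python
def group_bounds(times, time_interval):
    """Return (start, end) index pairs so that each group is times[start:end]."""
    if time_interval <= 0:
        raise ValueError("time_interval must be positive")
    breaks = []
    i = 0
    n = len(times)
    while i < n:
        k = i
        # advance k while observation k is less than time_interval past the group start
        while k < n and times[k] - times[i] < time_interval:
            k += 1
        breaks.append((i, k))
        i = k  # the next group starts exactly where this one ended
    return breaks


# made-up sample times, in seconds
print(group_bounds([0, 10, 25, 30, 45, 61, 100], 30))
# prints [(0, 3), (3, 5), (5, 6), (6, 7)]
```

The key step in this sketch is `i = k`: once the inner scan stops, the stopping point becomes the start of the next group, so every observation is visited only once.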
The output should include the beginning time and ending time of each group, plus some other calculations based on the group-level data. Something like this:
[(group1_begin, group1_end, calculations ...),
(group2_begin, group2_end, calculations ...)...]
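As an illustration of that output shape, the following sketch builds one tuple per group in a single pass. It assumes each row is `(time, x, y)` with numeric values and times sorted ascending, and it only computes means; the names `summarize` and `_summary` are hypothetical, and a covariance could be added to `_summary` in the same way.

```python
from statistics import mean


def _summary(group):
    """Collapse one group of (time, x, y) rows into (begin, end, mean_x, mean_y)."""
    return (
        group[0][0],                 # time of the first row in the group
        group[-1][0],                # time of the last row in the group
        mean(r[1] for r in group),
        mean(r[2] for r in group),
    )


def summarize(data, time_interval):
    """Group rows by elapsed time since each group's first row and summarize each group."""
    results = []
    group = []
    for row in data:
        # close the current group once this row is time_interval or more past its first row
        if group and row[0] - group[0][0] >= time_interval:
            results.append(_summary(group))
            group = []
        group.append(row)
    if group:  # do not forget the last, still-open group
        results.append(_summary(group))
    return results


# made-up sample rows: (time in seconds, x, y)
rows = [(0, 1.0, 2.0), (10, 3.0, 4.0), (35, 5.0, 6.0), (40, 7.0, 8.0)]
print(summarize(rows, 30))
# prints [(0, 10, 2.0, 3.0), (35, 40, 6.0, 7.0)]
```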
I thought of adding a new vector of group numbers to achieve this, but could not figure out how, and I feel it is not a very efficient way of solving this problem. I wonder if this could be achieved by iteration.
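For reference, the group-number idea might look roughly like the sketch below, which labels each observation in one pass. It again assumes numeric times sorted in ascending order; `group_numbers` is a made-up name.

```python
def group_numbers(times, time_interval):
    """Return one integer label per observation; a new label starts every time
    an observation is time_interval or more past the current group's first time."""
    labels = []
    group = -1
    start = None
    for t in times:
        if start is None or t - start >= time_interval:
            group += 1   # open a new group
            start = t    # and remember its first time
        labels.append(group)
    return labels


# made-up sample times, in seconds
print(group_numbers([0, 10, 25, 30, 45, 61, 100], 30))
# prints [0, 0, 0, 1, 1, 2, 3]
```

A label vector like this could then be used as the key for a grouping step such as `itertools.groupby`.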
Thanks.