
I have a DataFrame with a DatetimeIndex and multiple columns, as shown:

                     data_1   data_2
time                                      
2020-01-01 00:23:40  330.98      NaN
2020-01-01 00:23:50  734.52      NaN
2020-01-03 00:00:00  388.06     23.9
2020-01-03 00:00:10  341.60     25.1
2020-01-03 00:00:20  395.14     24.9
...
2020-01-03 00:01:10  341.60     25.1
2020-01-03 00:01:20  395.14     24.9

I want to apply a function over a rolling window and collect some features. The window has to be time-based, because some data may be missing, so this one does not fit my case. The features depend on multiple columns. I wrote my own class:

class FeatureCollector:
    def __init__(self):
        self.feature_dicts = []

    def collect(self, window):
        self.feature_dicts.append(extract_features(window))
        return 1  # rolling.apply needs a numeric return value; it is ignored here

def extract_features(window):
    ans = {}
    # do_smth_on_window and calculate ans
    return ans

I run the rolling computation as follows:

from datetime import timedelta

collector = FeatureCollector()
my_df.rolling(timedelta(seconds=100), min_periods=10).apply(collector.collect)
features = collector.feature_dicts

The problem is that, as far as I understand, extract_features only ever receives a Series. My columns data_1 and data_2 are passed to it one after the other, as if the DataFrame looked like this:

                       data
time                                      
2020-01-01 00:23:40  330.98
2020-01-01 00:23:50  734.52
2020-01-03 00:00:00  388.06
2020-01-03 00:00:10  341.60
2020-01-03 00:00:20  395.14
...
2020-01-03 00:01:10  341.60
2020-01-03 00:01:20  395.14                                 
2020-01-01 00:23:40     NaN
2020-01-01 00:23:50     NaN
2020-01-03 00:00:00    23.9
2020-01-03 00:00:10    25.1
2020-01-03 00:00:20    24.9
...
2020-01-03 00:01:10    25.1
2020-01-03 00:01:20    24.9
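
A minimal sketch that reproduces this behaviour, assuming a small made-up frame (demo_df and the spy function are hypothetical names, not part of my real code). It records what rolling(...).apply hands to the callback:

```python
import pandas as pd

# Hypothetical sample data shaped like the frame above (values are made up).
index = pd.date_range("2020-01-03", periods=6, freq=pd.Timedelta(seconds=10))
demo_df = pd.DataFrame(
    {
        "data_1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "data_2": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    },
    index=index,
)

seen = []  # one (type name, window values) entry per callback invocation


def spy(window):
    # Record the type of the object received and the values it contains.
    seen.append((type(window).__name__, window.tolist()))
    return 1.0  # apply requires a numeric return value


demo_df.rolling(pd.Timedelta(seconds=30), min_periods=1).apply(spy)

# Every window holds values from a single column only.
for kind, values in seen:
    print(kind, values)
```

Each recorded window contains values from either data_1 or data_2, never both, which matches the description above.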

How can I organize this so that each window passed to extract_features is a DataFrame with both columns?
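
One possible direction, as a sketch only and not a confirmed solution: iterating over the Rolling object directly (supported in pandas 1.1 and later) yields each window as a DataFrame slice with all columns. The names sample_df, extract_features_sketch and MIN_PERIODS are hypothetical, the data is made up, and the length check is done by hand in place of min_periods.

```python
import pandas as pd

# Hypothetical sample data (values are made up for illustration).
index = pd.date_range("2020-01-03", periods=6, freq=pd.Timedelta(seconds=10))
sample_df = pd.DataFrame(
    {
        "data_1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        "data_2": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    },
    index=index,
)


def extract_features_sketch(window):
    # Placeholder feature that needs both columns of the window at once.
    return {
        "end": window.index[-1],
        "n_rows": len(window),
        "mean_ratio": (window["data_2"] / window["data_1"]).mean(),
    }


MIN_PERIODS = 2  # hypothetical threshold, enforced manually below

features = [
    extract_features_sketch(window)
    for window in sample_df.rolling(pd.Timedelta(seconds=30))
    if len(window) >= MIN_PERIODS  # skip windows with too few rows
]

features_df = pd.DataFrame(features).set_index("end")
print(features_df)
```

This runs a Python-level loop over the windows, so it may be slow on large frames; whether that is acceptable depends on the data size.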
