
I'm doing multiclass classification, and the data is not considered a time series. I'm working on feature engineering and trying to solve the problem with classic KNN, RF, boosting, etc. I'm creating new features based on a rolling window and found that people usually use mean aggregation.

  1. Is the mean of the rolling distribution the most informative characteristic?
  2. Are there other characteristics (like std, quantiles, IQR, etc.) that could be used?
  3. Is it worth performing some kind of data transformation (like scaling, quantile transformation, or Box-Cox)?
imitusov
  • What are you planning on doing with this aggregation? The validity of your methodology depends on what you're trying to do at the end of the day. – Oliver Foster Oct 15 '20 at 22:19
  • @weareglenn I'm doing multiclass classification. The data is not considered a time series (my time-series approaches got worse results, but the validation scheme is still time-dependent: I check the metric only on future instances). So I'm working on feature engineering and trying to solve the problem with classic KNN, RF, boosting, etc. The main purpose is to enlarge the feature space and ideally find distribution properties of a rolling window that may improve the metric. – imitusov Oct 16 '20 at 07:56

2 Answers


Usually the point of a rolling window is to calculate and use a moving average:

A moving average is commonly used with time series data to smooth out short-term fluctuations and highlight longer-term trends or cycles.

By definition, the only meaningful aggregation for this purpose is the mean: the idea is to represent the same type of value as the original series while smoothing out noisy variations.

Anything else is not the standard use case for a rolling window, so the answer to questions 2 and 3 would completely depend on the application.
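To make the smoothing idea concrete, here is a minimal sketch using pandas (the series values are made up for illustration); the rolling mean stays in the same units as the original series while damping short-term fluctuations:

```python
import pandas as pd

# A short, noisy series (hypothetical values for illustration)
s = pd.Series([1.0, 5.0, 2.0, 6.0, 3.0, 7.0, 4.0])

# 3-point moving average: each output value is the mean of the
# current point and the two preceding points.
smoothed = s.rolling(window=3).mean()

# The first window - 1 positions are NaN because the window is incomplete.
print(smoothed)
```

Note that `rolling(window=3)` leaves the first two positions as NaN; depending on the use case you might pass `min_periods=1` to get partial-window averages instead.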

Erwan

As you've described in the comments: you are performing a multi-class classification problem after the rolling-window aggregations.

That being the case, construct as many features as you can. Consider the standard deviation, the range of values, the change between the start and end of the window, and so on. Your classifier will ultimately decide which of these features are important, and it behooves you to give it many options to choose from.
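A sketch of what such a feature block might look like in pandas (column names and the window size are arbitrary choices for illustration, not a prescription):

```python
import pandas as pd

# Hypothetical raw signal to engineer features from
df = pd.DataFrame({"x": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]})
w = 4  # window size; see the note below about treating this as a tunable variable

roll = df["x"].rolling(window=w)

feats = pd.DataFrame({
    "roll_mean": roll.mean(),   # the usual aggregation
    "roll_std": roll.std(),     # dispersion inside the window
    "roll_min": roll.min(),
    "roll_max": roll.max(),
})
# Range of values within the window
feats["roll_range"] = feats["roll_max"] - feats["roll_min"]
# Change between the start and the end of the window
feats["roll_delta"] = df["x"] - df["x"].shift(w - 1)

print(feats)
```

Quantiles (`roll.quantile(0.25)`, `roll.quantile(0.75)`) and an IQR column follow the same pattern. Tree-based models like RF and boosting will simply ignore uninformative columns, so the cost of adding candidates is mostly computational.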

Personally, I don't see the value of scaling as described in #3, but it could be an avenue for future investigation after you've gone through a first pass of feature engineering and classification.

Having done this sort of problem before: what will really impact your model's performance is the window size you choose. Make sure to keep it as a variable so you can iterate through different window options and track performance.
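A sketch of such a window-size sweep, assuming scikit-learn is available; the data here is synthetic, and `TimeSeriesSplit` stands in for the asker's future-instances-only validation scheme:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic signal and target, purely for illustration
rng = np.random.default_rng(0)
n = 300
x = pd.Series(rng.normal(size=n))
y = (x.rolling(5).mean().fillna(0) > 0).astype(int)

scores = {}
for w in (3, 5, 10, 20):  # candidate window sizes to iterate over
    feats = pd.DataFrame({
        "roll_mean": x.rolling(w).mean(),
        "roll_std": x.rolling(w).std(),
    }).fillna(0)
    # TimeSeriesSplit keeps the temporal order: train folds always
    # precede test folds, so the metric is checked on future instances.
    cv = TimeSeriesSplit(n_splits=5)
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    scores[w] = cross_val_score(clf, feats, y, cv=cv).mean()

best_w = max(scores, key=scores.get)
print(scores, best_w)
```

The same loop extends naturally to any of the other rolling features; the key point is that the window size is a hyperparameter of the feature pipeline, not a fixed constant.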

Oliver Foster