I have a large number of CSV files, each containing a time series sampled every 5 seconds over 2-3 minutes. There are 20k such files with 200-300 variables in each. I aggregate the data by taking the mean over the entire 2-3 minute window and use the result for binary classification.

Currently I use the mean of each column in a CSV file to represent that file, so each file is summarized by one scalar value per column and becomes one sample. Could anyone suggest better ways to summarize the time series data?
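For concreteness, the current approach amounts to something like the minimal sketch below. The function name, the column names, and the inline example data are made up for illustration; the real files have 200-300 columns and are read from disk.

```python
import csv
import io
from statistics import fmean

def summarize_by_mean(csv_text):
    """Collapse one time-series CSV into a single row: the mean of each column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {col: fmean(float(r[col]) for r in rows) for col in rows[0]}

# Hypothetical file: 3 samples taken 5 seconds apart, 2 variables.
example = "var_a,var_b\n1.0,10.0\n2.0,20.0\n3.0,30.0\n"
features = summarize_by_mean(example)
```

Each file then contributes exactly one feature vector, and any ordering or variation within the 2-3 minute window is lost.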

Thanks for your time.

aivanov
Anurag Upadhyaya
  • 1) Why do you have to summarise it by only one number? 2) What do you mean by "better"? What is your criterion?
  • – aivanov Dec 08 '17 at 22:10
  • I need to summarize each of the CSVs because I have a huge number of them. I am using the mean for this, so I wanted to know whether there are other plausible ways to summarize time series data, because I believe the mean is too flat. – Anurag Upadhyaya Dec 09 '17 at 10:42
  • Your question isn't answerable as it stands now. 1) It is still unclear what you mean by "better". Do you have any quality criterion or procedure that could tell you that the mean is worse than, let's say, the standard deviation, or just the first measurement of your time series? You can't optimise your "compression approach" without an optimisation criterion. 2) If you can't formulate the criterion yourself, try to explain to us how you intend to use the aggregated data (the mean). Do you pass it to some ML algorithm? What kind of ML? – aivanov Dec 09 '17 at 11:05
  • Okay, so yes: I am aggregating each file by its mean and then training a binary classifier on the result. The data is pretty imbalanced, and the mean is not giving me variables that are separable; both classes have similar distributions across all the variables. Since the data is a time series, maybe the mean is not the right way to aggregate it, so I was thinking of using some other aggregation. – Anurag Upadhyaya Dec 09 '17 at 11:55
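
One way to make the aggregation less flat, following the comments above, is to keep several statistics per column instead of only the mean. The sketch below is an illustration under assumptions, not a recommendation from the thread: the choice of statistics (standard deviation, min, max, last minus first, linear trend), the feature naming, and the example data are all hypothetical, and nothing here guarantees that the resulting features separate the two classes.

```python
import csv
import io
from statistics import fmean, pstdev

def slope(values):
    """Least-squares slope of values against their sample index (trend per step)."""
    n = len(values)
    if n < 2:
        return 0.0
    t_mean = (n - 1) / 2
    v_mean = fmean(values)
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def summarize(csv_text):
    """Turn one time-series CSV into a flat dict of several statistics per column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    features = {}
    for col in rows[0]:
        values = [float(r[col]) for r in rows]
        features[f"{col}_mean"] = fmean(values)
        features[f"{col}_std"] = pstdev(values)      # spread within the window
        features[f"{col}_min"] = min(values)
        features[f"{col}_max"] = max(values)
        features[f"{col}_last_minus_first"] = values[-1] - values[0]
        features[f"{col}_slope"] = slope(values)     # linear trend over time
    return features

# Hypothetical file: var_a rises steadily, var_b is constant.
example = "var_a,var_b\n1.0,5.0\n2.0,5.0\n3.0,5.0\n"
features = summarize(example)
```

Each file still becomes a single sample, but two files with the same mean and different spread or trend no longer collapse to the same feature vector. Whether that helps has to be checked against the classifier's validation performance, which is the criterion the comments ask for.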