
I have a set of almost 1600 time series spanning 2 years, which I want to group into clusters. Do you think this is possible using k-means? Which method would you advise me to use? Is this possible at all using SPSS?

chl
Maria

2 Answers


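k-means cannot use arbitrary distance functions. It is designed for Euclidean distance: its update step takes the mean of each cluster, and the mean is the point that minimizes the sum of squared Euclidean distances to the cluster members.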
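To make this concrete, here is a minimal pure-Python sketch of standard k-means (Lloyd's algorithm). The function names and the deterministic farthest-first initialization are illustrative choices of mine (k-means++ is the usual randomized alternative); the part that matters is the update step, where the new centroid is the arithmetic mean. If you swap another distance into the assignment step, that mean is no longer guaranteed to minimize the objective.

```python
def squared_euclidean(a, b):
    """Sum of squared coordinate differences of two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm with squared Euclidean distance (illustrative sketch)."""
    # Farthest-first initialization (an assumption, chosen to keep the sketch
    # deterministic): start from the first point, then add the point farthest
    # from its nearest existing centroid.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_euclidean(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: the coordinate-wise mean minimizes the sum of squared
        # Euclidean distances to the members; this ties k-means to Euclidean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


labels, centroids = kmeans([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], 2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

With two well-separated groups of 2-D points, the first three points end up in one cluster and the last three in the other.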

Euclidean distance, however, does not work well for high-dimensional data such as your time series (unless you have a really low sampling rate, say 24 months).
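One rough way to see the high-dimensionality problem is the distance concentration effect: as the number of dimensions grows, the nearest and farthest neighbours of a query point end up at nearly the same Euclidean distance. The sketch below measures the relative contrast (max - min) / min on uniformly random points. The helper name, the uniform data, and the chosen dimensions (24 and 130 only mirror the monthly and weekly sampling mentioned in this thread) are assumptions for illustration, not real time series; on strongly autocorrelated real series the effect can be milder.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of Euclidean distances from one random query point
    to n_points random points, all drawn uniformly from [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    return (max(dists) - min(dists)) / min(dists)


# The contrast shrinks as the dimension grows, so "nearest" becomes less meaningful.
for dim in (2, 24, 130, 1000):
    print(dim, round(relative_contrast(dim), 3))
```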

For time series, you will probably want to use a time series distance. There are quite a lot of them, designed specifically for different kinds of time series; you really should look at these.
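Dynamic time warping (DTW) is one well-known example of a time series distance (this answer does not name a specific one). Below is a minimal sketch of the classic O(nm) dynamic-programming version; using absolute difference as the local cost and having no warping window are simplifying assumptions. It shows why such distances suit time series: a series and a copy shifted by one step are clearly apart under Euclidean distance but at DTW distance 0.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two 1-D sequences (absolute-difference
    local cost, no warping window). The sequences may differ in length."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b[j-1] is matched again
                cost[i][j - 1],      # b advances, a[i-1] is matched again
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


x = [0, 0, 1, 2, 1, 0, 0]
y = [0, 1, 2, 1, 0, 0, 0]  # same bump, one step earlier
print(math.dist(x, y))     # 2.0 (Euclidean)
print(dtw_distance(x, y))  # 0.0 (the warping absorbs the shift)
```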

They won't work with k-means, but there are various distance-based and density-based clustering algorithms (where density is usually defined by distance!) that you should try. However, I have no idea what SPSS supports, or whether it has any time series distances at all.
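One distance-based algorithm that accepts any dissimilarity is k-medoids, because its cluster centres are actual items rather than means. The sketch below is the simple alternating variant (not the PAM swap algorithm) working on a precomputed n x n distance matrix, which you could fill with DTW or any other time series distance. The function name and the farthest-first initialization are illustrative assumptions. Hierarchical clustering and DBSCAN can likewise run on a precomputed distance matrix.

```python
def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed n x n distance matrix.

    Medoids are actual items, so any dissimilarity can be plugged in and
    no averaging of time series is needed.
    """
    n = len(dist)
    # Farthest-first initialization (illustrative): start from item 0, then
    # repeatedly add the item farthest from its nearest chosen medoid.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))

    labels = [0] * n
    for _ in range(max_iter):
        # Assignment: each item joins the cluster of its closest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Update: the new medoid is the member with the smallest total
        # distance to the other members of its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the medoid of an empty cluster
                continue
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


# Hypothetical 4-item distance matrix: items 0/1 and 2/3 are close pairs.
D = [
    [0, 1, 9, 8],
    [1, 0, 8, 9],
    [9, 8, 0, 1],
    [8, 9, 1, 0],
]
print(k_medoids(D, 2))  # ([0, 0, 1, 1], [0, 2])
```

For the roughly 1600 series in the question, the matrix has 1600 * 1599 / 2 = 1,279,200 distinct pairs, so computing the time series distance for all pairs is likely to be the expensive part.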

  • Thank you for your help @Anony-Mouse. I actually have 130 weeks of samples. I'm a bit scared by the size of my data, but let's see if that works. I'll follow your advice. Thanks! – Maria Oct 11 '12 at 10:58
  • Well, 1 measurement per week, or 1 measurement per second, that is what I'm trying to point out... – Has QUIT--Anony-Mousse Oct 11 '12 at 11:54

First of all, yes, you can use k-means to cluster those time series. The default implementation of k-means relies on the Euclidean distance, but it can be modified to feed the algorithm a specific time series distance, such as DTW (dynamic time warping).
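Such a modification can be sketched as a k-means-style loop that uses DTW in the assignment step. This is an illustration under several assumptions, not the method of the paper cited here: the series have equal length, initialization is farthest-first, and centroids are still updated with the plain coordinate-wise mean. That mean is only a heuristic, because it does not minimize the total DTW distance to the members; DTW-aware averaging such as DTW Barycenter Averaging (DBA) addresses this. The function names are hypothetical.

```python
import math


def dtw_distance(a, b):
    """DTW distance with absolute-difference local cost and no window."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def kmeans_dtw(series, k, iterations=20):
    """k-means-style clustering: DTW for assignment, coordinate-wise mean for
    the update (a heuristic). Assumes all series have the same length."""
    # Farthest-first initialization under DTW (illustrative choice).
    centroids = [list(series[0])]
    while len(centroids) < k:
        farthest = max(series, key=lambda s: min(dtw_distance(s, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(iterations):
        # Assignment step uses DTW instead of Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: dtw_distance(s, centroids[c]))
            for s in series
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: plain mean of the member series, time step by time step.
        for c in range(k):
            members = [s for s, lab in zip(series, labels) if lab == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids


bumps = [[0, 0, 1, 2, 1, 0, 0], [0, 1, 2, 1, 0, 0, 0], [0, 0, 0, 1, 2, 1, 0]]
ramps = [[0, 1, 2, 3, 4, 5, 6], [0, 0, 1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6, 6]]
labels, _ = kmeans_dtw(bumps + ramps, 2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The three shifted "bump" series land in one cluster and the three "ramp" series in the other, even though the bumps peak at different time steps.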

For more information, see: On Clustering Multimedia Time Series Data Using K-Means and Dynamic Time Warping.

Second, I don't think you can use SPSS for those purposes, but I do know that you can use Matlab; there are plenty of implementations of k-means and DTW available.

Isabel