Can someone share a structured process for understanding a collection of data? The scenario: you have been given a data set (features and observations, with descriptions) and been told, "Tell me what kind of interesting things this data can tell me." In other words, what interesting questions can this data answer? What counts as "interesting" is certainly subjective.

This appears to be classical unsupervised learning.

My initial thoughts:

  1. Cluster every pair of variables and look for interesting clusters
  2. Run PCA to find high-variance groupings
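
To make the PCA idea concrete, here is a minimal sketch that finds the first principal component by power iteration on the sample covariance matrix, using only the standard library. The function name `first_principal_component` and the toy data are made up for illustration; in practice one would more likely reach for NumPy or scikit-learn, and the features should usually be standardised first if they are on different scales.

```python
import math
import random


def first_principal_component(rows, iterations=200, seed=0):
    """Return (direction, explained_variance_ratio) for the leading component.

    rows: list of equal-length lists of floats (observations x features).
    Hypothetical helper for illustration, not a library function.
    """
    n, d = len(rows), len(rows[0])
    # Center each feature so variance is measured around its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: multiply a vector by cov and renormalise until it
    # settles on the eigenvector with the largest eigenvalue.
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue, i.e. the variance along v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, eigenvalue / total_variance


# Made-up data: the second feature is roughly twice the first,
# and the third is small noise.
data = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.4],
    [3.0, 6.2, 0.6],
    [4.0, 8.1, 0.5],
    [5.0, 9.8, 0.4],
]
direction, ratio = first_principal_component(data)
print("direction:", [round(x, 3) for x in direction])
print("share of variance explained:", round(ratio, 3))
```

A single component that explains nearly all of the variance, as in this toy data, suggests the features are largely redundant. The loadings in `direction` show which features move together.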

Is there a general "how to understand a set of data" process that you've found successful?

Thanks

BobL
  • Google "unsupervised learning". If it's a time series: wavelet power spectrum, a visualization of the correlation matrix, rolling-window PCA with the eigenvalues plotted to see how the global correlations change over time, and a big table of descriptive statistics (quantiles, moments, etc.). – user2763361 Oct 30 '13 at 14:25
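
As a rough illustration of the correlation-matrix suggestion, the sketch below computes Pearson correlations for every pair of columns and ranks them by absolute strength, which is a text-only stand-in for eyeballing a heatmap. The column names and values are invented, and `pearson` and `ranked_correlations` are hypothetical helper names, not part of any library.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranked_correlations(columns):
    """columns: dict of name -> list of values.

    Returns (name_a, name_b, r) tuples sorted by |r|, strongest first.
    """
    pairs = [(a, b, pearson(columns[a], columns[b]))
             for a, b in combinations(columns, 2)]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Made-up example columns.
columns = {
    "height": [150, 160, 170, 180, 190],
    "weight": [52, 58, 69, 77, 90],
    "shoe": [36, 38, 41, 43, 45],
    "noise": [3, 1, 4, 1, 5],
}
ranked = ranked_correlations(columns)
for a, b, r in ranked:
    print(f"{a:>6} ~ {b:<6} r = {r:+.2f}")
```

The strongest pairs are candidates for a closer look, such as a scatter plot. Note that Pearson's r only captures linear association, so a low value does not rule out a relationship.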

1 Answer

John Tukey came up with an entire field devoted to this: Exploratory Data Analysis. PCA is one part of it. Take a look and I'm sure you'll find some good ideas.
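
As one small taste of what EDA looks like in practice, here is a sketch of a five-number summary with the usual 1.5 × IQR fences for flagging possible outliers, both tools associated with Tukey. The helper name `five_number_summary` and the sample values are made up. The quartiles come from `statistics.quantiles` with the "inclusive" method, which is an assumption on my part and can differ slightly from Tukey's original hinges.

```python
import statistics


def five_number_summary(values):
    """Return min, Q1, median, Q3, max plus 1.5*IQR fences and flagged points."""
    data = sorted(values)
    # Quartiles by linear interpolation over the observed range.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        # Points beyond the fences deserve a closer look; they are not
        # automatically errors.
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }


# Made-up sample with one suspicious value.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 10, 42]
summary = five_number_summary(sample)
for key, value in summary.items():
    print(f"{key:>12}: {value}")
```

Running this per feature gives a quick table to scan before doing any modelling, and the flagged points are often where the "interesting" questions start.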