What is a reasonable process to understand a collection of data?

Question

Can someone share their thoughts on a structured process for understanding a collection of data? The scenario: you've been given a set of data (features and observations, with descriptions) and been told to "tell me what kind of interesting things this data can tell me", i.e., what interesting questions this data can answer. The meaning of "interesting" is certainly subjective.

This appears to be classical unsupervised learning.

My initial thoughts:

Cluster all pairs of variables to look for interesting clusters
Run PCA to find high-variance groupings
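
To make the PCA idea concrete, here is a minimal sketch using NumPy's SVD. The data is synthetic and made up purely for illustration, the function name `pca_explained_variance` is hypothetical, and the features are only centered, not standardized (with features on very different scales you would probably want to standardize first).

```python
import numpy as np

def pca_explained_variance(X):
    """Return (explained_variance_ratio, components) for X with rows = observations."""
    Xc = X - X.mean(axis=0)  # center each feature
    # SVD of the centered data; the rows of Vt are the principal directions
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2 / (X.shape[0] - 1)  # variance captured along each component
    return var / var.sum(), Vt

# Synthetic data (an assumption for this example): three features,
# the first two strongly correlated, the third independent.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
X = np.column_stack([
    base,
    2 * base + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])

ratio, components = pca_explained_variance(X)
print(ratio)          # share of total variance per component
print(components[0])  # loadings of the dominant component
```

A component that carries most of the variance and loads heavily on a few features suggests those features move together, which is one candidate "interesting" grouping to look at more closely.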

Is there a general "how to understand a set of data" process that you've found successful?

Thanks

Google "unsupervised learning". If it's a time series: wavelet power spectrum, visualization of the correlation matrix, running-window PCA with a plot of the eigenvalues to see how the global correlations change over time, and a big table of descriptive statistics (quantiles, moments, etc.). — user2763361, Oct 30 '13 at 14:25
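
The running-window PCA suggestion in the comment above could be sketched roughly as follows. This is one possible reading of it, not the commenter's own code: it computes the eigenvalues of the correlation matrix in each sliding window, so a rising top eigenvalue indicates the series are becoming more correlated overall. The function name, window length, and synthetic data are all assumptions.

```python
import numpy as np

def rolling_corr_eigenvalues(X, window):
    """Eigenvalues (descending) of the correlation matrix in each sliding window.

    X has shape (n_times, n_features); the result has shape
    (n_times - window + 1, n_features).
    """
    n_times, _ = X.shape
    out = []
    for start in range(n_times - window + 1):
        corr = np.corrcoef(X[start:start + window], rowvar=False)
        eig = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
        out.append(eig)
    return np.array(out)

# Synthetic series (made up for illustration): four independent noise series
# that start sharing a common factor halfway through.
rng = np.random.default_rng(1)
n = 300
common = rng.normal(size=n)
noise = rng.normal(size=(n, 4))
weight = np.where(np.arange(n) < n // 2, 0.0, 2.0)[:, None]
X = noise + weight * common[:, None]

eigs = rolling_corr_eigenvalues(X, window=50)
print(eigs[0])   # early window: eigenvalues fairly close to each other
print(eigs[-1])  # late window: one eigenvalue dominates
```

Plotting the columns of `eigs` against time gives the eigenvalue plot the comment describes. Each row sums to the number of features, because the trace of a correlation matrix equals its dimension.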

score 1 · Answer 1 · answered Oct 28 '13 at 20:13

John Tukey came up with an entire field devoted to this: Exploratory Data Analysis. PCA is one part of this. Take a look and I'm sure you'll find some good ideas.
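
As a small taste of the EDA toolbox, here is a sketch of a five-number summary and Tukey-style outlier fences (1.5 times the interquartile range beyond the quartiles), using only the Python standard library. The helper names and sample values are made up for illustration, and the quartiles use the standard library's "inclusive" method, which can differ slightly from Tukey's original hinges.

```python
import statistics

def five_number_summary(values):
    """Min, quartiles, and max of a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}

def tukey_outliers(values, k=1.5):
    """Values lying more than k * IQR outside the quartiles."""
    s = five_number_summary(values)
    iqr = s["q3"] - s["q1"]
    lo, hi = s["q1"] - k * iqr, s["q3"] + k * iqr
    return [v for v in values if v < lo or v > hi]

# Made-up sample with one obviously extreme value
data = [2, 3, 3, 4, 5, 5, 6, 7, 8, 40]
summary = five_number_summary(data)
outliers = tukey_outliers(data)
print(summary)
print(outliers)
```

Running this per feature gives a quick first pass at the "big table of descriptive statistics" idea and flags values worth inspecting by hand before any modelling.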
