I have a problem which consists of classifying N-dimensional histograms. The salient points are as follows:
For ALL dimensions:
- Each histogram has the same number of bins (say 500, for argument sake)
- Each bin has the same range
A high level view of my (lay man's) approach to the problem would be to do the following:
- Calculate points P in 500 D space using Pythagoras theorem (pairwise combination)
- Calculating the Mahalanobis distance of each point P, and using that to categorize the points.
I'm not a practising statistician, but this seems to be an intuitive way to solve the problem.
Am I missing anything fundamental here (or are there any assumptions I am making that I may be unaware of)?
I would prefer however, to code the solution by hand (well, using python, pandas etc), so that at least, I understand what is going on - as I don't want to use formulae/models I don't really understand - intuitively. HTH
– Homunculus Reticulli Oct 10 '15 at 06:58