
I'm using dimension reduction (PCA, t-SNE, UMAP, ...) for data analysis. Most examples I see project the data into only 2 (or 3) dimensions. I would naively imagine that if I projected into more dimensions and plotted those dimensions two at a time on multiple plots, I could see more sub-characteristics of the data.

For example, suppose I have a dataset of cats and dogs, and after projecting it into 2 dimensions I see mainly two clusters corresponding to the two species. If I instead project my data into 4 dimensions and draw 2 plots (one with embedding dimensions 1 and 2, the other with dimensions 3 and 4), is there a chance I would see 2 clusters for cats and dogs on the first plot and clusters for dog breeds (like Labradors and bulldogs) on the second?

In other words, is it worth projecting my data into more than 3 dimensions with these kinds of algorithms?
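Below is a minimal sketch of the idea in the question. It computes PCA directly with NumPy's SVD, projects onto 4 components, and lists the pairs of components you would draw as scatter plots, for example with matplotlib's `plt.scatter(Z[:, i], Z[:, j])`. The random 20-feature data and the helper names `pca_embed` and `component_pairs` are illustrative assumptions, not part of the question.

```python
import itertools

import numpy as np


def pca_embed(X, n_components):
    """Project the rows of X onto their first n_components principal components."""
    Xc = X - X.mean(axis=0)                   # PCA operates on centered data
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T           # scores, shape (n_samples, n_components)


def component_pairs(n_components):
    """Every (i, j) pair of embedding dimensions: one scatter plot per pair."""
    return list(itertools.combinations(range(n_components), 2))


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))                # hypothetical stand-in for a real dataset
Z = pca_embed(X, 4)
print(Z.shape)                                # (200, 4)
print(component_pairs(4))                     # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

The question's own layout (dimensions 1 and 2, then 3 and 4) corresponds to the pairs `(0, 1)` and `(2, 3)` in this list. PCA orders its components by explained variance, so components 3 and 4 capture what remains after components 1 and 2. That ordering makes the "species first, breeds later" reading plausible, though not guaranteed. t-SNE and UMAP provide no such ordering because their output axes are arbitrary. Dimensions 3 and 4 of a 4-D t-SNE or UMAP embedding therefore do not isolate finer structure in the same way.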

Lynn
ThomaS
  • Welcome to Cross Validated! It’s an interesting idea, but if you’re going to play games to visualize high dimensions, why not do it for the data in their original dimension? – Dave Aug 08 '22 at 08:09
  • Because the original dimension seems too high. With something like twenty dimensions I would need to plot every pair of dimensions, whereas if I reduce them to 4 or 5 it seems more doable. – ThomaS Aug 08 '22 at 08:12
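The argument in the comment above can be quantified. Showing every pair of d dimensions takes d(d - 1)/2 scatter plots. A small sketch (the helper name `n_pair_plots` is illustrative):

```python
from math import comb


def n_pair_plots(d):
    """Number of 2-D scatter plots needed to show every pair of d dimensions."""
    return comb(d, 2)  # equals d * (d - 1) // 2


for d in (4, 5, 20):
    print(d, n_pair_plots(d))
# 4 6
# 5 10
# 20 190
```

So twenty original features need 190 pairwise plots, while a 4- or 5-dimensional embedding needs only 6 or 10.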

1 Answer


Projecting to a low number of dimensions is mostly done to help humans visualize the information. As you suggest, there is no reason why a higher-dimensional embedding could not be better for many other purposes.
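When the embedding is not only for plotting, a common heuristic for choosing how many dimensions to keep is PCA's cumulative explained-variance ratio. This is a minimal NumPy sketch of that heuristic. The thresholds, the helper name `n_components_for` and the synthetic data (three strong latent directions mixed into 20 features) are illustrative assumptions, not something the answer prescribes.

```python
import numpy as np


def n_components_for(X, threshold=0.9):
    """Smallest k whose first k principal components explain >= threshold of the variance."""
    Xc = X - X.mean(axis=0)
    S = np.linalg.svd(Xc, compute_uv=False)      # singular values, largest first
    ratio = S**2 / np.sum(S**2)                  # explained-variance ratio per component
    cumulative = np.cumsum(ratio)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(S))                        # guard against round-off near 1.0


# Hypothetical data: 3 strong latent directions mixed into 20 observed features
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3)) * np.array([10.0, 8.0, 6.0])
mixing = rng.normal(size=(3, 20))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 20))
print(n_components_for(X, 0.9), n_components_for(X, 0.999))
```

On real data the resulting k is often larger than 2 or 3, which is the point of the answer: the dimension that is convenient to look at and the dimension that is useful to keep need not coincide.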

There are many examples of representing things in higher-dimensional embedding spaces, e.g.

Björn
  • Methods such as incomplete principal components regression often find that the number of dimensions needed is in the range of 1/3 to 1/2 of the number of original dimensions. This may be hard to visualize but is still useful in a statistical model. – Frank Harrell Aug 08 '22 at 11:34
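To illustrate the comment above, here is a minimal sketch of plain principal components regression in NumPy. It fits ordinary least squares on the first k PC scores and maps the result back to coefficients on the original features. This is a generic PCR sketch and not necessarily the exact "incomplete" variant the comment refers to. The synthetic data and the helper name `pcr_fit_predict` are hypothetical. With p = 12 features, the comment's range of 1/3 to 1/2 corresponds to k = 4 to 6.

```python
import numpy as np


def pcr_fit_predict(X, y, k):
    """Regress y on the first k principal component scores of X.

    Returns (fitted values, coefficients on the original centered features).
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean                                    # PCA and OLS on centered data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                                     # (p, k) directions of the first k PCs
    scores = Xc @ V_k                                  # (n, k) reduced design matrix
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    beta = V_k @ gamma                                 # back to original feature space
    return y_mean + Xc @ beta, beta


# Hypothetical synthetic data: 12 features, response driven by a few of them
rng = np.random.default_rng(1)
n, p = 300, 12
X = rng.normal(size=(n, p))
y = X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=n)

for k in (p // 3, p // 2):                             # 4 and 6 components
    fitted, beta = pcr_fit_predict(X, y, k)
    rss = np.sum((y - fitted) ** 2)
    print(k, round(float(rss), 2))
```

Keeping all p components reproduces ordinary least squares exactly. Dropping the trailing components is what turns PCR into a dimension-reduction method, and the reduced dimension k that works well here is chosen for the model, not for plotting.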