
To give some context, I have a data frame of shape (900, 15000), so there are far more columns than observations. The columns are roads on a map, and each row holds the average speeds in km/h of those roads at a given timestamp.

The data looks like this:

                    | road1| road2| ... | road15000
2017-05-09 17:00:00 | 30.6 | 0.0  | ... | 40.4
2017-05-09 17:15:00 | 25.1 | 26.4 | ... | 50.3

My objective is to draw the roads on a map, colored according to their overall average speed, like this: [image]
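For concreteness, the coloring step described above could look roughly like the sketch below. It uses NumPy on a tiny stand-in for the real table; the bin edges (20 and 40 km/h) and the color names are assumptions for illustration, not values from my actual map.

```python
import numpy as np

# Toy stand-in for the (900, 15000) frame: rows are timestamps, columns are roads.
speeds = np.array([
    [30.6,  0.0, 40.4],
    [25.1, 26.4, 50.3],
])

# One average per road (column), taken over all timestamps.
avg_speed = speeds.mean(axis=0)

# Hypothetical bin edges in km/h and color names; adjust to the map's legend.
edges = np.array([20.0, 40.0])
colors = np.array(["red", "orange", "green"])

# np.digitize gives, for each average, the index of the bin it falls into.
road_colors = colors[np.digitize(avg_speed, edges)]
```

With these assumed edges, a road averaging below 20 km/h is drawn red, one between 20 and 40 km/h orange, and anything faster green.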

My problem is that rendering 15000 lines on a map makes it very unresponsive and takes some time (roughly 4-5 seconds on my machine), so I thought I could reduce the number of dimensions and visualize only the statistically significant roads out of the whole set.

At first I thought of using SVD/PCA, but it turned out that SVD won't help much with my problem. As I learned from this StackOverflow question (which I asked), SVD and PCA do not select from the existing dimensions; they transform the data into a new, lower-dimensional space.
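A small sketch of what I mean, using plain NumPy on synthetic data (the 9 x 150 shape and the random values are made up, only the rows-much-fewer-than-columns layout mirrors my data): after projecting onto the first `k` components, the columns are no longer roads, and every component puts a weight on every road.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in: 9 timestamps x 150 roads (far more columns than rows).
X = rng.normal(40.0, 10.0, size=(9, 150))

# PCA via SVD: center each road's column, then decompose.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 3
# Project onto the first k components: the result has k columns, none of which is a road.
reduced = Xc @ Vt[:k].T

# Each component is a weighted mix of all roads, not a subset of them.
nonzero_weights = np.count_nonzero(Vt[:k], axis=1)
```

Here `reduced` has shape `(9, 3)`, and each of the three rows of `Vt[:k]` holds one weight per original road, which is why no road can simply be dropped.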

So my question is: if SVD can't help me, what other techniques should I use, and why, given that my data has far more columns than observations?
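To illustrate the kind of technique I am after, here is a minimal sketch of a method that keeps a subset of the existing columns instead of building new ones. It ranks roads by the variance of their speed over time and keeps the top few. Using variance as the criterion is purely an assumption on my part (it is not a test of statistical significance), and the data, the sizes, and the `keep` value are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
n_times, n_roads = 20, 200

# Synthetic speeds: every road fluctuates around 40 km/h with its own spread.
spread = rng.uniform(0.5, 15.0, size=n_roads)
speeds = 40.0 + rng.normal(size=(n_times, n_roads)) * spread

# Variance of each road's speed over time (one value per column).
variances = speeds.var(axis=0)

# Keep the `keep` roads whose speed varies the most (hypothetical cutoff).
keep = 25
top_idx = np.argsort(variances)[::-1][:keep]

# The result is a plain column subset, so each column is still a real road.
selected = speeds[:, top_idx]
```

Unlike SVD, `top_idx` refers to original road columns, so the selected roads can be drawn on the map directly.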

Also, just out of curiosity: after performing SVD, how could I visualize the new dimensions on a map? With a heat map, maybe?
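One possibility I can think of, sketched below under my own assumptions: each right singular vector contains one weight (loading) per road, so a single component could be drawn by coloring every road by its loading. The data is synthetic, and rescaling the loadings to [0, 1] is just one way to feed them to a colormap.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic stand-in: 9 timestamps x 150 roads.
X = rng.normal(40.0, 10.0, size=(9, 150))
Xc = X - X.mean(axis=0)

# Rows of Vt are the new dimensions; each row has one entry per road.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt[0]  # loadings of the first component

# Rescale to [0, 1] so the values can index a colormap (assumed convention).
span = loadings.max() - loadings.min()
intensity = (loadings - loadings.min()) / span
```

Each entry of `intensity` lines up with one road column, so it could be used as that road's color value. Note that the sign of a singular vector is arbitrary, so the colors may come out inverted between runs or libraries.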

Any help will be appreciated, thanks.

m.awad
  • The thing is, to find the importance of a feature, you need some target (labels, classes, etc.) against which to compare the involvement or effect of the features, as in a supervised algorithm. If you have something like that, you can take a look at this scikit-learn user guide page. – Vivek Kumar Jun 20 '17 at 14:46
  • If you don't have any target or labels, it will be slightly harder to find appropriate techniques, since this is an unsupervised problem. You may need to rethink it in another way. For example, you could focus on certain timestamps that have lower speeds for most roads (which means higher traffic), or, if you have other information about the roads such as the start and end coordinates, you could show only some roads at a time, e.g. those that run in the same direction. – Vivek Kumar Jun 20 '17 at 14:55
  • Thank you @VivekKumar, your comments gave me some ideas about unsupervised feature selection. After a quick search I found this answer here that may (or may not?) do the job. – m.awad Jun 20 '17 at 15:47

0 Answers