
I'm using an SVM to classify remotely sensed data. I have 1000 training samples. I randomly divided them into a test set and a validation set and performed cross-validation. Is this a correct way to assess accuracy, or should I use ground truth data for the test and validation sets? I have very little ground truth data, only around 40 sample points. Could anyone help me with this?

  • Welcome to CV! You may have more luck getting an answer if you include some detail on the data, and on how the "ground truth data" is gathered. It's not particularly clear to me what your problem or distinction is, and I suspect I'm not the only one. E.g., are you looking to predict remotely sensed data, or ground-truthed? – Sean Easter Dec 15 '15 at 15:10
  • I agree with @Sean Easter. Please define what you mean by "truth data". Is your question how to correctly perform cross-validation? Then this might help. – Qwerty Dec 15 '15 at 16:01
  • Thank you both for your replies. The ground truth was collected using GPS. I have around 40 ground truth points and I'm trying to predict the remotely sensed data. While performing cross-validation, I divided the training samples randomly into a test set for prediction and a validation set for tuning the SVM parameters. I did not use the ground truth as my test set. My question is: am I performing cross-validation correctly, or should I use the ground truth data as the test set? – Shenbaga Rajan Dec 15 '15 at 16:14
  • @Shenbaga If you are the original poster of this question, then please visit http://stats.stackexchange.com/help/merging-accounts to merge your accounts: that will enable you to edit the question. – whuber Dec 15 '15 at 16:28

1 Answer


To me, "1000 training samples" sounds like you have a dataset of 1000 data pairs, for example collocated measurements from Satellite a and Satellite b. Training an SVM on them establishes a relation between Satellite a and Satellite b, so that you can then predict Satellite a from any value of Satellite b.

If you hold out part of those data pairs for validation, the validation tells you how well the SVM predicts Satellite a.

If that is what you want, you are fine without ground data.
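To make the hold-out idea concrete, here is a minimal sketch of how the 1000 pairs could be split. It assumes a 20% test fraction and 5 folds, neither of which comes from the question, and the function name `holdout_and_folds` is made up for illustration. It only produces index sets; the SVM fitting itself is left out.

```python
import random


def holdout_and_folds(n_samples, test_fraction=0.2, k=5, seed=0):
    """Split sample indices into a held-out test set and k cross-validation folds.

    test_fraction, k and seed are illustrative assumptions, not values
    taken from the question.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)

    n_test = int(round(n_samples * test_fraction))
    test_idx = indices[:n_test]   # set aside, never used for tuning
    dev_idx = indices[n_test:]    # used for cross-validation

    # Deal the remaining indices into k folds of near-equal size.
    folds = [dev_idx[i::k] for i in range(k)]

    splits = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train_idx, val_idx))
    return test_idx, splits


test_idx, splits = holdout_and_folds(1000)
print(len(test_idx), len(splits), len(splits[0][0]), len(splits[0][1]))
# prints: 200 5 640 160
```

The SVM parameters would be tuned using only the (train, validation) pairs in `splits`, and the final accuracy reported once on `test_idx`, so the reported number is not biased by the tuning.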

If you want to know how well the SVM describes "reality", you also have to account for the error of Satellite a itself. As you guessed, this can in principle be done with ground-based measurements.

However, comparisons between satellite measurements and ground-based data are not easy to handle, and you have to consider many other error sources.
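One source of uncertainty that can at least be quantified is the small number of ground points. As a rough sketch, assuming the 40 points are independent and each prediction is simply scored right or wrong, a Wilson score interval shows how loosely 40 points pin down an accuracy figure. The count of 34 correct is a made-up number for illustration, not a result from the question.

```python
import math


def wilson_interval(n_correct, n_total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    p = n_correct / n_total
    denom = 1 + z ** 2 / n_total
    centre = (p + z ** 2 / (2 * n_total)) / denom
    half = z * math.sqrt(p * (1 - p) / n_total + z ** 2 / (4 * n_total ** 2)) / denom
    return centre - half, centre + half


# Hypothetical outcome: 34 of the 40 ground points classified correctly.
low, high = wilson_interval(34, 40)
print(f"accuracy 0.85, approx. 95% interval: {low:.2f} to {high:.2f}")
```

For this hypothetical case the interval is roughly 0.2 wide, so an accuracy estimated from 40 ground points should be read as a coarse check rather than a precise figure. It also says nothing about the other error sources mentioned above.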

Staty