
Okay, here is the background: I am doing text mining, and my basic flow is: extract features (n-grams), reduce the feature count, score with tf-idf, and classify. For my own interest I am comparing SVM and neural network classifiers. Here is the weird part (or am I wrong and this is reasonable?): if I use 2-grams, the classifiers' results (accuracy/precision) differ and the SVM is the better one; but when I use 3-grams the results are exactly the same. What causes this? Is there any explanation? Is it a case of very separable classes?
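For concreteness, here is a minimal sketch of the kind of pipeline described above, assuming scikit-learn; the toy corpus, labels, and parameter values (e.g. the `k=10` selected features) are placeholders, not the actual data or settings:

```python
# Minimal sketch: n-gram extraction -> feature reduction -> tf-idf -> classify,
# run for both a 2-gram and a 3-gram feature set with an SVM and a neural network.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Placeholder data: replace with the real documents and class labels.
train_texts = ["the cat sat on the mat", "dogs bark at the mail carrier",
               "stocks rallied on strong earnings", "the market closed higher today"]
train_labels = [0, 0, 1, 1]
test_texts = ["a cat chased the dog", "earnings beat market expectations"]
test_labels = [0, 1]

def build_pipeline(classifier, n):
    """Extract n-grams, reduce the feature count, apply tf-idf weighting, then classify."""
    return Pipeline([
        ("ngrams", CountVectorizer(ngram_range=(n, n))),  # n-gram extraction
        ("select", SelectKBest(chi2, k=10)),              # feature reduction (placeholder k)
        ("tfidf", TfidfTransformer()),                    # tf-idf scoring
        ("clf", classifier),
    ])

for n in (2, 3):
    for name, clf in [("SVM", LinearSVC()), ("NN", MLPClassifier(max_iter=1000))]:
        model = build_pipeline(clf, n).fit(train_texts, train_labels)
        preds = model.predict(test_texts)
        print(f"{n}-gram {name}: accuracy = {accuracy_score(test_labels, preds):.2f}")
```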

  • Do the test set predictions for the two classifiers correspond exactly? i.e. is every instance classified in the same way? – Ansari Oct 05 '14 at 01:30
  • Yes, the methods are exactly the same; the only difference is the feature type – Nima Shayanfar Oct 05 '14 at 05:47
  • Not the methods, I meant the classified labels. Do they correspond exactly or is it just the aggregate percentages that match? – Ansari Oct 06 '14 at 04:06
  • Aaah, I misunderstood, sorry. But it makes no difference; they are the same – Nima Shayanfar Oct 06 '14 at 21:21
  • I asked a similar question not too long ago: http://datascience.stackexchange.com/questions/992/why-might-several-types-of-models-give-almost-identical-results – JenSCDC Oct 07 '14 at 01:49

1 Answer


Your results are reasonable. Your situation brings a couple of ideas to mind:

1) It is quite reasonable that changing the available features changes the relative performance of machine learning methods. This happens quite a lot: which method performs best often depends on the features, so as you change the features, the best-performing method can change too.

2) It is also reasonable that, in some cases, disparate models reach exactly the same results. This is most likely when the number of data points is low enough, or the data is separable enough, that both models reach the same conclusion on every test point.
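One way to check whether that is what happened here is to compare the two classifiers' predictions instance by instance rather than only their aggregate accuracy/precision. A minimal sketch, assuming NumPy and two already-fitted scikit-learn-style models; the names `svm_model`, `nn_model`, and `X_test` in the usage comment are placeholders for your own fitted classifiers and held-out test set:

```python
import numpy as np

def compare_predictions(model_a, model_b, X_test):
    """Report how often two fitted classifiers make the same prediction on X_test."""
    preds_a = np.asarray(model_a.predict(X_test))
    preds_b = np.asarray(model_b.predict(X_test))
    agreement = (preds_a == preds_b).mean()   # fraction of test points with matching labels
    print(f"Per-instance agreement: {agreement:.1%}")
    print("Identical on every test point:", bool(np.array_equal(preds_a, preds_b)))
    return agreement

# Usage (placeholder names for your own fitted models and test set):
# compare_predictions(svm_model, nn_model, X_test)
```

If the agreement is exactly 100%, the matching aggregate scores are simply a consequence of both models labeling every test instance identically.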

Max Gibiansky