I have three statistical models: partial least squares (PLS), random forest (RF), and support vector machine (SVM). I randomly divided each of my three datasets (e.g. leaf, canopy, shrub) into calibration and validation sets 1000 times, ran the models on each split, and calculated the error, which gives a vector of 1000 RMSEs per model and dataset. The results are shown in the boxplot below. As you can see from the figure, for the canopy dataset, for example, SVM has a large variance in its prediction error (many RMSEs are far from the mean).

My questions are:
1. Can we conclude from the figure that the variance of the SVM model is higher, or in other words that the model is less consistent in its predictions (say, compared with RF)?
2. Is a boxplot a good way to show model variance, and how would you interpret the graph?
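
For concreteness, the resampling procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used for the figure: the data are synthetic, the two models (`fit_mean`, `fit_line`) are simple stand-ins for PLS, RF, and SVM, and the 70/30 split fraction and all function names are hypothetical. The point is that each model yields a vector of RMSEs over the same random splits, and its spread can be reported as a standard deviation or interquartile range alongside the boxplot.

```python
import math
import random
import statistics

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def fit_mean(x, y):
    """Stand-in model 1 (assumption): always predicts the calibration mean."""
    m = statistics.fmean(y)
    return lambda x_new: [m for _ in x_new]

def fit_line(x, y):
    """Stand-in model 2 (assumption): least-squares line with one predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return lambda x_new: [my + slope * (xi - mx) for xi in x_new]

def rmse_over_splits(x, y, fit, n_splits=1000, cal_frac=0.7, seed=0):
    """Return one validation RMSE per random calibration/validation split."""
    rng = random.Random(seed)  # same seed -> same splits for every model
    idx = list(range(len(x)))
    n_cal = int(cal_frac * len(x))
    out = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        cal, val = idx[:n_cal], idx[n_cal:]
        predict = fit([x[i] for i in cal], [y[i] for i in cal])
        out.append(rmse([y[i] for i in val], predict([x[i] for i in val])))
    return out

def spread(rmses):
    """Summarize the centre and spread of an RMSE vector."""
    q1, _, q3 = statistics.quantiles(rmses, n=4)
    return {"mean": statistics.fmean(rmses),
            "sd": statistics.stdev(rmses),
            "iqr": q3 - q1}

# Synthetic data (assumption): y depends linearly on x plus Gaussian noise.
data_rng = random.Random(42)
x = [data_rng.uniform(0, 10) for _ in range(60)]
y = [2.0 * xi + data_rng.gauss(0, 1) for xi in x]

results = {name: rmse_over_splits(x, y, fit)
           for name, fit in [("mean", fit_mean), ("line", fit_line)]}
for name, r in results.items():
    print(name, spread(r))
```

Because every model sees the same sequence of splits, the RMSE vectors are paired, so differences between models can also be examined split by split rather than only through the marginal boxplots.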

- A boxplot works for showing the distribution of any variable of interest, including its variance. But have you considered showing how much the mean itself varies within each run, i.e. how much of the variability could be due to the "random" part of RF itself? – katya Apr 28 '17 at 01:37
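
One possible reading of this suggestion, offered only as a sketch: hold each calibration/validation split fixed, refit a stochastic model several times with different seeds, and compare the RMSE variance within a split (model randomness) with the variance of the per-split mean RMSE (split-to-split variability). Everything below is an assumption for illustration: the data are synthetic, `fit_bootstrap_line` is a hypothetical stand-in for a randomized learner such as RF, and the split and seed counts are arbitrary.

```python
import math
import random
import statistics

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def fit_bootstrap_line(x, y, rng):
    """Stand-in stochastic model (assumption): a least-squares line fitted
    to a bootstrap resample of the calibration data."""
    n = len(x)
    picks = [rng.randrange(n) for _ in range(n)]
    bx, by = [x[i] for i in picks], [y[i] for i in picks]
    mx, my = statistics.fmean(bx), statistics.fmean(by)
    sxx = sum((xi - mx) ** 2 for xi in bx)
    # Guard against a degenerate resample in which all x values coincide.
    slope = 0.0 if sxx == 0 else sum(
        (xi - mx) * (yi - my) for xi, yi in zip(bx, by)) / sxx
    return lambda x_new: [my + slope * (xi - mx) for xi in x_new]

def variance_components(x, y, n_splits=50, n_seeds=20, cal_frac=0.7, seed=0):
    """Return (within-split variance, variance of per-split mean RMSE)."""
    split_rng = random.Random(seed)
    idx = list(range(len(x)))
    n_cal = int(cal_frac * len(x))
    per_split = []
    for s in range(n_splits):
        split_rng.shuffle(idx)
        cal, val = idx[:n_cal], idx[n_cal:]
        xc, yc = [x[i] for i in cal], [y[i] for i in cal]
        xv, yv = [x[i] for i in val], [y[i] for i in val]
        rmses = []
        for r in range(n_seeds):
            model_rng = random.Random(1000 * s + r)  # new model seed, same split
            predict = fit_bootstrap_line(xc, yc, model_rng)
            rmses.append(rmse(yv, predict(xv)))
        per_split.append(rmses)
    # Average variance across seeds with the split held fixed.
    within = statistics.fmean([statistics.variance(r) for r in per_split])
    # Variance of split means; it still contains within / n_seeds of seed noise.
    between = statistics.variance([statistics.fmean(r) for r in per_split])
    return within, between

# Synthetic data (assumption): y depends linearly on x plus Gaussian noise.
data_rng = random.Random(42)
x = [data_rng.uniform(0, 10) for _ in range(60)]
y = [2.0 * xi + data_rng.gauss(0, 1) for xi in x]

within, between = variance_components(x, y)
print("within-split variance:", within)
print("variance of split means:", between)
```

If the within-split component is small relative to the other, most of the spread in a boxplot of RMSEs would come from which samples land in the validation set rather than from the learner's own randomness.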
- Thanks for your response. Do you mean running a test such as ANOVA on the means of the three models and three categories (i.e. scales) and looking for significant differences? – Ress May 01 '17 at 16:59