According to the page for GBR (here) in scikit-learn, the score method uses R2 as its evaluation metric. As far as I know, R2 is mainly used for linear regression. Why is it also used for the gradient boosting regressor, which is a non-linear model?
Moreover, I've used the model to predict price fluctuations. Although the resulting R2 can be negative, the MAPE is below 10%, and the plot shows predictions that track the data quite closely. The latter two pieces of evidence suggest that my model is working. Is R2 simply not a suitable metric for GBR, or is there some other reason for these contradictory results?
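To make the contradiction concrete, here is a minimal sketch with made-up numbers (not my real data). It assumes the usual definitions, R2 = 1 - SS_res / SS_tot and MAPE = mean(|y - y_hat| / |y|); the helper names `r2` and `mape` are only for illustration and are not scikit-learn functions. The prices barely move around their mean, so even small errors seem to be enough to push R2 below zero while the MAPE stays low.

```python
# Made-up prices that fluctuate only slightly around 100 (illustrative data).
y_true = [100.0, 101.0, 99.0, 100.0, 102.0, 98.0]
# Hypothetical predictions, each off by 5 units (about 5% of the price).
y_pred = [105.0, 96.0, 104.0, 95.0, 107.0, 93.0]


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (0.05 = 5%)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


r2_value = r2(y_true, y_pred)
mape_value = mape(y_true, y_pred)

print(f"R2   = {r2_value:.2f}")    # negative: errors exceed the spread of y_true
print(f"MAPE = {mape_value:.2%}")  # still well below 10%
```

Here R2 compares the squared errors with the variance of the target, whereas MAPE compares the errors with the level of the target, so the two metrics can disagree when the series has little variation relative to its level.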
Thanks.
P.S. This can be considered a sister post to this