When do we choose gblinear boosting versus gbtree boosting in the xgboost library? I have meteorological rain data with lots of missing values.
1 Answer
If we think that we should be using a gradient boosting implementation like XGBoost, the answer to when to use gblinear instead of gbtree is: "probably never". With gblinear we get the equivalent of an elastic-net fit, essentially a single regularised linear model. Unless we are dealing with a task where we expect or know that a LASSO/ridge/elastic-net regression is already competitive, it is not worth our trouble, except perhaps when we already have a data pipeline in place serving an XGBoost model and we want to try a GLM quickly. That said, R, Python, MATLAB, Julia, etc. have better and more well-developed specialised routines for fitting elastic-net regression tasks. The CV.SE thread "Difference in regression coefficients of sklearn's LinearRegression and XGBRegressor" provides further details comparing XGBoost's gblinear to a standard linear regression.
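A minimal sketch of the contrast, assuming the Python xgboost and scikit-learn packages and a synthetic dataset (all names here are illustrative, not from the original answer). With `booster="gblinear"` each round updates the coefficients of one shared linear model, while the default `booster="gbtree"` grows an additive tree ensemble; note also that gbtree handles NaN inputs natively by learning a default split direction per node, which matters for data with many missing values:

```python
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# gblinear: boosting rounds refine one linear model, so with reg_alpha (L1)
# and reg_lambda (L2) the end result is an elastic-net-like linear fit.
linear = xgb.XGBRegressor(booster="gblinear", reg_alpha=0.1, reg_lambda=1.0,
                          n_estimators=200)
linear.fit(X_train, y_train)

# gbtree (the default): an ensemble of regression trees.
trees = xgb.XGBRegressor(booster="gbtree", n_estimators=200, max_depth=4)
trees.fit(X_train, y_train)

for name, model in [("gblinear", linear), ("gbtree", trees)]:
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: test MSE = {mse:.1f}")
```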
Note that if we believe some linear relation is present at a low/local level, LightGBM's `linear_tree` argument emulates the methodology of Cubist (or M5), where a tree is grown such that the terminal leaves contain linear regression models (see the sketch below). This is structurally different from gblinear, though, as it is first and foremost a tree rather than a regularised linear model.
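A minimal sketch of that option, assuming a LightGBM version with linear-tree support (3.x or later); setting `linear_tree` at `Dataset` construction is my reading of the documented usage, and the data here are again synthetic:

```python
import lightgbm as lgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=2000, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# linear_tree is set on the Dataset: each terminal leaf then holds a linear
# model fitted on the samples reaching it, in the spirit of Cubist / M5.
dtrain = lgb.Dataset(X_train, label=y_train, params={"linear_tree": True})
booster = lgb.train({"objective": "regression", "verbosity": -1},
                    dtrain, num_boost_round=100)
print("linear-tree test MSE:",
      mean_squared_error(y_test, booster.predict(X_test)))
```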