I am learning XGBoost from its documentation, but I have a few questions about the derivation.
In the Additive Training part of the Tree Boosting section, they say we take the Taylor expansion of the loss function up to second order in the general case, but I have trouble with the step from:
$\text{obj}^{(t)} = \sum_{i=1}^n l(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)) + \Omega(f_t) + \mathrm{constant}$
to:
$\text{obj}^{(t)} = \sum_{i=1}^n [l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)] + \Omega(f_t) + \mathrm{constant}$
where the $g_i$ and $h_i$ are defined as
$\begin{split}g_i &= \partial_{\hat{y}_i^{(t-1)}} l(y_i, \hat{y}_i^{(t-1)})\\ h_i &= \partial_{\hat{y}_i^{(t-1)}}^2 l(y_i, \hat{y}_i^{(t-1)})\end{split}$
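As a concrete check of my own (this is just my attempt, using squared-error loss as an example), I get
$g_i = \partial_{\hat{y}_i^{(t-1)}} (y_i - \hat{y}_i^{(t-1)})^2 = 2(\hat{y}_i^{(t-1)} - y_i), \qquad h_i = \partial_{\hat{y}_i^{(t-1)}}^2 (y_i - \hat{y}_i^{(t-1)})^2 = 2$
which seems consistent with the formulas above.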
I know how to expand a function as a Taylor series up to second order:
$f(x) = f(x_k) + (x - x_k)f'(x_k) + \frac{1}{2!}(x - x_k)^2 f''(x_k) + o\left((x - x_k)^2\right)$
So I set $f(x) = l(y_i, x)$, $x = \hat{y}_i^{(t-1)} + f_t(x_i)$, and $x_k = \hat{y}_i^{(t-1)}$, so that the derivatives in the Taylor series become $\partial_{\hat{y}_i^{(t-1)}}$, and this gives the result quoted above.
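Concretely, what I did (just my own attempt at the substitution, where $x - x_k = f_t(x_i)$ plays the role of the small increment) is:
$\begin{split} l(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)) &\approx l(y_i, \hat{y}_i^{(t-1)}) + \partial_{\hat{y}_i^{(t-1)}} l(y_i, \hat{y}_i^{(t-1)})\, f_t(x_i) + \frac{1}{2}\, \partial_{\hat{y}_i^{(t-1)}}^2 l(y_i, \hat{y}_i^{(t-1)})\, f_t^2(x_i) \\ &= l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \end{split}$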
But I am not sure whether this derivation is correct, and even if it is, I still find it hard to understand why they choose to expand the loss in this way.
I would appreciate it if anyone could help me.