Understanding the XGBoost Algorithm in Depth — A General Tree Boosting Algorithm


Given a sequence of decision trees $T = (t_1, t_2, \ldots, t_K)$, where each leaf node $j$ carries a weight $w_j$, the final prediction can be written as

$$\hat{y_i} = \sum_j w_jx_{ij}$$

where $x_{ij} \in \{0, 1\}$ indicates whether example $i$ falls into leaf $j$ (the leaves of all trees are indexed jointly).
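As a sanity check, here is a minimal NumPy sketch (names and numbers are illustrative, not from the post) that encodes each example's leaf assignment as a 0/1 indicator matrix and recovers the prediction as a weighted sum:

```python
import numpy as np

# 3 examples, 4 leaves indexed jointly across the ensemble (toy numbers).
# X[i, j] = 1 if example i falls into leaf j, else 0.
X = np.array([[1, 0, 0, 1],
              [0, 1, 0, 1],
              [0, 0, 1, 1]])
w = np.array([0.5, -0.2, 0.3, 0.1])  # leaf weights w_j

y_hat = X @ w  # y_hat[i] = sum_j w_j * x_ij
print(y_hat)   # [ 0.6 -0.1  0.4]
```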

Loss function (squared error)

$$l(y_i, \hat{y_i}) = (y_i - \hat{y_i})^2$$
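Under this loss, the first and second derivatives with respect to the current prediction, which the derivation below abbreviates as $l'$ and $l''$, are $2(\hat{y_i} - y_i)$ and $2$. A minimal NumPy sketch (function names are mine, not from the post):

```python
import numpy as np

def squared_loss(y, y_hat):
    """l(y, y_hat) = (y - y_hat)^2."""
    return (y - y_hat) ** 2

def grad_hess(y, y_hat):
    """Derivatives of l w.r.t. y_hat: l' = 2 * (y_hat - y), l'' = 2."""
    return 2.0 * (y_hat - y), 2.0 * np.ones_like(y_hat)

y, y_hat = np.array([1.0, 0.0]), np.array([0.8, 0.3])
print(squared_loss(y, y_hat))  # [0.04 0.09]
print(grad_hess(y, y_hat))     # (array([-0.4,  0.6]), array([2., 2.]))
```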

The optimal model minimizes the expected loss over the joint distribution of $(x, y)$:

$$F^* = \arg \min_F E_{(x,y)} \left[ L(y, F(\vec{x})) \right]$$

Ensemble representation

$$\hat{y_i} = \sum_{k=1}^{K}f_{k}(x_i), \qquad f_{k} \in F$$

where each $f_k$ is one regression tree and $F$ is the space of regression trees.

Training is additive:

Round 0: $\hat{y_i}^{(0)} = 0$

Round 1: $\hat{y_i}^{(1)} = f_1(x_i) = \hat{y_i}^{(0)} + f_1(x_i)$

Round 2: $\hat{y_i}^{(2)} = f_1(x_i) + f_2(x_i) = \hat{y_i}^{(1)} + f_2(x_i)$

Round t: $\hat{y_i}^{(t)} = \sum_{k=1}^{t}f_k(x_i) = \hat{y_i}^{(t-1)} + f_t(x_i)$
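This additive scheme is literally the training loop: start from $\hat{y_i}^{(0)} = 0$ and add one fitted tree per round. A schematic sketch, where `fit_one_tree` is a hypothetical stand-in for the split search derived below:

```python
import numpy as np

def boost(X, y, num_rounds, fit_one_tree):
    """Additive training: y_hat^(t) = y_hat^(t-1) + f_t(x)."""
    y_hat = np.zeros(len(y))             # round 0: y_hat^(0) = 0
    trees = []
    for t in range(num_rounds):
        f_t = fit_one_tree(X, y, y_hat)  # fit f_t given current predictions
        y_hat = y_hat + f_t.predict(X)   # y_hat^(t) = y_hat^(t-1) + f_t(x)
        trees.append(f_t)
    return trees, y_hat
```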

Penalty term

$$\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_{j}^{2}$$

where $T$ is the number of leaves of the $t$-th tree, and $\gamma$, $\lambda$ are regularization coefficients.
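Given the vector of leaf weights, the penalty is a one-liner; a minimal sketch:

```python
import numpy as np

def omega(w, gamma, lam):
    """Omega(f_t) = gamma * T + 0.5 * lambda * sum_j w_j^2, T = number of leaves."""
    return gamma * len(w) + 0.5 * lam * np.sum(w ** 2)

print(omega(np.array([0.5, -0.2, 0.3]), gamma=1.0, lam=1.0))  # 3.19
```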

Objective function

$$ \begin{aligned} Obj^{(t)} &= \sum_{i=1}^{n}l(y_i, \hat{y_i}^{(t)}) + \sum_{i=1}^{t}\Omega(f_i) \\ &=\sum_{i=1}^{n} l\left [y_i,\hat{y_i}^{(t-1)} + f_t(x_i) \right ] + \Omega(f_t) + C \\ & \approx \sum_{i=1}^{n} \left [ l(y_i, \hat{y_i}^{(t-1)}) + l' f_t(x_i) + \frac{1}{2}l'' f_t^2(x_i) \right ] + \Omega(f_t) + C \\ &= \sum_{i=1}^{n} \left [l' f_t(x_i) + \frac{1}{2}l'' f_t^2(x_i) \right ] + \Omega(f_t) + C_1 \\ &= \sum_{i=1}^{n} \left [l' f_t(x_i) + \frac{1}{2}l'' f_t^2(x_i) \right ] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_{j}^{2} + C_1 \\ &=\sum_{j=1}^{T} \left [ \Big(\sum_{i \in I_j}l'\Big)w_j + \frac{1}{2}\Big(\sum_{i \in I_j}l'' + \lambda\Big)w_j^2 \right ] + \gamma T + C_1 \\ &=\sum_{j=1}^{T} \left [ G_j w_j + \frac{1}{2}(H_j + \lambda)w_j^2 \right ] + \gamma T + C_1 \\ \end{aligned} $$

Here the second line rewrites the objective with the additive update; the third line is a second-order Taylor expansion of $l$ around $\hat{y_i}^{(t-1)}$, so $l'$ and $l''$ are the first and second derivatives of $l$ with respect to $\hat{y_i}^{(t-1)}$; the regrouping in the sixth line uses $f_t(x_i) = w_{q(x_i)}$, where $q$ maps an example to its leaf and $I_j = \{i \mid q(x_i) = j\}$ is the set of examples assigned to leaf $j$; finally $G_j = \sum_{i \in I_j}l'$ and $H_j = \sum_{i \in I_j}l''$.
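In code, $G_j$ and $H_j$ are just per-leaf scatter-adds of the per-example derivatives; a minimal NumPy sketch (names are illustrative):

```python
import numpy as np

def leaf_stats(g, h, leaf_of):
    """G_j = sum_{i in I_j} g_i, H_j = sum_{i in I_j} h_i,
    where leaf_of[i] = q(x_i) is the leaf index of example i."""
    T = leaf_of.max() + 1
    G, H = np.zeros(T), np.zeros(T)
    np.add.at(G, leaf_of, g)  # scatter-add each example's l' into its leaf
    np.add.at(H, leaf_of, h)
    return G, H

g = np.array([-0.4, 0.6, 0.2])     # per-example l'
h = np.array([2.0, 2.0, 2.0])      # per-example l''
leaf_of = np.array([0, 1, 0])      # examples 0 and 2 share leaf 0
print(leaf_stats(g, h, leaf_of))   # (array([-0.2,  0.6]), array([4., 2.]))
```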

Minimizing the objective

$$\frac{\partial Obj^{(t)}}{\partial w_j} = G_j + (H_j + \lambda) w_j = 0$$

$$w_j = - \frac{G_j}{H_j + \lambda}$$
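The optimal weights follow elementwise from the leaf statistics; a minimal sketch:

```python
import numpy as np

def optimal_weights(G, H, lam):
    """w_j* = -G_j / (H_j + lambda), computed elementwise per leaf."""
    return -G / (H + lam)

G, H = np.array([-0.2, 0.6]), np.array([4.0, 2.0])
print(optimal_weights(G, H, lam=1.0))  # [ 0.04 -0.2 ]
```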

Substituting $w_j^*$ back, the objective at the optimum becomes:

$$Obj = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_{j}^{2}}{H_j+\lambda} + \gamma T$$
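This value scores a fixed tree structure (smaller is better). A minimal sketch, reusing the $G$, $H$ arrays from above:

```python
import numpy as np

def structure_score(G, H, lam, gamma):
    """Obj = -0.5 * sum_j G_j^2 / (H_j + lambda) + gamma * T."""
    return -0.5 * np.sum(G ** 2 / (H + lam)) + gamma * len(G)

G, H = np.array([-0.2, 0.6]), np.array([4.0, 2.0])
print(structure_score(G, H, lam=1.0, gamma=1.0))  # 1.936
```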

Split gain

The gain of a candidate split is the reduction in the objective when a leaf (with statistics $G_L + G_R$, $H_L + H_R$) is split into left and right children:

$$Gain = \frac{1}{2} \left [\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R+\lambda} -\frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right ] - \gamma$$
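This gain is what drives split finding: sort the examples in a node by a feature value, sweep the candidate split points while maintaining the left-side prefix sums $G_L$, $H_L$, and keep the split with the largest gain. A minimal single-feature sketch of that exact greedy search (not the XGBoost implementation itself):

```python
import numpy as np

def gain(GL, HL, GR, HR, lam, gamma):
    """Objective reduction from splitting a node into left/right children."""
    return 0.5 * (GL ** 2 / (HL + lam) + GR ** 2 / (HR + lam)
                  - (GL + GR) ** 2 / (HL + HR + lam)) - gamma

def best_split(x, g, h, lam=1.0, gamma=0.0):
    """Exact greedy search over one feature: sweep sorted values while
    maintaining the left-side prefix sums G_L and H_L."""
    order = np.argsort(x)
    g, h = g[order], h[order]
    G, H = g.sum(), h.sum()
    GL = HL = 0.0
    best_gain, best_threshold = -np.inf, None
    for i in range(len(x) - 1):            # candidate split after position i
        GL, HL = GL + g[i], HL + h[i]
        score = gain(GL, HL, G - GL, H - HL, lam, gamma)
        if score > best_gain:
            best_gain = score
            best_threshold = 0.5 * (x[order[i]] + x[order[i + 1]])
    return best_gain, best_threshold

x = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([-1.0, -1.0, 1.0, 1.0])   # negative gradients left, positive right
h = np.ones(4)
print(best_split(x, g, h))  # best split between 2.0 and 3.0 -> threshold 2.5
```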

To be continued.