GBDT/Loss

Workflow

[[Gaussian Distribution]]

  • Loss function $\frac{1}{2}(y_i - f(x_i))^2$

  • Negative gradient $y_i - f(x_i)$

  • Initialization $\frac{\sum y_i}{m}$

  • Leaf estimate: the mean of the residuals in the leaf, $\frac{\sum_{x_i \in R_{jm}} \tilde{y_i}}{|R_{jm}|}$
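The squared-loss recipe above can be sketched end to end. This is a minimal illustration under stated assumptions (the data, the stump fitter, and the shrinkage value are all made up for the example; it is not any library's implementation):

```python
import numpy as np

# Toy 1-D regression data (hypothetical example values).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
y = 2.0 * x + rng.normal(scale=0.1, size=200)

def fit_stump(x, r):
    """Depth-1 regression tree on residuals r: pick the split threshold
    that minimizes squared error; leaf values are residual means."""
    best = None
    for t in np.quantile(x, np.linspace(0.1, 0.9, 9)):
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, vl, vr = best
    return lambda q: np.where(q <= t, vl, vr)

F = np.full_like(y, y.mean())        # initialization: mean of y
for _ in range(20):
    residual = y - F                 # negative gradient of (1/2)(y - F)^2
    stump = fit_stump(x, residual)   # leaf estimate = mean residual per leaf
    F += 0.5 * stump(x)              # shrinkage (learning rate) 0.5
```

Because the leaf mean of residuals is already the minimizer of squared error, no extra line-search step is needed for the Gaussian case.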

[[AdaBoost]]

  • Loss function $e^{-(2y-1)f(x)},\ y\in \{1,0\}$

    • Equivalently $e^{-yf(x)},\ y\in \{1,-1\}$
  • Negative gradient $(2y-1)e^{-(2y-1)f(x)}$

  • $g=-(2y-1)e^{-(2y-1)f(x)}$, $h=(2y-1)^2e^{-(2y-1)f(x)}=e^{-(2y-1)f(x)}$

  • Initialization $F_0=\frac{1}{2} \log \frac{\sum P(y=1|x)}{\sum P(y=-1|x)}$

  • Leaf estimate $-\frac{g}{h}=\frac{(2y-1)e^{-(2y-1)f(x)}}{e^{-(2y-1)f(x)}}=2y-1$
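A quick numeric sanity check of the exponential-loss quantities above: the Newton leaf estimate $-g/h$ comes out to $2y-1$ regardless of the current score (the score `f` below is an arbitrary hypothetical value):

```python
import math

# Exponential loss e^{-(2y-1)f(x)} with y in {0, 1}; check that the
# Newton leaf estimate -g/h equals 2y - 1 at an arbitrary score f.
f = 0.37  # hypothetical score
for y in (0, 1):
    s = 2 * y - 1                    # map {0, 1} -> {-1, +1}
    g = -s * math.exp(-s * f)        # first derivative of the loss w.r.t. f
    h = s * s * math.exp(-s * f)     # second derivative; s^2 = 1
    assert math.isclose(-g / h, s)
```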

[[Bernoulli Distribution]]

  • In GBDT code, $y\in \{0,1\}$

    • Conversion between logistic regression and the log loss

    • Let $y^*=\frac{1+y}{2}$, mapping $y\in\{-1,1\}$ to $y^*\in \{0,1\}$

    • $e^{yF(x)}+e^{-yF(x)}=e^{F(x)}+e^{-F(x)},\ y\in \{-1,1\}$

    • $y^*\log p(x)+(1-y^*) \log (1-p(x))=\frac{1+y}{2}\log \frac{e^{F(x)}}{e^{F(x)}+e^{-F(x)}}+\frac{1-y}{2}\log \frac{e^{-F(x)}}{e^{F(x)}+e^{-F(x)}}$

    • $=\frac{1+y}{2}\log e^{F(x)}+\frac{1-y}{2}\log e^{-F(x)} +\frac{1+y}{2}\log \frac{1}{e^{F(x)}+e^{-F(x)}}+\frac{1-y}{2}\log \frac{1}{e^{F(x)}+e^{-F(x)}}=yF(x)+\log \frac{1}{e^{F(x)}+e^{-F(x)}}$

    • $=\log \frac{e^{yF(x)}}{e^{yF(x)}+e^{-yF(x)}}=-\log(1+e^{-2yF(x)})$, so the negative log-likelihood (the loss) is $\log(1+e^{-2yF(x)})$

  • Log loss $\log(1+e^{-2yf(x)}),\ y\in \{-1,1\},\ F(x)=\frac{1}{2} \log \frac{P(y=1|x)}{P(y=-1|x)}$

  • Negative gradient $\frac{2y}{1+e^{2yf(x)}}$

  • Initialization $F_0=\frac{1}{2}\log \frac{\sum P(y=1|x)}{\sum P(y=-1|x)}$

  • Leaf estimate $\frac{\sum_{x_i \in R_{jm}}\tilde{y_i}}{\sum_{x_i \in R_{jm}}|\tilde{y_i}|(2-|\tilde{y_i}|)}$

  • Single Newton–Raphson step

    • $\log(1+e^{-2yf(x)})$

    • First derivative $g=-\frac{2y_i}{1+e^{2y_iF_{m-1}(x)}}$

    • Second derivative $h=\frac{4y_i^2e^{2y_iF_{m-1}(x)}}{(1+e^{2y_iF_{m-1}(x)})^2}$

    • $\theta = -\frac{g}{h}=\frac{\tilde{y_i}}{|\tilde{y_i}|(2-|\tilde{y_i}|)}$

    • $|\tilde{y_i}|(2-|\tilde{y_i}|)=\left|\frac{2y_i}{1+e^{2y_iF_{m-1}(x)}}\right|\left(2-\left|\frac{2y_i}{1+e^{2y_iF_{m-1}(x)}}\right|\right)=\frac{|2y_i|\,(2|1+e^{2y_iF_{m-1}(x)}|-|2y_i|)}{(1+e^{2y_iF_{m-1}(x)})^2}=\frac{|4y_i+4y_ie^{2y_iF_{m-1}(x)}|-4y_i^2}{(1+e^{2y_iF_{m-1}(x)})^2}=\frac{4y_i^2e^{2y_iF_{m-1}(x)}}{(1+e^{2y_iF_{m-1}(x)})^2}$
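The identity $|\tilde{y_i}|(2-|\tilde{y_i}|)=h$ derived above is what makes Friedman's leaf formula a Newton step; it can be checked numerically (the score `F` below is an arbitrary hypothetical value):

```python
import math

# Log loss log(1 + e^{-2yF}) with y in {-1, 1}: check that the
# denominator |ytil|(2 - |ytil|) equals the second derivative h, so the
# leaf estimate sum(ytil) / sum(|ytil|(2 - |ytil|)) is a Newton step.
F = -0.8  # hypothetical score
for y in (-1, 1):
    ytil = 2 * y / (1 + math.exp(2 * y * F))                      # negative gradient
    g = -ytil                                                     # first derivative
    h = 4 * math.exp(2 * y * F) / (1 + math.exp(2 * y * F)) ** 2  # y^2 = 1
    assert math.isclose(abs(ytil) * (2 - abs(ytil)), h)
    assert math.isclose(-g / h, ytil / (abs(ytil) * (2 - abs(ytil))))
```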

[[Poisson Distribution]]

  • Probability mass function: $f(y;\mu)=\frac{\mu ^y}{y!}e^{-\mu}$

  • Log-likelihood: $l(y;\mu) = \sum^{m}_{i=1} y_i\log \mu_i - \mu_i - \log(y_i!)$

  • Loss function: $L(y_i, F(x_i)) = \sum^{m}_{i=1}e^{F(x_i)} - y_i F(x_i)$, with $F(x)=\log \mu$

  • Negative gradient: $\tilde{y_i} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x)}\right]_{F(x)=F_{m-1}(x)} = y_i-e^{F_{m-1}(x_i)}$

  • Initialization $\log\left(\frac {\sum^{m}_{i=1} y_i}{m}\right)$

  • Leaf estimate $\log\left(\frac {\sum_{x_i \in R_{jm}} y_i}{\sum_{x_i \in R_{jm}} e^{F_{m-1}(x_i)}}\right)$
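A small check of the Poisson formulas: a central difference recovers the negative gradient $y_i - e^{F}$, and the leaf value $\log(\sum y_i / \sum e^{F_{m-1}})$ zeroes the derivative of the leaf objective (all sample values are hypothetical):

```python
import math

# Poisson loss e^F - y*F: a central difference recovers the negative
# gradient y - e^F (sample values are hypothetical).
y, Fv, eps = 3.0, 0.5, 1e-6
loss = lambda F: math.exp(F) - y * F
num_grad = (loss(Fv + eps) - loss(Fv - eps)) / (2 * eps)
assert math.isclose(-num_grad, y - math.exp(Fv), rel_tol=1e-5)

# Leaf value gamma = log(sum y / sum e^F) zeroes the derivative of the
# leaf objective sum(e^{F+gamma} - y*(F+gamma)).
ys = [1.0, 4.0, 2.0]
Fs = [0.2, 0.9, -0.3]
gamma = math.log(sum(ys) / sum(math.exp(F) for F in Fs))
deriv = sum(math.exp(F + gamma) - yy for F, yy in zip(Fs, ys))
assert abs(deriv) < 1e-9
```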

[[Laplace Distribution]] MAE

  • Loss function $\sum^{m}_{i=1} |y_i-F(x_i)|$

  • Negative gradient $\tilde{y_i} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x)}\right]_{F(x)=F_{m-1}(x)} = \operatorname{sign}(y_i-F(x_i))$

  • Initialization $\operatorname{median}(y)$

  • Leaf estimate $\operatorname{median}(\tilde{y_i})$
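The median appears as the leaf value because it minimizes the sum of absolute deviations; a tiny check (residual values below are hypothetical):

```python
import statistics

# MAE leaf value: the median of the residuals minimizes sum |r_i - c|
# (residual values are hypothetical).
residuals = [-2.0, -0.5, 0.1, 0.4, 3.0]
c = statistics.median(residuals)
mae = lambda v: sum(abs(r - v) for r in residuals)
# Nudging the estimate in either direction cannot reduce the loss.
assert mae(c) <= mae(c + 0.1) and mae(c) <= mae(c - 0.1)
```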

MAPE

  • Loss function $\sum^{m}_{i=1} \frac{|y_i-F(x_i)|}{y_i}$

  • Negative gradient $\tilde{y_i} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x)}\right]_{F(x)=F_{m-1}(x)} = \frac{\operatorname{sign}(y_i-F(x_i))}{y_i}$

  • Initialization $\operatorname{median}_w(y)$, the median weighted by $\frac{1}{y_i}$

  • Leaf estimate $\operatorname{median}_w(\tilde{y_i})$

  • Proof:

    • Taking the partial derivative of the loss function gives

      • $\frac{\partial L(y_i, F(x_i))}{\partial F(x)}= \sum_{\tilde{y_i}<0} \frac{1}{y_i} - \sum_{\tilde{y_i}\geqslant 0} \frac{1}{y_i}$
    • The leaf estimates a value $\lambda$ that minimizes the loss, which reduces to solving

      • $\frac{\partial L(y_i, F(x_i)+\lambda)}{\partial \lambda}= \sum_{\tilde{y_i}-\lambda<0} \frac{1}{y_i} - \sum_{\tilde{y_i}-\lambda\geqslant 0} \frac{1}{y_i}=0$
    • Sort the residuals in ascending order and find a value $\tilde{y_p}$ satisfying

      • $\sum_{i=1}^{p} \frac{1}{y_i} \geqslant \frac{1}{2}\sum_{i=1}^{m} \frac{1}{y_i}$
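The stopping rule above amounts to taking a weighted median of the residuals with weights $1/y_i$; a minimal sketch (all sample values are hypothetical):

```python
# Weighted median of the residuals with weights 1/y_i, following the
# stopping rule above (all sample values are hypothetical).
y = [1.0, 2.0, 4.0, 8.0]
F = [2.0, 1.5, 3.0, 9.0]
residuals = [yi - Fi for yi, Fi in zip(y, F)]
w = [1.0 / yi for yi in y]

# Sort residuals ascending and accumulate weights until reaching half
# the total weight; that residual is the leaf estimate.
order = sorted(range(len(y)), key=lambda i: residuals[i])
half, acc = sum(w) / 2.0, 0.0
for i in order:
    acc += w[i]
    if acc >= half:
        leaf = residuals[i]
        break
```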

[[SMAPE]]

  • A closed-form solution is hard to derive, so it is more convenient to use XGBoost's second-order Taylor expansion directly

  • Loss function $\sum^{m}_{i=1} \frac{|y_i-F(x_i)|}{\frac{y_i + F(x_i)}{2}}$

  • Negative gradient $\tilde{y_i} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x)}\right]_{F(x)=F_{m-1}(x)} = \frac{4 y_i \operatorname{sign}(y_i - F(x_i))}{(y_i + F(x_i))^2}$
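A finite-difference check of the SMAPE negative gradient (sample values are hypothetical), confirming it works out to $\frac{4y_i\operatorname{sign}(y_i-F)}{(y_i+F)^2}$:

```python
import math

# SMAPE loss |y - F| / ((y + F) / 2): a central difference recovers the
# negative gradient 4*y*sign(y - F) / (y + F)^2 (values are hypothetical).
y, Fv, eps = 2.0, 1.0, 1e-7
loss = lambda F: abs(y - F) / ((y + F) / 2.0)
num = -(loss(Fv + eps) - loss(Fv - eps)) / (2 * eps)
closed = 4.0 * y * math.copysign(1.0, y - Fv) / (y + Fv) ** 2
assert math.isclose(num, closed, rel_tol=1e-4)
```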


Author: Ryen Xiang

Published: 2024-10-05

Updated: 2024-10-05
