The Evolution of Tree Models
颜发才 (Yan Facai) · facaiyan@gmail.com · facaiy.github.io
Sina Weibo Algorithm Platform · March 11, 2017
Contents: Decision Trees · Adaptive Boosting (AdaBoost) · Gradient Boosting Decision Tree (GBDT) · TreeBoost · XGBoost · Summary
Decision Trees: a first impression, and the branches of their evolution
Decision Trees: a first impression. Figure: a decision tree fitted to the iris data, splitting on petal length/width and sepal length to separate setosa, versicolor, and virginica. [Decision trees of iris data, scikit-learn]
The evolutionary branch of decision trees: ensemble methods. Figure: Bagging vs. Boosting. [What is the difference between Bagging and Boosting, xristica]
Adaptive Boosting (AdaBoost)
Figure: AdaBoost — "啊打, Boost!" (a pun on the name). [甄子丹对打李小龙全集视频 (Donnie Yen vs. Bruce Lee fight videos), 视觉网]
Figure: AdaBoost training — each round re-weights the samples that the previous weak learners misclassified. [The Viola/Jones Face Detector (2001), most slides from Paul Viola]
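To make the re-weighting idea concrete, here is a minimal scikit-learn sketch; the dataset and hyperparameters are illustrative, not taken from the slides. AdaBoostClassifier's default weak learner is a depth-1 decision tree (a stump).

# A minimal AdaBoost sketch; dataset and hyperparameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# Each round fits a weak learner (by default a depth-1 stump), then
# increases the weights of misclassified samples so the next round
# concentrates on them; predictions are a weighted vote of all rounds.
clf = AdaBoostClassifier(n_estimators=50, learning_rate=1.0)
clf.fit(X, y)
print(clf.score(X, y))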
Gradient Boosting Decision Tree (GBDT): a first impression · the algorithm · an optimization view · a functional view · a dimensionality-reduction view · Spark implementation code
GBDT: a first impression. Figure: GBDT illustration. [Boosted Random Forest and Transfer Learning]
GBDT: the algorithm

Algorithm 1: Gradient_Boost
1: $F_0(x) = \arg\min_{\rho} \sum_{i=1}^{N} L(y_i, \rho)$
2: for $m = 1$ to $M$ do
3:   $\tilde{y}_i = -\left[ \frac{\partial L(y_i, F(x_i))}{\partial F(x_i)} \right]_{F(x) = F_{m-1}(x)}, \quad i = 1, 2, \dots, N$
4:   $\mathbf{a}_m = \arg\min_{\mathbf{a}, \beta} \sum_{i=1}^{N} \left[ \tilde{y}_i - \beta h(x_i; \mathbf{a}) \right]^2$
5:   $\rho_m = \arg\min_{\rho} \sum_{i=1}^{N} L\left( y_i, F_{m-1}(x_i) + \rho h(x_i; \mathbf{a}_m) \right)$
6:   $F_m(x) = F_{m-1}(x) + \rho_m h(x; \mathbf{a}_m)$
7: end

[Greedy function approximation: A gradient boosting machine, Jerome H. Friedman]
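As a sketch of Algorithm 1 under the L2 loss, where the pseudo-response $\tilde{y}$ is simply the residual $y - F(x)$; the line search for $\rho_m$ is replaced here by a fixed shrinkage rate, a common simplification, and all names are illustrative.

# Minimal gradient boosting for squared loss (illustrative sketch).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, M=100, rho=0.1, max_depth=3):
    f0 = y.mean()                       # F_0: the constant minimizing L2 loss
    F = np.full(len(y), f0)
    trees = []
    for _ in range(M):
        residual = y - F                # pseudo-response: -dL/dF for L2 loss
        h = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        F += rho * h.predict(X)         # F_m = F_{m-1} + rho * h(x; a_m)
        trees.append(h)
    return f0, trees

def predict(f0, trees, X, rho=0.1):
    return f0 + rho * sum(t.predict(X) for t in trees)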
GBDT from an optimization perspective. Figure: a loss surface and its gradient field. [Gradient Visual, Wikipedia]
GBDT from an optimization perspective. Figure, two panels: starting from $\hat{y}_0$ with $L(y, F_0(x)) = L(y, \hat{y}_0)$, each step moves along the negative gradient $\tilde{y}_m$ toward $y$, producing $\hat{y}_1, \hat{y}_2, \dots, \hat{y}_n$ with decreasing loss $L(y, \hat{y}_m)$; here $\hat{y}_i = F_i(x)$.
GBDT from an optimization perspective, continued. Figure: the same descent with each step realized by a base learner — starting from $\hat{y}_0 = F_0(X)$, each update fits $h(x; \mathbf{a})$ to the negative gradient, so that $F_m(x) = F_{m-1}(x) + \rho_m h(x; \mathbf{a}_m)$.
GBDT from a functional perspective. Figure: the iterates $F_0, F_1, \dots, F_n$ approach the target $F$, each step contributed by a base learner $h(x; \mathbf{a})$ — by analogy with composing a vector from basis vectors in three-dimensional space: $x = (1, 0, 0)$, $y = (0, 1, 0)$, $z = (0, 0, 1)$.
GBDT from a functional perspective, continued. The Taylor expansion $f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n$ approximates a function by summing terms from a fixed family (Figure: Taylor expansion of the sine function [Intuitive Understanding of Sine Waves]); GBDT builds its approximation the same way, with base learners as the "basis functions": $F_n(x) = F_0(x) + \sum_{m=1}^{n} \rho_m h(x; \mathbf{a}_m)$.
GBDT from a dimensionality-reduction perspective. Figure, panels (a) and (b): the GBDT expansion $F_0, \dots, F_n \to F$ compared with PCA, which likewise re-expresses data one greedily chosen component at a time. [Tutorial: Principal Components Analysis (PCA)]
GBDT: Spark implementation code
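Since the deck names Spark as one implementation, here is a minimal pyspark.ml sketch (not the author's original slide code); the input path is hypothetical and the parameters are illustrative.

# Illustrative GBDT training with pyspark.ml; "data.libsvm" is a hypothetical path.
from pyspark.sql import SparkSession
from pyspark.ml.regression import GBTRegressor

spark = SparkSession.builder.getOrCreate()
train = spark.read.format("libsvm").load("data.libsvm")  # "features"/"label" columns

gbt = GBTRegressor(
    maxIter=100,   # number of boosting rounds M
    maxDepth=3,    # depth of each regression tree h(x; a_m)
    stepSize=0.1,  # shrinkage, the rho in F_m = F_{m-1} + rho * h
)
model = gbt.fit(train)
model.transform(train).select("label", "prediction").show(5)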
TreeBoost: a first impression · derivation · common loss functions · sklearn implementation code
TreeBoost: a first impression. GBDT fits the tree to the pseudo-residuals and then finds one shared step size:
$\mathbf{a}_m = \arg\min_{\mathbf{a}, \beta} \sum_{i=1}^{N} [\tilde{y}_i - \beta h(x_i; \mathbf{a})]^2$
$\rho_m = \arg\min_{\rho} \sum_{i=1}^{N} L(y_i, F_{m-1}(x_i) + \rho h(x_i; \mathbf{a}_m))$
TreeBoost replaces the single $\rho$ with one weight per leaf. Figure: a tree illustration. [Red-black tree, Madit]
TreeBoost: derivation. A $J$-leaf tree model: $h(x; \{b_j, R_j\}_1^J) = \sum_{j=1}^{J} b_j \, \mathbf{1}(x \in R_j)$. Figure: the iris decision tree again, as an example of disjoint leaf regions $R_j$ with constant values $b_j$.
TreeBoost: derivation, continued.
$\rho_m = \arg\min_{\rho} \sum_{i=1}^{N} L(y_i, F_{m-1}(x_i) + \rho h(x_i; \mathbf{a}_m))$
$\quad = \arg\min_{\rho} \sum_{i=1}^{N} L\big(y_i, F_{m-1}(x_i) + \rho \sum_{j=1}^{J} b_j \mathbf{1}(x_i \in R_j)\big)$
$\quad = \arg\min_{\rho} \sum_{i=1}^{N} L\big(y_i, F_{m-1}(x_i) + \sum_{j=1}^{J} \rho b_j \mathbf{1}(x_i \in R_j)\big)$
Replacing the shared products $\rho b_j$ with free per-leaf weights $\gamma_j$:
$\{\gamma_{jm}\}_1^J = \arg\min_{\{\gamma_j\}_1^J} \sum_{i=1}^{N} L\big(y_i, F_{m-1}(x_i) + \sum_{j=1}^{J} \gamma_j \mathbf{1}(x_i \in R_{jm})\big)$
Because the leaf regions are disjoint, the joint problem decouples into one independent problem per leaf:
$\gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma)$
TreeBoost: common loss functions. Solving $\gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma)$ in closed form:
L2 loss: $\gamma_{jm} = \arg\min_{\gamma} \sum_{x_i \in R_{jm}} (y_i - F_{m-1}(x_i) - \gamma)^2 = \mathrm{Ave}_{x_i \in R_{jm}} (y_i - F_{m-1}(x_i))$ — the mean residual in the leaf.
L1 loss: $\gamma_{jm} = \mathrm{median}_{x_i \in R_{jm}} \{ y_i - F_{m-1}(x_i) \}$ — the median residual in the leaf.
For the detailed derivation, see: TreeBoost 原理和实现 (sklearn) 简介, 颜发才.
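A tiny numeric check of the two closed forms, with illustrative values: the L1 leaf value ignores the outlier that drags the L2 leaf value upward.

# Residuals y_i - F_{m-1}(x_i) in one leaf; 5.0 is an outlier.
import numpy as np

residuals = np.array([0.5, 0.6, 0.7, 5.0])
gamma_l2 = residuals.mean()      # minimizes sum((r - gamma)^2) -> 1.7
gamma_l1 = np.median(residuals)  # minimizes sum(|r - gamma|)   -> 0.65
print(gamma_l2, gamma_l1)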
TreeBoost: sklearn implementation code
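A minimal sketch of what such a slide might show, with illustrative hyperparameters (not the author's original code). scikit-learn's GradientBoostingRegressor follows the TreeBoost recipe: trees are fit to the negative gradients, and the terminal leaf values are then set from the loss.

# Illustrative TreeBoost-style training with scikit-learn.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=1000, random_state=0)

# The default loss is squared error (mean leaf values); the absolute-error
# loss instead gives median leaf values, as derived above (the loss
# parameter's spelling varies across sklearn versions).
reg = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
reg.fit(X, y)
print(reg.score(X, y))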
XGBoost: origin of the idea · derivation · key parameters
XGBoost: origin of the idea. In GBDT, each iteration can be written as an optimization problem:
$f_m = \arg\min_{f} \sum_{i=1}^{n} L(y_i, \hat{y}_i + f(x_i)) = \arg\min_{f} L(f)$
where $\hat{y}_i = F_{m-1}(x_i)$ is the current prediction.
XGBoost: derivation — second-order Taylor expansion. With
$g_i = \frac{\partial L(y_i, \hat{y}_i)}{\partial \hat{y}_i}, \qquad h_i = \frac{\partial^2 L(y_i, \hat{y}_i)}{\partial \hat{y}_i^2}$
the objective expands to
$L(f) \approx \sum_{i=1}^{n} \left[ L(y_i, \hat{y}_i) + g_i f(x_i) + \tfrac{1}{2} h_i f^2(x_i) \right] + \Omega(f)$
Writing $f$ as a $J$-leaf tree with leaf values $b_j$ and $I_j = \{ i : x_i \in R_j \}$, and dropping constants:
$\tilde{L}(f) = \sum_{j=1}^{J} \left[ \big( \sum_{i \in I_j} g_i \big) b_j + \tfrac{1}{2} \big( \sum_{i \in I_j} h_i + \lambda \big) b_j^2 \right] + \gamma T$
where $T = J$ is the number of leaves. For the detailed derivation, see: xgboost 的基本原理与实现简介, 颜发才.
XGBoost: derivation — leaf values. Each $b_j$ appears in exactly one quadratic term, so $\tilde{L}$ is minimized leaf by leaf:
$b_j^* = \arg\min_{b_j} \big( \sum_{i \in I_j} g_i \big) b_j + \tfrac{1}{2} \big( \sum_{i \in I_j} h_i + \lambda \big) b_j^2 = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$
XGBoost: derivation — an impurity measure. Substituting $b_j^*$ back:
$\tilde{L}^* = -\tfrac{1}{2} \sum_{j=1}^{J} \frac{\big( \sum_{i \in I_j} g_i \big)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T \equiv -\tfrac{1}{2} H + \gamma T$
The gain of splitting a node (score $H$) into left/right children (scores $H_L$, $H_R$):
$L_{split} = \tilde{L} - \tilde{L}_L - \tilde{L}_R = -\tfrac{1}{2}(H - H_L - H_R) + \gamma (T - T_L - T_R) = \tfrac{1}{2}(H_L + H_R - H) - \gamma$
since a split adds exactly one leaf, $T_L + T_R = T + 1$.
XGBoost: derivation — in summary, the tree-growing rules:
$\tilde{L}^* = -\tfrac{1}{2} H + \gamma T, \qquad b_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$
Grow the tree by maximizing $L_{split}$; score the leaves with $b_j^*$.
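The two formulas transcribe directly into code. A sketch with illustrative regularization values; leaf_weight, score, and split_gain are hypothetical helper names.

# Leaf weight and split gain from summed gradients/hessians (illustrative).
import numpy as np

def leaf_weight(g, h, lambda_=1.0):
    # b_j* = -sum(g_i) / (sum(h_i) + lambda)
    return -g.sum() / (h.sum() + lambda_)

def score(g, h, lambda_=1.0):
    # One leaf's term in H: (sum g_i)^2 / (sum h_i + lambda)
    return g.sum() ** 2 / (h.sum() + lambda_)

def split_gain(g, h, left_mask, lambda_=1.0, gamma_=0.0):
    # L_split = 1/2 * (H_L + H_R - H) - gamma
    h_left = score(g[left_mask], h[left_mask], lambda_)
    h_right = score(g[~left_mask], h[~left_mask], lambda_)
    return 0.5 * (h_left + h_right - score(g, h, lambda_)) - gamma_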
XGBoost: key parameters (XGBoost Parameters)
- objective: reg:linear, binary:logistic, multi:softmax
- num_round, max_depth
- eta (shrinkage / learning rate)
- lambda (L2 regularization), alpha (L1 regularization)
- see: Notes on Parameter Tuning
- sparse data: both 0 and missing are handled natively
A usage sketch follows the list below.
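A minimal sketch tying the listed parameters to the xgboost Python package; the data and parameter values are illustrative.

# Illustrative xgboost training run.
import numpy as np
import xgboost as xgb

X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)
dtrain = xgb.DMatrix(X, label=y)     # DMatrix is sparse-aware

params = {
    "objective": "binary:logistic",  # or reg:linear / multi:softmax
    "max_depth": 3,
    "eta": 0.1,        # shrinkage (learning rate)
    "lambda": 1.0,     # L2 regularization on leaf weights
    "alpha": 0.0,      # L1 regularization
}
booster = xgb.train(params, dtrain, num_boost_round=100)  # num_round
pred = booster.predict(dtrain)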
Summary
Summary:
- CART: unifies regression and classification
- AdaBoost: sample re-weighting
- GBDT: fitting residuals (negative gradients)
- TreeBoost: per-leaf weights
- XGBoost: the loss function itself guides tree construction
Thanks!
The Evolution of Tree Models · 颜发才 (Yan Facai) · facaiyan@gmail.com · facaiy.github.io · Sina Weibo Algorithm Platform · March 11, 2017