-rw-r--r-- | ai-slides.tex | 51 |
1 file changed, 47 insertions, 4 deletions
diff --git a/ai-slides.tex b/ai-slides.tex
index 3b79401..53cd37e 100644
--- a/ai-slides.tex
+++ b/ai-slides.tex
@@ -356,9 +356,11 @@ $\mathrm{income} = \beta_0 + \beta_1 \times \mathrm{education} + \beta_2 \times
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \foilhead{Assessing model accuracy}
 
-No free lunch theorem: no single method is optimal on every data set.
+{\em There is no free lunch in statistics.}
 
-Fit quality is judged by $MSE = \frac{1}{n} \sum (y_i - \hat{f}(x_i))^2$, which can be computed on the training data or on the test data (unseen data). Note that $\hat{f}$ is always fit on the training data. What we really care about is how $\hat{f}$ performs on test data $(x_0, y_0)$: is $\hat{f}(x_0)$ close to $y_0$? Here $(x_0, y_0)$ was not seen while training $\hat{f}$.
+No single method is optimal on every data set.
+
+Fit quality is judged by $MSE = \frac{1}{n} \sum (y_i - \hat{f}(x_i))^2$, which can be computed on the training data or on the test data (unseen data). Note that $\hat{f}$ is always fit on the training data. What we really care about is how $\hat{f}$ performs on test data $(x_0, y_0)$: is $\hat{f}(x_0)$ close to $y_0$? Here $(x_0, y_0)$ was not seen while training $\hat{f}$.
 
 Only $\mathrm{test MSE} = \mathrm{Ave}(y_0 - \hat{f}(x_0))^2$ can really show whether $\hat{f}$ is good; the $\mathrm{training MSE}$ cannot.
@@ -370,11 +372,16 @@ The $\mathrm{training MSE}$ can be 0 while the $\mathrm{test MSE}$ is huge. This is
 
 Be able to explain what every curve and every point in the figure means, and how they move. The $\mathrm{test MSE}$ is U-shaped. The training error is smaller than the test error in the vast majority of cases, because statistical learning methods are, by design, directly or indirectly minimizing the training error.
 
-Overfitting: good performance on the training data, poor performance on the test data. Cause: instead of capturing the true trend, the fit chases patterns produced by random noise. Simpler models are less prone to overfitting.
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\foilhead{Overfitting}
+
+Overfitting.
+
+Good performance on the training data, poor performance on the test data. Cause: instead of capturing the true trend, the fit chases patterns produced by random noise. Simpler models are less prone to overfitting.
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\foilhead{U-shaped test-error curve}
+\foilhead{U-shaped test error curve}
 
 As model flexibility (complexity) grows, the test error first falls and then rises again. Why is that? (See the bias--variance sketch after the diff.)
@@ -1891,4 +1898,40 @@ Random Forests. Pick $m$ of the $p$ predictors to build each tree.
 
 % Skipped: Boosting. Each tree is fit on a modified version of the original data set
 
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\foilhead{Maximal margin classifier}
+
+Requires the classes to be linearly separable.
+
+Hyperplane. $\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p = 0$.
+
+$y_1, ..., y_n \in \{-1, +1\}$.
+
+Separating hyperplane.
+
+$y_i (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}) > 0$ for all $i=1, ..., n$.
+
+The margin is the smallest distance from the training observations to the hyperplane.
+
+Optimal separating hyperplane. Maximal margin hyperplane (produces a maximal margin classifier): the mid-line of the widest ``slab''. The training observations on the boundary of the slab are called support vectors. The other training observations can be moved arbitrarily without changing the maximal margin hyperplane, as long as they do not cross the slab; the hyperplane is determined only by the support vectors. (The corresponding optimization problem is sketched after the diff.)
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\foilhead{Support vector classifier}
+
+An extension of the maximal margin classifier so that it can be used when the data are not linearly separable.
+
+Soft margin classifier. Allows some training observations to fall inside the margin, or even onto the wrong side of the hyperplane.
+
+Slack variables. (See the soft-margin sketch after the diff.)
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\foilhead{Support vector machines}
+
+An extension of the support vector classifier (non-linear class boundaries). (See the kernel sketch after the diff.)
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\foilhead{Connection with logistic regression}
+
+L + P (see the loss-plus-penalty sketch after the diff)
+
 \end{document}
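Why the test error curve is U-shaped can be made precise with the bias--variance decomposition of the expected test MSE at a point $x_0$. This is a standard identity from the statistical learning literature, written here in the slides' notation; it is not part of the commit itself.

% Expected test MSE at x_0 splits into variance, squared bias, and irreducible error.
\[
  \mathrm{E}\bigl(y_0 - \hat{f}(x_0)\bigr)^2
    = \mathrm{Var}\bigl(\hat{f}(x_0)\bigr)
    + \bigl[\mathrm{Bias}\bigl(\hat{f}(x_0)\bigr)\bigr]^2
    + \mathrm{Var}(\epsilon).
\]

As flexibility grows, the bias term falls while the variance term rises; their sum first decreases and then increases, giving the U shape, and $\mathrm{Var}(\epsilon)$ is an irreducible floor under the test error.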
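The maximal margin hyperplane is the solution of an optimization problem. The sketch below is the textbook formulation in the slide's own symbols ($\beta_j$, $x_{ij}$, $y_i$, margin $M$); it is not text taken from the commit.

% Maximize the margin M subject to every training observation lying on its
% correct side of the hyperplane, at distance at least M.
\[
  \max_{\beta_0, \beta_1, \ldots, \beta_p,\, M} M
  \quad \mbox{subject to} \quad
  \sum_{j=1}^{p} \beta_j^2 = 1,
  \qquad
  y_i \bigl(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}\bigr) \ge M
  \;\; \mbox{for } i = 1, \ldots, n.
\]

The norm constraint makes the left-hand side of each constraint the signed distance from observation $i$ to the hyperplane, so the constraints say every training observation sits at least $M$ away on its correct side.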
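The support vector classifier relaxes those constraints with slack variables $\epsilon_i$ and a budget $C$. Again this is a sketch of the standard soft-margin formulation, not wording from the commit.

% eps_i = 0: correct side of the margin; 0 < eps_i <= 1: inside the margin;
% eps_i > 1: wrong side of the hyperplane. The tuning parameter C caps the total slack.
\[
  \max_{\beta_0, \ldots, \beta_p,\, \epsilon_1, \ldots, \epsilon_n,\, M} M
  \quad \mbox{subject to} \quad
  \sum_{j=1}^{p} \beta_j^2 = 1,
\]
\[
  y_i \bigl(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}\bigr) \ge M(1 - \epsilon_i),
  \qquad
  \epsilon_i \ge 0,
  \qquad
  \sum_{i=1}^{n} \epsilon_i \le C.
\]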
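The support vector machine obtains non-linear class boundaries by replacing inner products with a kernel $K$. The radial kernel below is one common choice shown purely as an illustration; $\mathcal{S}$ is the set of support vectors, and $\alpha_i$, $\gamma$ are fitted and tuning parameters. None of this is taken from the commit.

% Kernel expansion over the support set S; the radial kernel makes the
% decision boundary non-linear in the original feature space.
\[
  f(x) = \beta_0 + \sum_{i \in \mathcal{S}} \alpha_i K(x, x_i),
  \qquad
  K(x_i, x_{i'}) = \exp\Bigl(-\gamma \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2\Bigr).
\]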
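The terse ``L + P'' line is most naturally read as the loss-plus-penalty view that links the support vector classifier to logistic regression; the sketch below states that standard formulation and is an interpretation of the slide, not wording confirmed by the commit.

% Hinge loss plus a ridge penalty reproduces the support vector classifier;
% replacing the hinge loss with the logistic (negative log-likelihood) loss
% gives penalized logistic regression, hence the close connection.
\[
  \min_{\beta_0, \beta_1, \ldots, \beta_p}
  \Bigl\{ \sum_{i=1}^{n} \max\bigl[0,\, 1 - y_i f(x_i)\bigr]
  + \lambda \sum_{j=1}^{p} \beta_j^2 \Bigr\},
  \qquad
  f(x_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}.
\]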