File name: The Elements of Statistical Learning
  Category: Web Development
  Development tool:
  File size: 8 MB
  Downloads: 0
  Upload date: 2009-10-28
  Provider: yangy*****
Description:

Preface to the Second Edition
Preface to the First Edition

1 Introduction

2 Overview of Supervised Learning
  2.1 Introduction
  2.2 Variable Types and Terminology
  2.3 Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors
    2.3.1 Linear Models and Least Squares
    2.3.2 Nearest-Neighbor Methods
    2.3.3 From Least Squares to Nearest Neighbors
  2.4 Statistical Decision Theory
  2.5 Local Methods in High Dimensions
  2.6 Statistical Models, Supervised Learning and Function Approximation
    2.6.1 A Statistical Model for the Joint Distribution Pr(X, Y)
    2.6.2 Supervised Learning
    2.6.3 Function Approximation
  2.7 Structured Regression Models
    2.7.1 Difficulty of the Problem
  2.8 Classes of Restricted Estimators
    2.8.1 Roughness Penalty and Bayesian Methods
    2.8.2 Kernel Methods and Local Regression
    2.8.3 Basis Functions and Dictionary Methods
  2.9 Model Selection and the Bias–Variance Tradeoff
  Bibliographic Notes
  Exercises

3 Linear Methods for Regression
  3.1 Introduction
  3.2 Linear Regression Models and Least Squares
    3.2.1 Example: Prostate Cancer
    3.2.2 The Gauss–Markov Theorem
    3.2.3 Multiple Regression from Simple Univariate Regression
    3.2.4 Multiple Outputs
  3.3 Subset Selection
    3.3.1 Best-Subset Selection
    3.3.2 Forward- and Backward-Stepwise Selection
    3.3.3 Forward-Stagewise Regression
    3.3.4 Prostate Cancer Data Example (Continued)
  3.4 Shrinkage Methods
    3.4.1 Ridge Regression
    3.4.2 The Lasso
    3.4.3 Discussion: Subset Selection, Ridge Regression and the Lasso
    3.4.4 Least Angle Regression
  3.5 Methods Using Derived Input Directions
    3.5.1 Principal Components Regression
    3.5.2 Partial Least Squares
  3.6 Discussion: A Comparison of the Selection and Shrinkage Methods
  3.7 Multiple Outcome Shrinkage and Selection
  3.8 More on the Lasso and Related Path Algorithms
    3.8.1 Incremental Forward Stagewise Regression
    3.8.2 Piecewise-Linear Path Algorithms
    3.8.3 The Dantzig Selector
    3.8.4 The Grouped Lasso
    3.8.5 Further Properties of the Lasso
    3.8.6 Pathwise Coordinate Optimization
  3.9 Computational Considerations
  Bibliographic Notes
  Exercises

4 Linear Methods for Classification
  4.1 Introduction
  4.2 Linear Regression of an Indicator Matrix
  4.3 Linear Discriminant Analysis
    4.3.1 Regularized Discriminant Analysis
    4.3.2 Computations for LDA
    4.3.3 Reduced-Rank Linear Discriminant Analysis
  4.4 Logistic Regression
    4.4.1 Fitting Logistic Regression Models
    4.4.2 Example: South African Heart Disease
    4.4.3 Quadratic Approximations and Inference
    4.4.4 L1 Regularized Logistic Regression
    4.4.5 Logistic Regression or LDA?
  4.5 Separating Hyperplanes
    4.5.1 Rosenblatt's Perceptron Learning Algorithm
    4.5.2 Optimal Separating Hyperplanes
  Bibliographic Notes
  Exercises

5 Basis Expansions and Regularization
  5.1 Introduction
  5.2 Piecewise Polynomials and Splines
    5.2.1 Natural Cubic Splines
    5.2.2 Example: South African Heart Disease (Continued)
    5.2.3 Example: Phoneme Recognition
  5.3 Filtering and Feature Extraction
  5.4 Smoothing Splines
    5.4.1 Degrees of Freedom and Smoother Matrices
  5.5 Automatic Selection of the Smoothing Parameters
    5.5.1 Fixing the Degrees of Freedom
    5.5.2 The Bias–Variance Tradeoff
  5.6 Nonparametric Logistic Regression
  5.7 Multidimensional Splines
  5.8 Regularization and Reproducing Kernel Hilbert Spaces
    5.8.1 Spaces of Functions Generated by Kernels
    5.8.2 Examples of RKHS
  5.9 Wavelet Smoothing
    5.9.1 Wavelet Bases and the Wavelet Transform
    5.9.2 Adaptive Wavelet Filtering
  Bibliographic Notes
  Exercises
  Appendix: Computational Considerations for Splines
    Appendix: B-splines
    Appendix: Computations for Smoothing Splines

6 Kernel Smoothing Methods
  6.1 One-Dimensional Kernel Smoothers
    6.1.1 Local Linear Regression
    6.1.2 Local Polynomial Regression
  6.2 Selecting the Width of the Kernel
  6.3 Local Regression in ℝ^p
  6.4 Structured Local Regression Models in ℝ^p
    6.4.1 Structured Kernels
    6.4.2 Structured Regression Functions
  6.5 Local Likelihood and Other Models
  6.6 Kernel Density Estimation and Classification
    6.6.1 Kernel Density Estimation
    6.6.2 Kernel Density Classification
    6.6.3 The Naive Bayes Classifier
  6.7 Radial Basis Functions and Kernels
  6.8 Mixture Models for Density Estimation and Classification
  6.9 Computational Considerations
  Bibliographic Notes
  Exercises

7 Model Assessment and Selection
  7.1 Introduction
  7.2 Bias, Variance and Model Complexity
  7.3 The Bias–Variance Decomposition
    7.3.1 Example: Bias–Variance Tradeoff
  7.4 Optimism of the Training Error Rate
  7.5 Estimates of In-Sample Prediction Error
  7.6 The Effective Number of Parameters
  7.7 The Bayesian Approach and BIC
  7.8 Minimum Description Length
  7.9 Vapnik–Chervonenkis Dimension
    7.9.1 Example (Continued)
  7.10 Cross-Validation
    7.10.1 K-Fold Cross-Validation
    7.10.2 The Wrong and Right Way to Do Cross-validation
    7.10.3 Does Cross-Validation Really Work?
  7.11 Bootstrap Methods
    7.11.1 Example (Continued)
  7.12 Conditional or Expected Test Error?
  Bibliographic Notes
  Exercises

8 Model Inference and Averaging
  8.1 Introduction
  8.2 The Bootstrap and Maximum Likelihood Methods
    8.2.1 A Smoothing Example
    8.2.2 Maximum Likelihood Inference
    8.2.3 Bootstrap versus Maximum Likelihood
  8.3 Bayesian Methods
  8.4 Relationship Between the Bootstrap and Bayesian Inference
  8.5 The EM Algorithm
    8.5.1 Two-Component Mixture Model
    8.5.2 The EM Algorithm in General
    8.5.3 EM as a Maximization–Maximization Procedure
  8.6 MCMC for Sampling from the Posterior
  8.7 Bagging
    8.7.1 Example: Trees with Simulated Data
  8.8 Model Averaging and Stacking
  8.9 Stochastic Search: Bumping
  Bibliographic Notes
  Exercises

9 Additive Models, Trees, and Related Methods
  9.1 Generalized Additive Models
    9.1.1 Fitting Additive Models
    9.1.2 Example: Additive Logistic Regression
    9.1.3 Summary
  9.2 Tree-Based Methods
    9.2.1 Background
    9.2.2 Regression Trees
    9.2.3 Classification Trees
    9.2.4 Other Issues
    9.2.5 Spam Example (Continued)
  9.3 PRIM: Bump Hunting
    9.3.1 Spam Example (Continued)
  9.4 MARS: Multivariate Adaptive Regression Splines
    9.4.1 Spam Example (Continued)
    9.4.2 Example (Simulated Data)
    9.4.3 Other Issues
  9.5 Hierarchical Mixtures of Experts
  9.6 Missing Data
  9.7 Computational Considerations
  Bibliographic Notes
  Exercises

10 Boosting and Additive Trees
  10.1 Boosting Methods
    10.1.1 Outline of This Chapter
  10.2 Boosting Fits an Additive Model
  10.3 Forward Stagewise Additive Modeling
  10.4 Exponential Loss and AdaBoost
  10.5 Why Exponential Loss?
  10.6 Loss Functions and Robustness
  10.7 "Off-the-Shelf" Procedures for Data Mining
  10.8 Example: Spam Data
  10.9 Boosting Trees
  10.10 Numerical Optimization via Gradient Boosting
    10.10.1 Steepest Descent
    10.10.2 Gradient Boosting
    10.10.3 Implementations of Gradient Boosting
  10.11 Right-Sized Trees for Boosting
  10.12 Regularization
    10.12.1 Shrinkage
    10.12.2 Subsampling
  10.13 Interpretation
    10.13.1 Relative Importance of Predictor Variables
    10.13.2 Partial Dependence Plots
  10.14 Illustrations
    10.14.1 California Housing
    10.14.2 New Zealand Fish
    10.14.3 Demographics Data
  Bibliographic Notes
  Exercises

11 Neural Networks
  11.1 Introduction
  11.2 Projection Pursuit Regression
  11.3 Neural Networks
  11.4 Fitting Neural Networks
  11.5 Some Issues in Training Neural Networks
    11.5.1 Starting Values
    11.5.2 Overfitting
    11.5.3 Scaling of the Inputs
    11.5.4 Number of Hidden Units and Layers
    11.5.5 Multiple Minima
  11.6 Example: Simulated Data
  11.7 Example: ZIP Code Data
  11.8 Discussion
  11.9 Bayesian Neural Nets and the NIPS 2003 Challenge
    11.9.1 Bayes, Boosting and Bagging
    11.9.2 Performance Comparisons
  11.10 Computational Considerations
  Bibliographic Notes

12 Support Vector Machines and Flexible Discriminants
  12.1 Introduction
  12.2 The Support Vector Classifier
    12.2.1 Computing the Support Vector Classifier
    12.2.2 Mixture Example (Continued)
  12.3 Support Vector Machines and Kernels
    12.3.1 Computing the SVM for Classification
    12.3.2 The SVM as a Penalization Method
    12.3.3 Function Estimation and Reproducing Kernels
    12.3.4 SVMs and the Curse of Dimensionality
    12.3.5 A Path Algorithm for the SVM Classifier
    12.3.6 Support Vector Machines for Regression
    12.3.7 Regression and Kernels
    12.3.8 Discussion
  12.4 Generalizing Linear Discriminant Analysis
  12.5 Flexible Discriminant Analysis
    12.5.1 Computing the FDA Estimates
  12.6 Penalized Discriminant Analysis
  12.7 Mixture Discriminant Analysis
    12.7.1 Example: Waveform Data
  Bibliographic Notes
  Exercises

13 Prototype Methods and Nearest-Neighbors
  13.1 Introduction
  13.2 Prototype Methods
    13.2.1 K-means Clustering
    13.2.2 Learning Vector Quantization
    13.2.3 Gaussian Mixtures
  13.3 k-Nearest-Neighbor Classifiers
    13.3.1 Example: A Comparative Study
    13.3.2 Example: k-Nearest-Neighbors and Image Scene Classification
    13.3.3 Invariant Metrics and Tangent Distance
  13.4 Adaptive Nearest-Neighbor Methods
    13.4.1 Example
    13.4.2 Global Dimension Reduction for Nearest-Neighbors
  13.5 Computational Considerations
  Bibliographic Notes
  Exercises

14 Unsupervised Learning
  14.1 Introduction
  14.2 Association Rules
    14.2.1 Market Basket Analysis
    14.2.2 The Apriori Algorithm
    14.2.3 Example: Market Basket Analysis
    14.2.4 Unsupervised as Supervised Learning
    14.2.5 Generalized Association Rules
    14.2.6 Choice of Supervised Learning Method
    14.2.7 Example: Market Basket Analysis (Continued)
  14.3 Cluster Analysis
    14.3.1 Proximity Matrices
    14.3.2 Dissimilarities Based on Attributes
    14.3.3 Object Dissimilarity
    14.3.4 Clustering Algorithms
    14.3.5 Combinatorial Algorithms
    14.3.6 K-means
    14.3.7 Gaussian Mixtures as Soft K-means Clustering
    14.3.8 Example: Human Tumor Microarray Data
    14.3.9 Vector Quantization
    14.3.10 K-medoids
    14.3.11 Practical Issues
    14.3.12 Hierarchical Clustering
  14.4 Self-Organizing Maps
  14.5 Principal Components, Curves and Surfaces
    14.5.1 Principal Components
    14.5.2 Principal Curves and Surfaces
    14.5.3 Spectral Clustering
    14.5.4 Kernel Principal Components
    14.5.5 Sparse Principal Components
  14.6 Non-negative Matrix Factorization
    14.6.1 Archetypal Analysis
  14.7 Independent Component Analysis and Exploratory Projection Pursuit
    14.7.1 Latent Variables and Factor Analysis
    14.7.2 Independent Component Analysis
    14.7.3 Exploratory Projection Pursuit
    14.7.4 A Direct Approach to ICA
  14.8 Multidimensional Scaling
  14.9 Nonlinear Dimension Reduction and Local Multidimensional Scaling
  14.10 The Google PageRank Algorithm
  Bibliographic Notes
  Exercises

15 Random Forests
  15.1 Introduction
  15.2 Definition of Random Forests
  15.3 Details of Random Forests
    15.3.1 Out of Bag Samples
    15.3.2 Variable Importance
    15.3.3 Proximity Plots
    15.3.4 Random Forests and Overfitting
  15.4 Analysis of Random Forests
    15.4.1 Variance and the De-Correlation Effect
    15.4.2 Bias
    15.4.3 Adaptive Nearest Neighbors
  Bibliographic Notes
  Exercises

16 Ensemble Learning
  16.1 Introduction
  16.2 Boosting and Regularization Paths
    16.2.1 Penalized Regression
    16.2.2 The "Bet on Sparsity" Principle
    16.2.3 Regularization Paths, Over-fitting and Margins
  16.3 Learning Ensembles
    16.3.1 Learning a Good Ensemble
    16.3.2 Rule Ensembles
  Bibliographic Notes
  Exercises

17 Undirected Graphical Models
  17.1 Introduction
  17.2 Markov Graphs and Their Properties
  17.3 Undirected Graphical Models for Continuous Variables
    17.3.1 Estimation of the Parameters when the Graph Structure is Known
    17.3.2 Estimation of the Graph Structure
  17.4 Undirected Graphical Models for Discrete Variables
    17.4.1 Estimation of the Parameters when the Graph Structure is Known
    17.4.2 Hidden Nodes
    17.4.3 Estimation of the Graph Structure
    17.4.4 Restricted Boltzmann Machines
  Exercises

18 High-Dimensional Problems: p ≫ N
  18.1 When p is Much Bigger than N
  18.2 Diagonal Linear Discriminant Analysis and Nearest Shrunken Centroids
  18.3 Linear Classifiers with Quadratic Regularization
    18.3.1 Regularized Discriminant Analysis
    18.3.2 Logistic Regression with Quadratic Regularization
    18.3.3 The Support Vector Classifier
    18.3.4 Feature Selection
    18.3.5 Computational Shortcuts When p ≫ N
  18.4 Linear Classifiers with L1 Regularization
    18.4.1 Application of Lasso to Protein Mass Spectroscopy
    18.4.2 The Fused Lasso for Functional Data
  18.5 Classification When Features are Unavailable
    18.5.1 Example: String Kernels and Protein Classification
    18.5.2 Classification and Other Models Using Inner-Product Kernels and Pairwise Distances
    18.5.3 Example: Abstracts Classification
  18.6 High-Dimensional Regression: Supervised Principal Components
    18.6.1 Connection to Latent-Variable Modeling
    18.6.2 Relationship with Partial Least Squares
    18.6.3 Pre-Conditioning for Feature Selection
  18.7 Feature Assessment and the Multiple-Testing Problem
    18.7.1 The False Discovery Rate
    18.7.2 Asymmetric Cutpoints and the SAM Procedure
    18.7.3 A Bayesian Interpretation of the FDR
  18.8 Bibliographic Notes
  Exercises

References
Author Index
Index

Download file list

Notes

  • Resources on this site are uploaded and shared by members for exchange and learning. If your rights have been infringed, please contact us to have the content removed.
  • This site is a file-exchange platform that provides a channel for sharing; the downloadable content comes from the web. For problems other than download issues, please search Baidu yourself.
  • Hotlink protection is enabled on this site. Do not download resources with multi-threaded download tools such as Thunder (Xunlei) or QQ Xuanfeng; after downloading, extract the archive with the latest version of WinRAR.
  • If the content cannot be downloaded, please try again later, or find the download record in your purchase history and report it to us.
  • If the downloaded content does not match the description, find the download record in your purchase history and report it to us; after confirmation, your points will be refunded.
  • If you have questions before downloading, you can click the provider's name to view their contact information and contact them directly.
Related searches: Statistical Learning Data Mining