File name: Mathematics for Machine Learning
  Category: Machine learning
  Development tool:
  File size: 8 MB
  Downloads: 0
  Upload time: 2019-03-16
  Uploader: lex_g******
Detailed description: Mathematics for Machine Learning, by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. The bookmarks in this copy should be correct.

Draft (March 15, 2019) of "Mathematics for Machine Learning", (c) 2019 by Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong. To be published by Cambridge University Press. Please do not post or distribute this file; please link to https://mml-book.com

Contents

List of Illustrations
Foreword

Part I Mathematical Foundations

1 Introduction and Motivation
  1.1 Finding Words for Intuitions
  1.2 Two Ways to Read This Book
  1.3 Exercises and Feedback

2 Linear Algebra
  2.1 Systems of Linear Equations
  2.2 Matrices
  2.3 Solving Systems of Linear Equations
  2.4 Vector Spaces
  2.5 Linear Independence
  2.6 Basis and Rank
  2.7 Linear Mappings
  2.8 Affine Spaces
  2.9 Further Reading
  Exercises

3 Analytic Geometry
  3.1 Norms
  3.2 Inner Products
  3.3 Lengths and Distances
  3.4 Angles and Orthogonality
  3.5 Orthonormal Basis
  3.6 Orthogonal Complement
  3.7 Inner Product of Functions
  3.8 Orthogonal Projections
  3.9 Rotations
  3.10 Further Reading
  Exercises

4 Matrix Decompositions
  4.1 Determinant and Trace
  4.2 Eigenvalues and Eigenvectors
  4.3 Cholesky Decomposition
  4.4 Eigendecomposition and Diagonalization
  4.5 Singular Value Decomposition
  4.6 Matrix Approximation
  4.7 Matrix Phylogeny
  4.8 Further Reading
  Exercises

5 Vector Calculus
  5.1 Differentiation of Univariate Functions
  5.2 Partial Differentiation and Gradients
  5.3 Gradients of Vector-Valued Functions
  5.4 Gradients of Matrices
  5.5 Useful Identities for Computing Gradients
  5.6 Backpropagation and Automatic Differentiation
  5.7 Higher-Order Derivatives
  5.8 Linearization and Multivariate Taylor Series
  5.9 Further Reading
  Exercises

6 Probability and Distributions
  6.1 Construction of a Probability Space
  6.2 Discrete and Continuous Probabilities
  6.3 Sum Rule, Product Rule, and Bayes' Theorem
  6.4 Summary Statistics and Independence
  6.5 Gaussian Distribution
  6.6 Conjugacy and the Exponential Family
  6.7 Change of Variables / Inverse Transform
  6.8 Further Reading
  Exercises

7 Continuous Optimization
  7.1 Optimization Using Gradient Descent
  7.2 Constrained Optimization and Lagrange Multipliers
  7.3 Convex Optimization
  7.4 Further Reading
  Exercises

Part II Central Machine Learning Problems

8 When Models Meet Data
  8.1 Empirical Risk Minimization
  8.2 Parameter Estimation
  8.3 Probabilistic Modeling and Inference
  8.4 Directed Graphical Models
  8.5 Model Selection

9 Linear Regression
  9.1 Problem Formulation
  9.2 Parameter Estimation
  9.3 Bayesian Linear Regression
  9.4 Maximum Likelihood as Orthogonal Projection
  9.5 Further Reading

10 Dimensionality Reduction with Principal Component Analysis
  10.1 Problem Setting
  10.2 Maximum Variance Perspective
  10.3 Projection Perspective
  10.4 Eigenvector Computation and Low-Rank Approximations
  10.5 PCA in High Dimensions
  10.6 Key Steps of PCA in Practice
  10.7 Latent Variable Perspective
  10.8 Further Reading

11 Density Estimation with Gaussian Mixture Models
  11.1 Gaussian Mixture Model
  11.2 Parameter Learning via Maximum Likelihood
  11.3 EM Algorithm
  11.4 Latent Variable Perspective
  11.5 Further Reading

12 Classification with Support Vector Machines
  12.1 Separating Hyperplanes
  12.2 Primal Support Vector Machine
  12.3 Dual Support Vector Machine
  12.4 Kernels
  12.5 Numerical Solution
  12.6 Further Reading

References
Index

List of Figures

1.1 The foundations and four pillars of machine learning
2.1 Different types of vectors
2.2 Linear algebra mind map
2.3 Geometric interpretation of systems of linear equations
2.4 A matrix can be represented as a long vector
2.5 Matrix multiplication
2.6 Examples of subspaces
2.7 Geographic example of linearly dependent vectors
2.8 Two different coordinate systems
2.9 Different coordinate representations of a vector
2.10 Three examples of linear transformations
2.11 Basis change
2.12 Kernel and image of a linear mapping Φ: V → W
2.13 Lines are affine subspaces
3.1 Analytic geometry mind map
3.2 Illustration of different norms
3.3 Triangle inequality
3.5 Angle between two vectors
3.6 Angle between two vectors
3.7 A plane can be described by its normal vector
3.9 Orthogonal projection
3.10 Examples of projections onto one-dimensional subspaces
3.11 Projection onto a two-dimensional subspace
3.12 Gram-Schmidt orthogonalization
3.13 Projection onto an affine space
3.14 Rotation
3.15 Robotic arm
3.16 Rotation of the standard basis in R^2 by an angle θ
3.17 Rotation in three dimensions
4.1 Matrix decomposition mind map
4.2 The area of a parallelogram computed using the determinant
4.3 The volume of a parallelepiped computed using the determinant
4.4 Determinants and eigenspaces
4.5 C. elegans neural network
4.6 Geometric interpretation of eigenvalues
4.7 Eigendecomposition as sequential transformations
4.8 Intuition behind SVD as sequential transformations
4.9 SVD and mapping of vectors
4.10 SVD decomposition for movie ratings
4.11 Image processing with the SVD
4.12 Image reconstruction with the SVD
4.13 Phylogeny of matrices in machine learning
5.1 Different problems for which we need vector calculus
5.2 Vector calculus mind map
5.3 Difference quotient
5.4 Taylor polynomials
5.5 Jacobian determinant
5.6 Dimensionality of partial derivatives
5.7 Gradient computation of a matrix with respect to a vector
5.8 Forward pass in a multi-layer neural network
5.9 Backward pass in a multi-layer neural network
5.10 Data flow graph
5.11 Computation graph
5.12 Linear approximation of a function
5.13 Visualizing outer products
6.1 Probability mind map
6.2 Visualization of a discrete bivariate probability mass function
6.3 Examples of discrete and continuous uniform distributions
6.4 Mean, mode, and median
6.5 Identical means and variances but different covariances
6.6 Geometry of random variables
6.7 Gaussian distribution of two random variables x, y
6.8 Gaussian distributions overlaid with 100 samples
6.9 Bivariate Gaussian with conditional and marginal
6.10 Examples of the Binomial distribution
6.11 Examples of the Beta distribution for different values of α and β
7.1 Optimization mind map
7.2 Example objective function
7.3 Gradient descent on a two-dimensional quadratic surface
7.4 Illustration of constrained optimization
7.5 Example of a convex function
7.6 Example of a convex set
7.7 Example of a nonconvex set
7.8 The negative entropy and its tangent
7.9 Illustration of a linear program
8.1 Toy data for linear regression
8.2 Example function and its prediction
8.3 Example function and its uncertainty
8.4 K-fold cross-validation
8.5 Maximum likelihood estimate
8.6 Maximum a posteriori estimation
8.7 Model fitting
8.8 Fitting of different model classes
8.9 Examples of directed graphical models
8.10 Graphical models for a repeated Bernoulli experiment
8.11 D-separation example
8.12 Three types of graphical models
8.13 Nested cross-validation
8.14 Bayesian inference embodies Occam's razor
8.15 Hierarchical generative process in Bayesian model selection
9.1 Regression
9.2 Linear regression example
9.3 Probabilistic graphical model for linear regression
9.4 Polynomial regression
9.5 Maximum likelihood fits for different polynomial degrees M
9.6 Training and test error
9.7 Polynomial regression: maximum likelihood and MAP estimates
9.8 Graphical model for Bayesian linear regression
9.9 Prior over functions
9.10 Bayesian linear regression and posterior over functions
9.11 Bayesian linear regression
9.12 Geometric interpretation of least squares
10.1 Illustration: dimensionality reduction
10.2 Graphical illustration of PCA
10.3 Examples of handwritten digits from the MNIST dataset
10.4 Illustration of the maximum variance perspective
10.5 Properties of the training data of MNIST '8'
10.6 Illustration of the projection approach
10.7 Simplified projection setting
10.8 Optimal projection
10.9 Orthogonal projection and displacement vectors
10.10 Embedding of MNIST digits
10.11 Steps of PCA
10.12 Effect of the number of principal components on reconstruction
10.13 Squared reconstruction error versus the number of components
10.14 PPCA graphical model
10.15 Generating new MNIST digits
10.16 PCA as an auto-encoder
11.1 Dataset that cannot be represented by a Gaussian
11.2 Gaussian mixture model
11.3 Initial setting: GMM with three mixture components
11.4 Update of the mean parameter of a mixture component in a GMM
11.5 Effect of updating the mean values in a GMM
11.6 Effect of updating the variances in a GMM
11.7 Effect of updating the mixture weights in a GMM
11.8 EM algorithm applied to the GMM from Figure 11.2
11.9 Illustration of the EM algorithm
11.10 GMM fit and responsibilities when EM converges
11.11 Graphical model for a GMM with a single data point
11.12 Graphical model for a GMM with N data points
11.13 Histogram and kernel density estimation
12.1 Example 2D data for classification
12.2 Equation of a separating hyperplane
12.3 Possible separating hyperplanes
12.4 Vector addition to express distance to hyperplane
12.5 Derivation of the margin r
12.6 Linearly separable and non-linearly separable data
12.7 Soft margin SVM allows examples to be within the margin
12.8 The hinge loss is a convex upper bound of the zero-one loss
12.9 Convex hulls
12.10 SVM with different kernels
(Automatically generated by the system; you can preview the content before downloading.)

Download file list

Related notes

  • Resources on this site are uploaded by members for sharing, exchange, and learning. If your rights have been infringed, please contact us for removal.
  • This site is a file-exchange platform that provides a channel for exchange; downloaded content comes from the internet. For questions other than download issues, please search Baidu yourself.
  • Hotlink protection is enabled on this site. Please do not use multi-threaded download tools such as Xunlei (Thunder) or QQ Xuanfeng; after downloading, extract the file with the latest version of WinRAR.
  • If you find that the content cannot be downloaded, please try again later, or locate the download record in your purchase history and report it to us.
  • If the downloaded content does not match the description, locate the download record in your purchase history and report it to us; after confirmation, your points will be refunded.
  • If you have questions before downloading, click the uploader's name to view their contact information and contact them directly.