Mathematics for Machine Learning 1st edition by Marc Peter Deisenroth, A. Aldo Faisal, Cheng Soon Ong – Ebook PDF: 110845514X, 978-1108455145

Product details:
ISBN 10: 110845514X
ISBN 13: 978-1108455145
Author: Marc Peter Deisenroth, A. Aldo Faisal, Cheng Soon Ong
The fundamental mathematical tools needed to understand machine learning include linear algebra, analytic geometry, matrix decompositions, vector calculus, optimization, probability and statistics. These topics are traditionally taught in disparate courses, making it hard for data science or computer science students, or professionals, to efficiently learn the mathematics. This self-contained textbook bridges the gap between mathematical and machine learning texts, introducing the mathematical concepts with a minimum of prerequisites. It uses these concepts to derive four central machine learning methods: linear regression, principal component analysis, Gaussian mixture models and support vector machines. For students and others with a mathematical background, these derivations provide a starting point to machine learning texts. For those learning the mathematics for the first time, the methods help build intuition and practical experience with applying mathematical concepts. Every chapter includes worked examples and exercises to test understanding. Programming tutorials are offered on the book’s website.
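To give a flavour of the kind of derivation the book works through, here is a minimal illustrative sketch (not taken from the book or its companion tutorials) of maximum likelihood linear regression, the first of the four central methods listed above. It assumes NumPy and a Gaussian noise model y = Xw + ε, under which the maximum likelihood estimate of w is the least-squares solution.

# Illustrative sketch, assuming NumPy; not the book's own tutorial code.
# Maximum likelihood linear regression under Gaussian noise (cf. Chapter 9):
# the ML estimate w_ml solves the least-squares problem min_w ||y - Xw||^2.
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(-1, 1, 50)])  # design matrix with a bias column
w_true = np.array([0.5, -2.0])                               # hypothetical "true" parameters
y = X @ w_true + 0.1 * rng.standard_normal(50)               # noisy observations

# w_ml = (X^T X)^{-1} X^T y, computed with a numerically stable solver
# rather than forming the inverse explicitly.
w_ml, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_ml)  # should be close to w_true

The same closed-form projection viewpoint reappears in Section 9.4 (maximum likelihood as orthogonal projection) of the table of contents below.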
Mathematics for Machine Learning 1st Table of contents:
Part I Mathematical Foundations
1 Introduction and Motivation
1.1 Finding Words for Intuitions
1.2 Two Ways to Read This Book
1.3 Exercises and Feedback
2 Linear Algebra
2.1 Systems of Linear Equations
2.2 Matrices
2.2.1 Matrix Addition and Multiplication
2.2.2 Inverse and Transpose
2.2.3 Multiplication by a Scalar
2.2.4 Compact Representations of Systems of Linear Equations
2.3 Solving Systems of Linear Equations
2.3.1 Particular and General Solution
2.3.2 Elementary Transformations
2.3.3 The Minus-1 Trick
2.3.4 Algorithms for Solving a System of Linear Equations
2.4 Vector Spaces
2.4.1 Groups
2.4.2 Vector Spaces
2.4.3 Vector Subspaces
2.5 Linear Independence
2.6 Basis and Rank
2.6.1 Generating Set and Basis
2.6.2 Rank
2.7 Linear Mappings
2.7.1 Matrix Representation of Linear Mappings
2.7.2 Basis Change
2.7.3 Image and Kernel
2.8 Affine Spaces
2.8.1 Affine Subspaces
2.8.2 Affine Mappings
2.9 Further Reading
Exercises
3 Analytic Geometry
3.1 Norms
3.2 Inner Products
3.2.1 Dot Product
3.2.2 General Inner Products
3.2.3 Symmetric, Positive Definite Matrices
3.3 Lengths and Distances
3.4 Angles and Orthogonality
3.5 Orthonormal Basis
3.6 Orthogonal Complement
3.7 Inner Product of Functions
3.8 Orthogonal Projections
3.8.1 Projection onto One-Dimensional Subspaces (Lines)
3.8.2 Projection onto General Subspaces
3.8.3 Gram–Schmidt Orthogonalization
3.8.4 Projection onto Affine Subspaces
3.9 Rotations
3.9.1 Rotations in ℝ²
3.9.2 Rotations in ℝ³
3.9.3 Rotations in n Dimensions
3.9.4 Properties of Rotations
3.10 Further Reading
Exercises
4 Matrix Decompositions
4.1 Determinant and Trace
4.2 Eigenvalues and Eigenvectors
4.2.1 Graphical Intuition in Two Dimensions
4.3 Cholesky Decomposition
4.4 Eigendecomposition and Diagonalization
4.4.1 Geometric Intuition for the Eigendecomposition
4.5 Singular Value Decomposition
4.5.1 Geometric Intuitions for the SVD
4.5.2 Construction of the SVD
4.5.3 Eigenvalue Decomposition vs. Singular Value Decomposition
4.6 Matrix Approximation
4.7 Matrix Phylogeny
4.8 Further Reading
Exercises
5 Vector Calculus
5.1 Differentiation of Univariate Functions
5.1.1 Taylor Series
5.1.2 Differentiation Rules
5.2 Partial Differentiation and Gradients
5.2.1 Basic Rules of Partial Differentiation
5.2.2 Chain Rule
5.3 Gradients of Vector-Valued Functions
5.4 Gradients of Matrices
5.5 Useful Identities for Computing Gradients
5.6 Backpropagation and Automatic Differentiation
5.6.1 Gradients in a Deep Network
5.6.2 Automatic Differentiation
5.7 Higher-Order Derivatives
5.8 Linearization and Multivariate Taylor Series
5.9 Further Reading
Exercises
6 Probability and Distributions
6.1 Construction of a Probability Space
6.1.1 Philosophical Issues
6.1.2 Probability and Random Variables
6.1.3 Statistics
6.2 Discrete and Continuous Probabilities
6.2.1 Discrete Probabilities
6.2.2 Continuous Probabilities
6.2.3 Contrasting Discrete and Continuous Distributions
6.3 Sum Rule, Product Rule, and Bayes’ Theorem
6.4 Summary Statistics and Independence
6.4.1 Means and Covariances
6.4.2 Empirical Means and Covariances
6.4.3 Three Expressions for the Variance
6.4.4 Sums and Transformations of Random Variables
6.4.5 Statistical Independence
6.4.6 Inner Products of Random Variables
6.5 Gaussian Distribution
6.5.1 Marginals and Conditionals of Gaussians are Gaussians
6.5.2 Product of Gaussian Densities
6.5.3 Sums and Linear Transformations
6.5.4 Sampling from Multivariate Gaussian Distributions
6.6 Conjugacy and the Exponential Family
6.6.1 Conjugacy
6.6.2 Sufficient Statistics
6.6.3 Exponential Family
6.7 Change of Variables/Inverse Transform
6.7.1 Distribution Function Technique
6.7.2 Change of Variables
6.8 Further Reading
Exercises
7 Continuous Optimization
7.1 Optimization Using Gradient Descent
7.1.1 Step-Size
7.1.2 Gradient Descent With Momentum
7.1.3 Stochastic Gradient Descent
7.2 Constrained Optimization and Lagrange Multipliers
7.3 Convex Optimization
7.3.1 Linear Programming
7.3.2 Quadratic Programming
7.3.3 Legendre–Fenchel Transform and Convex Conjugate
7.4 Further Reading
Exercises
Part II Central Machine Learning Problems
8 When Models Meet Data
8.1 Data, Models, and Learning
8.1.1 Data as Vectors
8.1.2 Models as Functions
8.1.3 Models as Probability Distributions
8.1.4 Learning Is Finding Parameters
8.2 Empirical Risk Minimization
8.2.1 Hypothesis Class of Functions
8.2.2 Loss Function for Training
8.2.3 Regularization to Reduce Overfitting
8.2.4 Cross-Validation to Assess the Generalization Performance
8.2.5 Further Reading
8.3 Parameter Estimation
8.3.1 Maximum Likelihood Estimation
8.3.2 Maximum A Posteriori Estimation
8.3.3 Model Fitting
8.3.4 Further Reading
8.4 Probabilistic Modeling and Inference
8.4.1 Probabilistic Models
8.4.2 Bayesian Inference
8.4.3 Latent-Variable Models
8.4.4 Further Reading
8.5 Directed Graphical Models
8.5.1 Graph Semantics
8.5.2 Conditional Independence and d-Separation
8.5.3 Further Reading
8.6 Model Selection
8.6.1 Nested Cross-Validation
8.6.2 Bayesian Model Selection
8.6.3 Bayes Factors for Model Comparison
8.6.4 Further Reading
9 Linear Regression
9.1 Problem Formulation
9.2 Parameter Estimation
9.2.1 Maximum Likelihood Estimation
9.2.2 Overfitting in Linear Regression
9.2.3 Maximum A Posteriori Estimation
9.2.4 MAP Estimation as Regularization
9.3 Bayesian Linear Regression
9.3.1 Model
9.3.2 Prior Predictions
9.3.3 Posterior Distribution
9.3.4 Posterior Predictions
9.3.5 Computing the Marginal Likelihood
9.4 Maximum Likelihood as Orthogonal Projection
9.5 Further Reading
10 Dimensionality Reduction with Principal Component Analysis
10.1 Problem Setting
10.2 Maximum Variance Perspective
10.2.1 Direction with Maximal Variance
10.2.2 M-dimensional Subspace with Maximal Variance
10.3 Projection Perspective
10.3.1 Setting and Objective
10.3.2 Finding Optimal Coordinates
10.3.3 Finding the Basis of the Principal Subspace
10.4 Eigenvector Computation and Low-Rank Approximations
10.4.1 PCA Using Low-Rank Matrix Approximations
10.4.2 Practical Aspects
10.5 PCA in High Dimensions
10.6 Key Steps of PCA in Practice
10.7 Latent Variable Perspective
10.7.1 Generative Process and Probabilistic Model
10.7.2 Likelihood and Joint Distribution
10.7.3 Posterior Distribution
10.8 Further Reading
11 Density Estimation with Gaussian Mixture Models
11.1 Gaussian Mixture Model
11.2 Parameter Learning via Maximum Likelihood
11.2.1 Responsibilities
11.2.2 Updating the Means
11.2.3 Updating the Covariances
11.2.4 Updating the Mixture Weights
11.3 EM Algorithm
11.4 Latent-Variable Perspective
11.4.1 Generative Process and Probabilistic Model
11.4.2 Likelihood
11.4.3 Posterior Distribution
11.4.4 Extension to a Full Dataset
11.4.5 EM Algorithm Revisited
11.5 Further Reading
12 Classification with Support Vector Machines
12.1 Separating Hyperplanes
12.2 Primal Support Vector Machine
12.2.1 Concept of the Margin
12.2.2 Traditional Derivation of the Margin
12.2.3 Why We Can Set the Margin to 1
12.2.4 Soft Margin SVM: Geometric View
12.2.5 Soft Margin SVM: Loss Function View
12.3 Dual Support Vector Machine
12.3.1 Convex Duality via Lagrange Multipliers
12.3.2 Dual SVM: Convex Hull View
12.4 Kernels
12.5 Numerical Solution
12.6 Further Reading
Tags: Marc Peter Deisenroth, A. Aldo Faisal, Cheng Soon Ong, Machine Learning


