This book studies mathematical theories of machine learning. The first part of the book explores the optimality and adaptivity of choosing step sizes of gradient descent for escaping strict saddle points in non-convex optimization problems. In the second part, the authors propose algorithms to find local minima in nonconvex optimization and to obtain global minima in some degree from the Newton Second Law without friction. In the third part, the authors study the problem of subspace clustering with noisy and missing data, which is a problem well-motivated by practical applications data subject to stochastic Gaussian noise and/or incomplete data with uniformly missing entries. In the last part, the authors introduce an novel VAR model with Elastic-Net regularization and its equivalent Bayesian model allowing for both a stable sparsity and a group selection.