Theory of Machine Learning: Lecture Schedule

The course will broadly be divided into four parts. The first is an introduction to learning theory (PAC model, VC theory). This will be followed by optimization (convexity, stochastic gradient descent, regret bounds). Next, we will discuss some old and new theoretical results on neural networks (representability, power of depth). Finally, we will study unsupervised learning (clustering, generative models, NMF and other "factorization" techniques).

Below is the tentative schedule, and a list of readings. Once again, here is a link to the textbook. The template for scribe notes can be found here.

Statistical Learning Theory

(Jan 9, 11): Introduction to the course; statistical learning, the PAC model of Valiant, the agnostic PAC model, and learnability of finite hypothesis classes.
Readings: Chapters 2, 3 & 4 of the textbook; Valiant's A Theory of the Learnable.
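
A quick preview of the flavor of guarantee these lectures build toward (a sketch only; see Chapters 2-4 for the precise statements and constants): in the realizable setting every finite class \mathcal{H} is PAC learnable by ERM with sample complexity

    m_{\mathcal{H}}(\epsilon,\delta) \le \left\lceil \frac{\log(|\mathcal{H}|/\delta)}{\epsilon} \right\rceil,

and in the agnostic setting the dependence on \epsilon degrades to O(\log(|\mathcal{H}|/\delta)/\epsilon^{2}).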

(Jan 18): Bias/complexity tradeoff and the no-free-lunch theorem; introduction to infinite hypothesis classes. Readings: Chapter 5 of the textbook. Scribe notes.

(Jan 23, 25): Infinite hypothesis classes: VC dimension; the "fundamental theorem" of learning theory.
Readings: Chapter 6 of the textbook. Here is another well-written (and self-contained) exposition. Scribe notes for lecture 4, and lecture 5 (figures to be added).
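
For orientation, the headline of the "fundamental theorem" (stated here only up to constants): a class of VC dimension d is agnostically PAC learnable with sample complexity

    \Theta\!\left( \frac{d + \log(1/\delta)}{\epsilon^{2}} \right),

with the 1/\epsilon^{2} improving to roughly 1/\epsilon in the realizable case.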

(Jan 30): Rademacher complexity. Review of ERM and generalization, complexity issues.
Readings: Chapter 26 and Chapter 8 of the textbook. A good source for the proof of the Rademacher bound is this set of lecture notes by Rob Schapire. Scribe notes for lecture 6.
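
One standard form of the bound proved via Rademacher complexity, for a loss class with values in [0,1] (constants differ slightly across sources, so treat this as a sketch): with probability at least 1-\delta, simultaneously for all h \in \mathcal{H},

    L_{\mathcal{D}}(h) \le L_S(h) + 2\,\mathfrak{R}_m(\ell \circ \mathcal{H}) + \sqrt{\frac{\ln(1/\delta)}{2m}},

where \mathfrak{R}_m denotes the Rademacher complexity over m samples.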

Optimization and online learning

(Feb 1): Convex optimization, gradient descent, the SGD algorithm. Reading: Chapter 14 of the textbook. Scribe notes for lecture 7.
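
To make the update rule concrete, here is a minimal Python sketch of SGD on a least-squares objective (illustrative only; the dataset, step size, and function names are placeholders, not from the textbook):

    import numpy as np

    def sgd(grad_sample, w0, data, lr=0.05, epochs=20, seed=0):
        """Stochastic gradient descent: at each step, take a gradient step
        computed from a single (randomly ordered) example."""
        rng = np.random.default_rng(seed)
        w = np.array(w0, dtype=float)
        for _ in range(epochs):
            for i in rng.permutation(len(data)):
                w -= lr * grad_sample(w, data[i])
        return w

    # Example: least-squares, per-example loss (x.w - y)^2 with gradient 2(x.w - y)x.
    def grad_ls(w, example):
        x, y = example
        return 2.0 * (x @ w - y) * x

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=100)
    w_hat = sgd(grad_ls, np.zeros(3), list(zip(X, y)))

In the convex analysis one typically outputs the average of the iterates rather than the last one; that is a one-line change to the loop above.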

(Feb 6, 8): Analysis of gradient descent, projections, outline of acceleration. An excellent resource for optimization is the monograph by Sebastien Bubeck. (**preliminary** scribe notes for lecture 8 and lecture 9)
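
The headline rate from this analysis (for a convex, \rho-Lipschitz objective f over a domain of radius B, with a suitably tuned fixed step size; see the textbook or the monograph for exact statements): after T iterations the averaged iterate \bar{w} satisfies

    f(\bar{w}) - \min_{w} f(w) \le \frac{B\rho}{\sqrt{T}},

with smoothness the rate improves to O(1/T), and with acceleration to O(1/T^2).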

(Feb 13, 15): Online learning -- framework, connections to VC dimension, regret minimization. Reading: Chapter 21 of the textbook. Another nice resource is the survey of Arora, Hazan and Kale. (**preliminary** scribe notes for lecture 10, lecture 11)

(Feb 22): Regret minimization recap, and intro to online convex optimization. Reading: Chapter 21 of the textbook.

(Feb 27, Mar 1): Online convex optimization (contd.), boosting. Reading: Chapter 21 of the textbook, the survey on multiplicative weights, and the paper of Freund and Schapire.
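
A compact Python sketch of the multiplicative weights (Hedge) update discussed in the survey (illustrative; the synthetic loss sequence and parameter names are placeholders):

    import numpy as np

    def hedge(losses, eta):
        """Multiplicative weights over n experts: keep a weight per expert,
        play the normalized weights, and exponentially down-weight experts
        by their losses. `losses` is a (T, n) array with entries in [0, 1]."""
        T, n = losses.shape
        w = np.ones(n)
        total = 0.0
        for t in range(T):
            p = w / w.sum()                # distribution played at round t
            total += p @ losses[t]         # expected loss suffered
            w *= np.exp(-eta * losses[t])  # multiplicative update
        return total

    # With eta ~ sqrt(log(n)/T), regret against the best expert is O(sqrt(T log n)).
    rng = np.random.default_rng(0)
    L = rng.random((1000, 10))
    print(hedge(L, eta=np.sqrt(np.log(10) / 1000)))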

Neural networks and deep learning

(Mar 6, 8): Boosting and optimization wrap-up, introduction to neural networks. Reading: Chapter 20 of the textbook, and the initial chapters of the new textbook on the topic. Preliminary scribe notes: lecture 15 (ignore the # in the note), and lecture 16.

(Mar 20, 22): SGD for training neural networks, and backpropagation. Reading: Chapter 20 of the textbook. Preliminary scribe notes: lecture 17, lecture 18.
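
To complement the reading, here is a bare-bones Python sketch of backpropagation for a one-hidden-layer ReLU network with squared loss (illustrative; the architecture, step size, and initialization are arbitrary choices, not taken from the textbook):

    import numpy as np

    def train(X, y, hidden=16, lr=0.01, epochs=100, seed=0):
        """SGD with backpropagation: each backward pass is the chain rule
        applied layer by layer, from the loss back to the weights."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        W1 = rng.normal(scale=1.0 / np.sqrt(d), size=(d, hidden))
        w2 = rng.normal(scale=1.0 / np.sqrt(hidden), size=hidden)
        for _ in range(epochs):
            for i in rng.permutation(n):
                x, t = X[i], y[i]
                # forward pass
                z = x @ W1                    # pre-activations
                h = np.maximum(z, 0.0)        # ReLU
                pred = h @ w2                 # network output
                # backward pass (chain rule)
                dpred = 2.0 * (pred - t)      # d(loss)/d(pred) for squared loss
                dw2 = dpred * h
                dz = (dpred * w2) * (z > 0)   # gradient through the ReLU
                dW1 = np.outer(x, dz)
                # SGD step
                w2 -= lr * dw2
                W1 -= lr * dW1
        return W1, w2

    # Usage: W1, w2 = train(X, y) for features X of shape (n, d) and real targets y.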

(Mar 27): Generalization in neural networks, regularization techniques. Reading: Chapters 7 and 9 of the new textbook on DNNs. Preliminary lecture notes.

Unsupervised learning

(Mar 29): Introduction to unsupervised learning, generative models, maximum likelihood. Preliminary lecture notes.
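
A one-line reminder of the estimation principle introduced here: given samples x_1, ..., x_m from a model family {p_\theta}, maximum likelihood picks

    \hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{m} \log p_{\theta}(x_i);

for example, for a Gaussian with known variance the maximum likelihood estimate of the mean is simply the sample average (1/m) \sum_i x_i.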

(Apr 3, 5): Clustering, dimension reduction (SVD). Reading: Chapters 8 and 3 (respectively) of the recent textbook by Blum, Hopcroft and Kannan, available here.
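
As a small companion to the SVD reading, here is a Python sketch of low-rank approximation via truncated SVD (illustrative; the synthetic data is only there to show the call):

    import numpy as np

    def best_rank_k(A, k):
        """Best rank-k approximation of A (in Frobenius and spectral norm),
        obtained by keeping the top k singular values and vectors."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # Example: 100 points in R^20 that are (nearly) rank 2.
    rng = np.random.default_rng(0)
    A = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 20)) + 0.01 * rng.normal(size=(100, 20))
    A2 = best_rank_k(A, 2)   # close to A in Frobenius norm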