CSE 512: Machine Learning
Spring 2017, Time: TuesThurs 5:30PM6:50PM, Location: Javits 101
Instructor: Francesco Orabona
Office hours: TuesThurs 11:00AM12:00PM
TA: Fatemeh Almodaresi
TA office hours: Wed 10:00AM12:00PM, Room 256 in NCS
Note: The class is currently full.
Machine Learning is centered around automated methods that improve their own performance through learning patterns in data, and then use the uncovered patterns to predict the future and make decisions. Examples include document retrieval, image classification, spam filtering, face detection, speech recognition, decision making, and robot navigation. This course covers both the theoretical and the practical side of machine learning.
All the course material will be posted on Piazza.
Prerequisites:
 Basic algorithms and data structures
 Linear algebra
 Multivariable calculus
 Probability and statistics
 Proficiency in and regular access to MATLAB
Specific Topics:
 Linear algebra + Probability
 Learning theory
 Linear predictors
 Biasvariance, model selection
 Perceptron
 Support Vector Machines
 Convexity and convex problems
 Duality and kernels
 Convex Learning
 Regularized linear predictors
 Stochastic Gradient Descent
 Boosting
 Clustering
 Dimensionality reduction
 Generative models
 Nearest Neighborhood
 Neural Networks, Backprop
 Deep learning
Text Books
The required textbook for the class is:
The book is available online
here.
Supplemental recommended books:
Requirements and Grading:
There will be roughly biweekly
homework assignments, counting
toward 50% of the grade. Assignments must be typed (not handwritten)
and submitted electronically in PDF. Collaborative work on the
homeworks is encouraged, but each student must eventually write up the
solution independently.
The midterm will count 20% and the final exam 30%.
Lecture Schedule
Updated regularly and subject to minor tweaks throughout the semester.
Date 
Topic 
Relevant Reading 
Notes 
1/24/2017 
Introduction



1/26/2017 
Review of Linear Algebra and Probability

Iain Murray's cribsheet
Matrix Cookbook


1/31/2017 
PAC Learning

UML Chapters 2,3


2/02/2017 
Agnostic PAC Learning

UML Chapter 4


2/07/2017 
Linear Regression

UML Chapter 9, Section 12.1.1

HW 1 is out 
2/09/2017 
No class


Snow Day 
2/14/2017 
BiasVariance Tradeoff and Model Selection

UML Chapter 5, Section 11.2


2/16/2017 
Perceptron

UML Section 9.1.2
CML Chapter 4


2/21/2017 
Support Vector Machines

UML Section 15.1 (and its subsections), 15.2, 12.3
Optional: Burges' SVM tutorial

HW 1 is due
HW 2 is out

2/23/2017 
Duality and Kernels 
CO, Sections 5.1.1, 5.1.2, 5.1.3, 5.2, 5.2.2, 5.2.3
UML, Sections 16.1, 16.2
Optional: Burges' SVM tutorial


2/28/2017 
Regularization

UML, Sections 13.1 (and subsections), 25.1.3, 26.3, 26.4


3/02/2017 
Convex Optimization

CML, Sections 7.4
UML, Sections 14.1, 14.2, 14.4.1
Optional: CO, Chapter 4, Sections 9.1, 9.2, 9.3


3/07/2017 
Summary lecture, pre midterm


HW 2 is due 
3/09/2017 
Midterm exam


HW 3 is out 
3/14/2017 
Spring break



3/16/2017 
Spring break



3/21/2017 
Stochastic Gradient Descent

UML, Sections 12.1.2, 12.1.3, 12.2.2, 14.1, 14.3, 14.5
Optional: Zhang's paper on SGD


3/23/2017 
Multiclass Classification

UML, Sections 17.1, 17.2, 17.3 and their subsections


3/28/2017 
Boosting


HW 3 is due 
3/30/2017 
Principal Component Analysis


HW 4 is out 
4/04/2017 
Clustering



4/06/2017 
Generative Models



4/11/2017 
Nearest Neighborhood



4/13/2017 
Neural Networks 1


HW 4 is due 
4/18/2017 
Neural Networks 2


HW 5 is out 
4/20/2017 
Online Learning 1



4/25/2017 
Online Learning 2



4/27/2017 
Summary



5/02/2017 


HW 5 is due 
5/04/2017 
Endterm exam



Misc
Note: If you have a physical, psychological, medical or learning disability that may impact on your ability to carry out assigned course work, please contact the staff in the Disabled Student Services office (DSS), Room 133, Humanities, 6326748/TDD. DSS will review your concerns and determine with you what accommodations are necessary and appropriate. All information and documentation of disability are confidential.
Note: Each student must pursue his or her academic goals honestly and be personally accountable for all submitted work. Representing another person's work as your own is always wrong. Any suspected instance of academic dishonesty will be reported to the Academic Judiciary. For more comprehensive information on academic integrity, including categories of academic dishonesty, please refer to the academic judiciary website at http://www.stonybrook.edu/uaa/academicjudiciary/.
TTIC 31070 (CMSC 34500 Optimization/ BUSF 36903): Convex Optimization
This is a webpage for the Fall 2013 course at TTIC and the University of Chicago.
Course Description
Mondays and Thursday 3:00pm4:20pm at TTIC 530 (located at
6045 S. Kenwood Ave, fifth floor)
Instructor: Francesco Orabona
Additional Lecturer: Nati Srebro (first 3 lessons)
TA: Zhiyong Wang
Mailing list of the course:
The course will cover techniques in unconstrained and constrained
convex optimization and a practical introduction to convex duality.
The course will focus on (1) formulating and understanding convex
optimization problems and studying their properties; (2) presenting
and understanding optimization approaches; and (3) understanding the
dual problem. Limited theoretical analysis of convergence properties
of methods will be presented. Examples will be mostly from data
fitting, statistics and machine learning.
Specific Topics:
 Formalization of optimization problems
 Smooth unconstrained optimization: Gradient Descent, Conjugate Gradient Descent, Newton and QuasiNewton methods; Line Search methods
 Standard formulations of constrained optimization: Linear, Quadratic, Conic and SemiDefinite Programming
 KKT optimality conditions
 Lagrangian duality, constraint qualification, weak and strong duality
 Fenchel conjugacy and its relationship to Lagrangian duality
 Multiobjective optimization
 Equality constrained Newton method
 Log Barrier (Central Path) methods, and PrimalDual optimization methods
 Overview of the simplex method as an example of an active set method.
 Overview of the first order oracle model, subgradient and separation oracles, and the Ellipsoid algorithm.
 Largescale subgradient methods: subgradient descent, mirror descent, stochastic subgradient descent.
Text Books
The required textbook for the class is:
The book is available online
here. About 75% of the material covered
in the class can be found in the above book.
Supplemental recommended books:
Requirements and Grading:
There will be roughly biweekly
homework assignments, counting
toward 30% of the grade. Assignments must be typed (not handwritten)
and submitted electronically in PDF. Collaborative work on the
homeworks is encouraged, but each student must eventually write up the
solution independently.
There will also be several MATLAB programming and experimentation
assignments, counting toward another 20% of the grade.
The remaining 50% of the grade will be based on a final exam.
Lectures and Required Reading:
 Lecture I: Monday September 30th

 Intro to course
 Formulation of optimization problems
 Feasibility, optimality, suboptimality
Boyd and Vandenberghe Sections
1.11.4
 Lecture II: Thursday October 3rd

 Affine & convex sets, and operations that preserve convex sets
 Affine & convex Functions
 Separating and supporting hyperplanes, halfspaces, polyhedra
 Notions of separation
 Alphasublevel sets, epigraphs
Boyd and Vandenberghe Sections 2.12.3, 2.5, 3.13.2
 Lecture III: Monday October 6th

 Unconstrained optimization: descent methods; descent direction
 Gradient descent
 Line search: exact line search and backtracking line search
 Analysis of gradient descent with exact and backtracking line search
 Problems with badly conditioned functions; preconditioning
 Newton's method
Boyd and Vandenberghe Sections 9.19.3, 9.5
 Lecture IV: Thursday October 10th

 Analysis of Newton's method
 Selfconcordance analysis of Newton's method
 Conjugate gradient descent
 QuasiNewton methods: BFGS
 Summary of unconstrained optimization
Boyd and Vandenberghe Sections 9.59.6, Nocedal and Wright Sections 5.1, 6.1
 Lecture V: Monday October 14th

 Constrained Optimization: Formulation and Optimality Condition
 Linear Programming
 The Lagrangian, the dual problem, weak duality
 Slater's condition and strong duality
 Summary of unconstrained optimization
 Dual solutions as certificates for suboptimality and infeasibility
Boyd and Vandenberghe Sections 4.2,4.3,5.1,5.2,5.4.1,5.5.1
 Lecture VI: Thursday October 17th

 Geometric interpretation of duality; Slater's condition
 Dual of a linear program
 Dual of a nonconvex problem: maxcut clustering
 Dual of leastsquares solution to underdetermined linear system
 Leastsquares solution: recovering the primal optimum from the dual optimum
 Complimentary slackness
Boyd and Vandenberghe Sections 5.3.1, 5.5.2
 Lecture VII: Monday October 21st

 Example: Maxflow/mincut
 KKT conditions
 Examples: waterfiling, largemargin linear discrimination.
 Expressing the dual in terms of Fenchel conjugates
Boyd and Vandenberghe Sections 5.5.3, 3.3, 5.1.6, 5.7.1
 Lecture VIII: Thursday October 24th

 Example: Logistic regression
 Example: Minimum volume covering ellipsoid and the logdet function
 Matrix inequalities, semidefinite constraints, and semidefinite programming
 Examples: covariance estimation, fastest mixing Markov chain
 Lagrangian duality for semidefinite constraints
 Complimentary slackness and KKT optimality conditions for semidefinite constraints
 Normconstrained matrix factorization as an example of an SDP and its dual
Boyd and Vandenberghe Sections 4.6, 5.9
 Lecture IX: Monday October 28th

 Equality constrained Newton's method
 Optimization of problems with implicit constraints
 Logbarrier problems and the central path interior point method
 Analysis of the logbarrier central path method
 Logbarrier method for semidefinite constraints
Boyd and Vandenberghe Sections 10.1, 10.2, 11.1, 11.2, 11.3.1, 11.5, 11.6
 Lecture X: Thursday October 31st

 Feasibility problems and Phase I methods
 Reducing Optimization to feasibility
 Logbarrier and weakened KKT conditions
 PrimalDual Interior Point Methods, including infeasible start
 Multiobjective problems and Pareto optimality
Boyd and Vandenberghe Sections 11.4.1, 11.4.3, 10.3.4, 11.7, 4.7
 Lecture XI: Monday November 4th

 Scalarization of multiobjective problems
 The Simplex method
Boyd and Vandenberghe Sections 11.4.1, Nocedal and Wright Chapter 13
 Lecture XII: Thursday November 7th

 The first order oracle model: subgradients and seperation oracles
 Classical first order methods: centerofmass method, oracle lower bound, ellipsoid method
Chapters 13 of Nemirovksi's "Efficient Methods in Convex Programming"
 Lecture XIII: Monday November 11th

 Review and limitations of methods with polynomial convergence.
 Largescale, slowly converging, first order methods.
 Subgradient descent with projection, step size and analysis for Lipschitz functions over a bounded domain.
Section 5.3.1 of Nemirovksi's "Lectures on Modern Convex Optimization"
 Lecture XIV: Thursday November 14th

 Gradient descent as a proximal point method.
 From gradient descent to bundle methods.
 Mirror Descent formulation and basic concepts: strong convexity with respect to a norm, dual norm, distance generating function and Bergman divergence
 Analysis Mirror Descent
Sections 5.3.2 and 5.4.2 of Nemirovksi's "Lectures on Modern Convex Optimization"
 Lecture XV: Monday November 18th

 Partial linarization in Mirror Descent
 More about subgradient descent: decreasing stepsize
 Primaldual analysis of Mirror Descent: Dual Averaging
 Choice of the distance generating function
 Lecture XVI: Thursday November 21st

 Stochastic Gradient Descent and Stochastic Optimization.
 Overview of further results for first order methods: faster rates with strong convexity, faster rates with smoothness, acceleration with smoothness.
Section 14.1 of Nemirovksi's "Efficient Methods in Convex Programming"
 Lecture XVII: Monday December 2nd

Assignments: