Francesco Orabona


CSE 512: Machine Learning

Spring 2017, Time: Tues-Thurs 5:30PM-6:50PM, Location: Javits 101

Instructor: Francesco Orabona

Office hours: Tues-Thurs 11:00AM-12:00PM

TA: Fatemeh Almodaresi

TA office hours: Wed 10:00AM-12:00PM, Room 256 in NCS

Note: The class is currently full.

Machine Learning is centered around automated methods that improve their own performance through learning patterns in data, and then use the uncovered patterns to predict the future and make decisions. Examples include document retrieval, image classification, spam filtering, face detection, speech recognition, decision making, and robot navigation. This course covers both the theoretical and the practical side of machine learning.

All the course material will be posted on Piazza.

Prerequisites:

  • Basic algorithms and data structures
  • Linear algebra
  • Multivariable calculus
  • Probability and statistics
  • Proficiency in and regular access to MATLAB

Specific Topics:

  • Linear algebra + Probability
  • Learning theory
  • Linear predictors
  • Bias-variance, model selection
  • Perceptron
  • Support Vector Machines
  • Convexity and convex problems
  • Duality and kernels
  • Convex Learning
  • Regularized linear predictors
  • Stochastic Gradient Descent
  • Boosting
  • Clustering
  • Dimensionality reduction
  • Generative models
  • Nearest Neighborhood
  • Neural Networks, Backprop
  • Deep learning

Text Books

The required textbook for the class is: The book is available online here.

Supplemental recommended books:

Requirements and Grading:

There will be roughly bi-weekly homework assignments, counting toward 50% of the grade. Assignments must be typed (not handwritten) and submitted electronically in PDF. Collaborative work on the homeworks is encouraged, but each student must eventually write up the solution independently.

The midterm will count 20% and the final exam 30%.

Lecture Schedule

Updated regularly and subject to minor tweaks throughout the semester.

Date Topic Relevant Reading Notes
1/24/2017 Introduction
1/26/2017 Review of Linear Algebra and Probability
Iain Murray's crib-sheet
Matrix Cookbook
1/31/2017 PAC Learning
UML Chapters 2,3
2/02/2017 Agnostic PAC Learning
UML Chapter 4
2/07/2017 Linear Regression
UML Chapter 9, Section 12.1.1
HW 1 is out
2/09/2017 No class

Snow Day
2/14/2017 Bias-Variance Trade-off and Model Selection
UML Chapter 5, Section 11.2
2/16/2017 Perceptron
UML Section 9.1.2
CML Chapter 4
2/21/2017 Support Vector Machines
UML Section 15.1 (and its subsections), 15.2, 12.3
Optional: Burges' SVM tutorial
HW 1 is due
HW 2 is out
2/23/2017 Duality and Kernels CO, Sections 5.1.1, 5.1.2, 5.1.3, 5.2, 5.2.2, 5.2.3
UML, Sections 16.1, 16.2
Optional: Burges' SVM tutorial
2/28/2017 Regularization
UML, Sections 13.1 (and subsections), 25.1.3, 26.3, 26.4
3/02/2017 Convex Optimization
CML, Sections 7.4
UML, Sections 14.1, 14.2, 14.4.1
Optional: CO, Chapter 4, Sections 9.1, 9.2, 9.3
3/07/2017 Summary lecture, pre mid-term

HW 2 is due
3/09/2017 Mid-term exam

HW 3 is out
3/14/2017 Spring break

3/16/2017 Spring break

3/21/2017 Stochastic Gradient Descent
UML, Sections 12.1.2, 12.1.3, 12.2.2, 14.1, 14.3, 14.5
Optional: Zhang's paper on SGD
3/23/2017 Multiclass Classification
UML, Sections 17.1, 17.2, 17.3 and their subsections
3/28/2017 Boosting
CML, Section 13.2
Introduction (from this book)
HW 3 is due
3/30/2017 Clustering
CML, Sections 3.4, 15.1
4/04/2017 Dimensionality Reduction
UML, Sections 23.1, 23.2
4/06/2017 Generative Models
UML, Sections 24.1, 24.2, 24.3, 24.4
HW4 is out
4/11/2017 Nearest Neighborhood and Decision Trees
UML, Sections 18.1, 18.2, 19.1, 19.3
4/13/2017 Neural Networks and Backprop
UML, Sections 20.1, 20.2, 20.3.1, 20.5, 20.6
4/18/2017 Deep Learning Overview
Optional: Deep Learning Book
4/20/2017 Online Learning 1
UML, Sections 21.1, 21.2
Optional: Online Learning survey
HW 4 is due
4/25/2017 Online Learning, second part
UML, Sections 21.3
Optional: Online Learning survey
4/27/2017 Summary

5/02/2017

Kaggle ends
5/04/2017 End-term exam


Misc

Note: If you have a physical, psychological, medical or learning disability that may impact on your ability to carry out assigned course work, please contact the staff in the Disabled Student Services office (DSS), Room 133, Humanities, 632-6748/TDD. DSS will review your concerns and determine with you what accommodations are necessary and appropriate. All information and documentation of disability are confidential.

Note: Each student must pursue his or her academic goals honestly and be personally accountable for all submitted work. Representing another person's work as your own is always wrong. Any suspected instance of academic dishonesty will be reported to the Academic Judiciary. For more comprehensive information on academic integrity, including categories of academic dishonesty, please refer to the academic judiciary website at http://www.stonybrook.edu/uaa/academicjudiciary/.

TTIC 31070 (CMSC 34500 Optimization/ BUSF 36903): Convex Optimization

This is a webpage for the Fall 2013 course at TTIC and the University of Chicago.

Course Description

Mondays and Thursday 3:00pm-4:20pm at TTIC 530 (located at 6045 S. Kenwood Ave, fifth floor)

Instructor: Francesco Orabona

Additional Lecturer: Nati Srebro (first 3 lessons)

TA: Zhiyong Wang

Mailing list of the course: The course will cover techniques in unconstrained and constrained convex optimization and a practical introduction to convex duality. The course will focus on (1) formulating and understanding convex optimization problems and studying their properties; (2) presenting and understanding optimization approaches; and (3) understanding the dual problem. Limited theoretical analysis of convergence properties of methods will be presented. Examples will be mostly from data fitting, statistics and machine learning.

Specific Topics:

  • Formalization of optimization problems
  • Smooth unconstrained optimization: Gradient Descent, Conjugate Gradient Descent, Newton and Quasi-Newton methods; Line Search methods
  • Standard formulations of constrained optimization: Linear, Quadratic, Conic and Semi-Definite Programming
  • KKT optimality conditions
  • Lagrangian duality, constraint qualification, weak and strong duality
  • Fenchel conjugacy and its relationship to Lagrangian duality
  • Multi-objective optimization
  • Equality constrained Newton method
  • Log Barrier (Central Path) methods, and Primal-Dual optimization methods
  • Overview of the simplex method as an example of an active set method.
  • Overview of the first order oracle model, subgradient and separation oracles, and the Ellipsoid algorithm.
  • Large-scale subgradient methods: subgradient descent, mirror descent, stochastic subgradient descent.

Text Books

The required textbook for the class is: The book is available online here. About 75% of the material covered in the class can be found in the above book.

Supplemental recommended books:

Requirements and Grading:

There will be roughly bi-weekly homework assignments, counting toward 30% of the grade. Assignments must be typed (not handwritten) and submitted electronically in PDF. Collaborative work on the homeworks is encouraged, but each student must eventually write up the solution independently.

There will also be several MATLAB programming and experimentation assignments, counting toward another 20% of the grade.

The remaining 50% of the grade will be based on a final exam.

Lectures and Required Reading:

Lecture I: Monday September 30th
  • Intro to course
  • Formulation of optimization problems
  • Feasibility, optimality, sub-optimality
Boyd and Vandenberghe Sections 1.1-1.4
Lecture II: Thursday October 3rd
  • Affine & convex sets, and operations that preserve convex sets
  • Affine & convex Functions
  • Separating and supporting hyperplanes, halfspaces, polyhedra
  • Notions of separation
  • Alpha-sublevel sets, epigraphs
Boyd and Vandenberghe Sections 2.1-2.3, 2.5, 3.1-3.2
Lecture III: Monday October 6th
  • Unconstrained optimization: descent methods; descent direction
  • Gradient descent
  • Line search: exact line search and backtracking line search
  • Analysis of gradient descent with exact and backtracking line search
  • Problems with badly conditioned functions; preconditioning
  • Newton's method
Boyd and Vandenberghe Sections 9.1-9.3, 9.5
Lecture IV: Thursday October 10th
  • Analysis of Newton's method
  • Self-concordance analysis of Newton's method
  • Conjugate gradient descent
  • Quasi-Newton methods: BFGS
  • Summary of unconstrained optimization
Boyd and Vandenberghe Sections 9.5-9.6, Nocedal and Wright Sections 5.1, 6.1
Lecture V: Monday October 14th
  • Constrained Optimization: Formulation and Optimality Condition
  • Linear Programming
  • The Lagrangian, the dual problem, weak duality
  • Slater's condition and strong duality
  • Summary of unconstrained optimization
  • Dual solutions as certificates for sub-optimality and infeasibility
Boyd and Vandenberghe Sections 4.2,4.3,5.1,5.2,5.4.1,5.5.1
Lecture VI: Thursday October 17th
  • Geometric interpretation of duality; Slater's condition
  • Dual of a linear program
  • Dual of a non-convex problem: max-cut clustering
  • Dual of least-squares solution to under-determined linear system
  • Least-squares solution: recovering the primal optimum from the dual optimum
  • Complimentary slackness
Boyd and Vandenberghe Sections 5.3.1, 5.5.2
Lecture VII: Monday October 21st
  • Example: Max-flow/min-cut
  • KKT conditions
  • Examples: water-filing, large-margin linear discrimination.
  • Expressing the dual in terms of Fenchel conjugates
Boyd and Vandenberghe Sections 5.5.3, 3.3, 5.1.6, 5.7.1
Lecture VIII: Thursday October 24th
  • Example: Logistic regression
  • Example: Minimum volume covering ellipsoid and the log-det function
  • Matrix inequalities, semi-definite constraints, and semi-definite programming
  • Examples: covariance estimation, fastest mixing Markov chain
  • Lagrangian duality for semi-definite constraints
  • Complimentary slackness and KKT optimality conditions for semi-definite constraints
  • Norm-constrained matrix factorization as an example of an SDP and its dual
Boyd and Vandenberghe Sections 4.6, 5.9
Lecture IX: Monday October 28th
  • Equality constrained Newton's method
  • Optimization of problems with implicit constraints
  • Log-barrier problems and the central path interior point method
  • Analysis of the log-barrier central path method
  • Log-barrier method for semi-definite constraints
Boyd and Vandenberghe Sections 10.1, 10.2, 11.1, 11.2, 11.3.1, 11.5, 11.6
Lecture X: Thursday October 31st
  • Feasibility problems and Phase I methods
  • Reducing Optimization to feasibility
  • Log-barrier and weakened KKT conditions
  • Primal-Dual Interior Point Methods, including infeasible start
  • Multi-objective problems and Pareto optimality
Boyd and Vandenberghe Sections 11.4.1, 11.4.3, 10.3.4, 11.7, 4.7
Lecture XI: Monday November 4th
  • Scalarization of multi-objective problems
  • The Simplex method
Boyd and Vandenberghe Sections 11.4.1, Nocedal and Wright Chapter 13
Lecture XII: Thursday November 7th
  • The first order oracle model: subgradients and seperation oracles
  • Classical first order methods: center-of-mass method, oracle lower bound, ellipsoid method
Chapters 1-3 of Nemirovksi's "Efficient Methods in Convex Programming"
Lecture XIII: Monday November 11th
  • Review and limitations of methods with polynomial convergence.
  • Large-scale, slowly converging, first order methods.
  • Sub-gradient descent with projection, step size and analysis for Lipschitz functions over a bounded domain.
Section 5.3.1 of Nemirovksi's "Lectures on Modern Convex Optimization"
Lecture XIV: Thursday November 14th
  • Gradient descent as a proximal point method.
  • From gradient descent to bundle methods.
  • Mirror Descent formulation and basic concepts: strong convexity with respect to a norm, dual norm, distance generating function and Bergman divergence
  • Analysis Mirror Descent
Sections 5.3.2 and 5.4.2 of Nemirovksi's "Lectures on Modern Convex Optimization"
Lecture XV: Monday November 18th
  • Partial linarization in Mirror Descent
  • More about sub-gradient descent: decreasing step-size
  • Primal-dual analysis of Mirror Descent: Dual Averaging
  • Choice of the distance generating function
Lecture XVI: Thursday November 21st
  • Stochastic Gradient Descent and Stochastic Optimization.
  • Overview of further results for first order methods: faster rates with strong convexity, faster rates with smoothness, acceleration with smoothness.
Section 14.1 of Nemirovksi's "Efficient Methods in Convex Programming"
Lecture XVII: Monday December 2nd
    Course Summary

Assignments: