- Fall 2019, BU, EC 500: Introduction to Online Learning, lecture notes available on my blog
- Spring 2019, BU, EC 503: Introduction to Learning from Data
- Spring 2018, SBU, CSE 592: Convex Optimization
- Spring 2017, SBU, CSE 512: Machine Learning
- Fall 2013, TTIC 31070 (CMSC 34500 Optimization/ BUSF 36903): Convex Optimization
CSE 592: Convex Optimization
Spring 2018, Time: Tues-Thurs 1:00PM-2:20PM, Location: Javits 103
Instructor: Francesco Orabona
Office hours: Tues-Thurs, 2:30PM-3:30PM
TA: Shahira Abousamra, office hours: Tues, 11:45AM-12:45PM, NCS 140
TA: Timothy Zhang, office hours: Weds, 1:00PM-2:00PM, NCS 230
Piazza: piazza.com/stonybrook/spring2018/cse592/home
The course will cover techniques in unconstrained and constrained convex optimization and a practical introduction to convex duality. The course will focus on (1) formulating and understanding convex optimization problems and studying their properties; (2) presenting and understanding optimization approaches; (3) understanding the dual problem; and (4) stochastic optimization methods. Theoretical analysis of convergence properties of methods will be presented. Examples will be mostly from data fitting, statistics, and machine learning.
All the course material will be posted on Piazza.
Prerequisites:
- Linear algebra
- Multivariable calculus
- Proficiency in and regular access to Python
Specific Topics:
- Formalization of optimization problems
- Smooth unconstrained optimization: Gradient Descent, Conjugate Gradient Descent, Newton and Quasi-Newton methods; Line Search methods
- Standard formulations of constrained optimization: Linear, Quadratic, Conic and Semi-Definite Programming
- KKT optimality conditions
- Lagrangian duality, constraint qualification, weak and strong duality
- Fenchel conjugacy and its relationship to Lagrangian duality
- Multi-objective optimization
- Equality constrained Newton method
- Log Barrier (Central Path) methods, and Primal-Dual optimization methods
- Overview of the simplex method as an example of an active set method.
- Overview of the first order oracle model, subgradient and separation oracles, and the Ellipsoid algorithm.
- Large-scale subgradient methods: subgradient descent, mirror descent, stochastic subgradient descent.
Text Books
The required textbook for the class is:
The book is available online
here. About 75% of the material covered
in the class can be found in the above book.
Supplemental recommended books:
Requirements and Grading
There will be roughly 4
homework assignments, counting toward 20% of the grade. Assignments must be typed in LaTex or Word (not handwritten) and submitted electronically in PDF or coded in Python.
The midterm will count 15%, the second midterm 15% and the group project 40%, and the quizzes in class 10%.
Lecture Schedule
Tentative schedule, it will be updated regularly throughout the semester.
Date |
Topic |
Relevant Reading |
Notes |
1/23/2017 |
1 Introduction
|
[CO] Sections 1.1-1.4 |
|
1/25/2017 |
2 Review of Linear Algebra and Multivariable Calculus
|
Iain Murray's crib-sheet
Matrix Cookbook
|
|
1/30/2017 |
3 Convex Analysis
|
[CO] Sections 2.1-2.3, 3.1-3.2 |
|
2/01/2017 |
4 Gradient Descent, Exact search, Backtracking
|
[CO] Sections 9.1-9.3
|
|
2/06/2017 |
5 Pre-conditioning, Newton method
|
[CO] Sections 9.5-9.6
|
HW1 out |
2/08/2017 |
6 Conjugate Gradient, Quasi-Newton method
|
[NO] Sections 5.1, 6.1
|
|
2/13/2017 |
7 Constrained Optimization
|
[CO] Sections 4.2.3,4.2.4,4.3,5.1.1-5.1.4,5.2,5.4.1,5.5.1
|
|
2/15/2017 |
8 Duality
|
[CO] Sections 5.5.2
|
|
2/20/2017 |
9 KKT Optimality Conditions and Fenchel Conjugate
|
[CO] Sections 5.5.3, 3.3, 5.1.6 |
HW 1 due HW 2 out |
2/22/2017 |
10 Semi-Definite Programming |
[CO] Sections 4.6, 5.9 |
|
2/27/2017 |
11 Equality Constrained Optimization and Interior Point methods
|
[CO] Sections 10.1, 10.2, 11.1, 11.2, 11.3.1, 11.5
|
|
3/01/2017 |
12 Class cancelled |
|
|
3/06/2017 |
13 Summary lecture, pre mid-term
|
|
HW2 due |
3/08/2017 |
Mid-term exam
|
|
|
3/13/2017 |
Spring break
|
|
|
3/15/2017 |
Spring break
|
|
|
3/20/2017 |
14 Feasibility problems, Phase I methods, and Primal-dual method
|
[CO] Sections 11.4.1, 11.3.4, 11.7 |
Deadline proposal group project |
3/22/2017 |
15 Pareto Optimality and the Simplex method
|
[CO] Section 4.7
[NO] Chapter 13
|
|
3/27/2017 |
16 Center of Mass and Ellipsoid method
|
[EMCP] Chapters 2-3 |
HW3 out |
3/29/2017 |
17 Sub-gradient descent
|
[LNMCO] Section 5.3.1 |
|
4/03/2017 |
18 Mirror Descent |
[LNMCO] Sections 5.3.2, 5.4.1, 5.4.2 |
|
4/05/2017 |
19 Dual Averaging, Stochastic Gradient Descent
|
[LNMCO] Section 5.4.2
[EMCP] Section 14.1
Optional: FTRL |
|
4/10/2017 |
20 Subgradients and AdaGrad |
Original AdaGrad paper
Coordinate-wise online optimization
Subgradients
| HW3 due
HW4 out |
4/12/2017 |
21 Proximal Methods and Acceleration: FISTA
|
FISTA paper |
|
4/17/2017 |
22 ADMM
|
Boyd's monograph |
1 page report on project |
4/19/2017 |
23 Non-Convex Unconstrained Optimization Problems
|
[NO] Section 3.4
Non-convex SGD |
|
4/24/2017 |
Summary
|
|
HW4 due |
4/26/2017 |
Second Midterm
|
|
|
5/01/2017 |
Project Presentation
|
|
|
5/03/2017 |
Project Presentation
|
|
|
Misc
Note: If you have a physical, psychological, medical or learning disability that may impact on your ability to carry out assigned course work, please contact the staff in the Disabled Student Services office (DSS), Room 133, Humanities, 632-6748/TDD. DSS will review your concerns and determine with you what accommodations are necessary and appropriate. All information and documentation of disability are confidential.
Note: Each student must pursue his or her academic goals honestly and be personally accountable for all submitted work. Representing another person's work as your own is always wrong. Any suspected instance of academic dishonesty will be reported to the Academic Judiciary. For more comprehensive information on academic integrity, including categories of academic dishonesty, please refer to the academic judiciary website at http://www.stonybrook.edu/uaa/academicjudiciary/.
CSE 512: Machine Learning
Spring 2017, Time: Tues-Thurs 5:30PM-6:50PM, Location: Javits 101
Instructor: Francesco Orabona
Office hours: Tues-Thurs 11:00AM-12:00PM
TA: Fatemeh Almodaresi
TA office hours: Wed 10:00AM-12:00PM, Room 256 in NCS
Note: The class is currently full.
Machine Learning is centered around automated methods that improve their own performance through learning patterns in data, and then use the uncovered patterns to predict the future and make decisions. Examples include document retrieval, image classification, spam filtering, face detection, speech recognition, decision making, and robot navigation. This course covers both the theoretical and the practical side of machine learning.
All the course material will be posted on Piazza.
Prerequisites:
- Basic algorithms and data structures
- Linear algebra
- Multivariable calculus
- Probability and statistics
- Proficiency in and regular access to MATLAB
Specific Topics:
- Linear algebra + Probability
- Learning theory
- Linear predictors
- Bias-variance, model selection
- Perceptron
- Support Vector Machines
- Convexity and convex problems
- Duality and kernels
- Convex Learning
- Regularized linear predictors
- Stochastic Gradient Descent
- Boosting
- Clustering
- Dimensionality reduction
- Generative models
- Nearest Neighborhood
- Neural Networks, Backprop
- Deep learning
Text Books
The required textbook for the class is:
The book is available online
here.
Supplemental recommended books:
Requirements and Grading:
There will be roughly bi-weekly
homework assignments, counting
toward 50% of the grade. Assignments must be typed (not handwritten)
and submitted electronically in PDF. Collaborative work on the
homeworks is encouraged, but each student must eventually write up the
solution independently.
The midterm will count 20% and the final exam 30%.
Lecture Schedule
Updated regularly and subject to minor tweaks throughout the semester.
Date |
Topic |
Relevant Reading |
Notes |
1/24/2017 |
Introduction
|
|
|
1/26/2017 |
Review of Linear Algebra and Probability
|
Iain Murray's crib-sheet
Matrix Cookbook
|
|
1/31/2017 |
PAC Learning
|
UML Chapters 2,3
|
|
2/02/2017 |
Agnostic PAC Learning
|
UML Chapter 4
|
|
2/07/2017 |
Linear Regression
|
UML Chapter 9, Section 12.1.1
|
HW 1 is out |
2/09/2017 |
No class
|
|
Snow Day |
2/14/2017 |
Bias-Variance Trade-off and Model Selection
|
UML Chapter 5, Section 11.2
|
|
2/16/2017 |
Perceptron
|
UML Section 9.1.2
CML Chapter 4
|
|
2/21/2017 |
Support Vector Machines
|
UML Section 15.1 (and its subsections), 15.2, 12.3
Optional: Burges' SVM tutorial
|
HW 1 is due
HW 2 is out
|
2/23/2017 |
Duality and Kernels |
CO, Sections 5.1.1, 5.1.2, 5.1.3, 5.2, 5.2.2, 5.2.3
UML, Sections 16.1, 16.2
Optional: Burges' SVM tutorial
|
|
2/28/2017 |
Regularization
|
UML, Sections 13.1 (and subsections), 25.1.3, 26.3, 26.4
|
|
3/02/2017 |
Convex Optimization
|
CML, Sections 7.4
UML, Sections 14.1, 14.2, 14.4.1
Optional: CO, Chapter 4, Sections 9.1, 9.2, 9.3
|
|
3/07/2017 |
Summary lecture, pre mid-term
|
|
HW 2 is due |
3/09/2017 |
Mid-term exam
|
|
HW 3 is out |
3/14/2017 |
Spring break
|
|
|
3/16/2017 |
Spring break
|
|
|
3/21/2017 |
Stochastic Gradient Descent
|
UML, Sections 12.1.2, 12.1.3, 12.2.2, 14.1, 14.3, 14.5
Optional: Zhang's paper on SGD
|
|
3/23/2017 |
Multiclass Classification
|
UML, Sections 17.1, 17.2, 17.3 and their subsections
|
|
3/28/2017 |
Boosting
|
CML, Section 13.2
Introduction (from this book)
|
HW 3 is due |
3/30/2017 |
Clustering
|
CML, Sections 3.4, 15.1
|
|
4/04/2017 |
Dimensionality Reduction
|
UML, Sections 23.1, 23.2
|
|
4/06/2017 |
Generative Models
|
UML, Sections 24.1, 24.2, 24.3, 24.4
|
HW4 is out |
4/11/2017 |
Nearest Neighborhood and Decision Trees
|
UML, Sections 18.1, 18.2, 19.1, 19.3
|
|
4/13/2017 |
Neural Networks and Backprop
|
UML, Sections 20.1, 20.2, 20.3.1, 20.5, 20.6
|
|
4/18/2017 |
Deep Learning Overview
|
Optional: Deep Learning Book
|
|
4/20/2017 |
Online Learning 1
|
UML, Sections 21.1, 21.2
Optional: Online Learning survey
|
HW 4 is due |
4/25/2017 |
Online Learning, second part
|
UML, Sections 21.3
Optional: Online Learning survey
|
|
4/27/2017 |
Summary
|
|
|
5/02/2017 |
Talk on Parameter-free Online Learning
|
Optional: NIPS'16
|
Kaggle ends |
5/04/2017 |
End-term exam
|
|
|
Misc
Note: If you have a physical, psychological, medical or learning disability that may impact on your ability to carry out assigned course work, please contact the staff in the Disabled Student Services office (DSS), Room 133, Humanities, 632-6748/TDD. DSS will review your concerns and determine with you what accommodations are necessary and appropriate. All information and documentation of disability are confidential.
Note: Each student must pursue his or her academic goals honestly and be personally accountable for all submitted work. Representing another person's work as your own is always wrong. Any suspected instance of academic dishonesty will be reported to the Academic Judiciary. For more comprehensive information on academic integrity, including categories of academic dishonesty, please refer to the academic judiciary website at http://www.stonybrook.edu/uaa/academicjudiciary/.
TTIC 31070 (CMSC 34500 Optimization/ BUSF 36903): Convex Optimization
This is a webpage for the Fall 2013 course at TTIC and the University of Chicago.
Course Description
Mondays and Thursday 3:00pm-4:20pm at TTIC 530 (located at
6045 S. Kenwood Ave, fifth floor)
Instructor: Francesco Orabona
Additional Lecturer: Nati Srebro (first 3 lessons)
TA: Zhiyong Wang
Mailing list of the course:
The course will cover techniques in unconstrained and constrained
convex optimization and a practical introduction to convex duality.
The course will focus on (1) formulating and understanding convex
optimization problems and studying their properties; (2) presenting
and understanding optimization approaches; and (3) understanding the
dual problem. Limited theoretical analysis of convergence properties
of methods will be presented. Examples will be mostly from data
fitting, statistics and machine learning.
Specific Topics:
- Formalization of optimization problems
- Smooth unconstrained optimization: Gradient Descent, Conjugate Gradient Descent, Newton and Quasi-Newton methods; Line Search methods
- Standard formulations of constrained optimization: Linear, Quadratic, Conic and Semi-Definite Programming
- KKT optimality conditions
- Lagrangian duality, constraint qualification, weak and strong duality
- Fenchel conjugacy and its relationship to Lagrangian duality
- Multi-objective optimization
- Equality constrained Newton method
- Log Barrier (Central Path) methods, and Primal-Dual optimization methods
- Overview of the simplex method as an example of an active set method.
- Overview of the first order oracle model, subgradient and separation oracles, and the Ellipsoid algorithm.
- Large-scale subgradient methods: subgradient descent, mirror descent, stochastic subgradient descent.
Text Books
The required textbook for the class is:
The book is available online
here. About 75% of the material covered
in the class can be found in the above book.
Supplemental recommended books:
Requirements and Grading:
There will be roughly bi-weekly
homework assignments, counting
toward 30% of the grade. Assignments must be typed (not handwritten)
and submitted electronically in PDF. Collaborative work on the
homeworks is encouraged, but each student must eventually write up the
solution independently.
There will also be several MATLAB programming and experimentation
assignments, counting toward another 20% of the grade.
The remaining 50% of the grade will be based on a final exam.
Lectures and Required Reading:
- Lecture I: Monday September 30th
-
- Intro to course
- Formulation of optimization problems
- Feasibility, optimality, sub-optimality
Boyd and Vandenberghe Sections
1.1-1.4
- Lecture II: Thursday October 3rd
-
- Affine & convex sets, and operations that preserve convex sets
- Affine & convex Functions
- Separating and supporting hyperplanes, halfspaces, polyhedra
- Notions of separation
- Alpha-sublevel sets, epigraphs
Boyd and Vandenberghe Sections 2.1-2.3, 2.5, 3.1-3.2
- Lecture III: Monday October 6th
-
- Unconstrained optimization: descent methods; descent direction
- Gradient descent
- Line search: exact line search and backtracking line search
- Analysis of gradient descent with exact and backtracking line search
- Problems with badly conditioned functions; preconditioning
- Newton's method
Boyd and Vandenberghe Sections 9.1-9.3, 9.5
- Lecture IV: Thursday October 10th
-
- Analysis of Newton's method
- Self-concordance analysis of Newton's method
- Conjugate gradient descent
- Quasi-Newton methods: BFGS
- Summary of unconstrained optimization
Boyd and Vandenberghe Sections 9.5-9.6, Nocedal and Wright Sections 5.1, 6.1
- Lecture V: Monday October 14th
-
- Constrained Optimization: Formulation and Optimality Condition
- Linear Programming
- The Lagrangian, the dual problem, weak duality
- Slater's condition and strong duality
- Summary of unconstrained optimization
- Dual solutions as certificates for sub-optimality and infeasibility
Boyd and Vandenberghe Sections 4.2,4.3,5.1,5.2,5.4.1,5.5.1
- Lecture VI: Thursday October 17th
-
- Geometric interpretation of duality; Slater's condition
- Dual of a linear program
- Dual of a non-convex problem: max-cut clustering
- Dual of least-squares solution to under-determined linear system
- Least-squares solution: recovering the primal optimum from the dual optimum
- Complimentary slackness
Boyd and Vandenberghe Sections 5.3.1, 5.5.2
- Lecture VII: Monday October 21st
-
- Example: Max-flow/min-cut
- KKT conditions
- Examples: water-filing, large-margin linear discrimination.
- Expressing the dual in terms of Fenchel conjugates
Boyd and Vandenberghe Sections 5.5.3, 3.3, 5.1.6, 5.7.1
- Lecture VIII: Thursday October 24th
-
- Example: Logistic regression
- Example: Minimum volume covering ellipsoid and the log-det function
- Matrix inequalities, semi-definite constraints, and semi-definite programming
- Examples: covariance estimation, fastest mixing Markov chain
- Lagrangian duality for semi-definite constraints
- Complimentary slackness and KKT optimality conditions for semi-definite constraints
- Norm-constrained matrix factorization as an example of an SDP and its dual
Boyd and Vandenberghe Sections 4.6, 5.9
- Lecture IX: Monday October 28th
-
- Equality constrained Newton's method
- Optimization of problems with implicit constraints
- Log-barrier problems and the central path interior point method
- Analysis of the log-barrier central path method
- Log-barrier method for semi-definite constraints
Boyd and Vandenberghe Sections 10.1, 10.2, 11.1, 11.2, 11.3.1, 11.5, 11.6
- Lecture X: Thursday October 31st
-
- Feasibility problems and Phase I methods
- Reducing Optimization to feasibility
- Log-barrier and weakened KKT conditions
- Primal-Dual Interior Point Methods, including infeasible start
- Multi-objective problems and Pareto optimality
Boyd and Vandenberghe Sections 11.4.1, 11.4.3, 10.3.4, 11.7, 4.7
- Lecture XI: Monday November 4th
-
- Scalarization of multi-objective problems
- The Simplex method
Boyd and Vandenberghe Sections 11.4.1, Nocedal and Wright Chapter 13
- Lecture XII: Thursday November 7th
-
- The first order oracle model: subgradients and seperation oracles
- Classical first order methods: center-of-mass method, oracle lower bound, ellipsoid method
Chapters 1-3 of Nemirovksi's "Efficient Methods in Convex Programming"
- Lecture XIII: Monday November 11th
-
- Review and limitations of methods with polynomial convergence.
- Large-scale, slowly converging, first order methods.
- Sub-gradient descent with projection, step size and analysis for Lipschitz functions over a bounded domain.
Section 5.3.1 of Nemirovksi's "Lectures on Modern Convex Optimization"
- Lecture XIV: Thursday November 14th
-
- Gradient descent as a proximal point method.
- From gradient descent to bundle methods.
- Mirror Descent formulation and basic concepts: strong convexity with respect to a norm, dual norm, distance generating function and Bergman divergence
- Analysis Mirror Descent
Sections 5.3.2 and 5.4.2 of Nemirovksi's "Lectures on Modern Convex Optimization"
- Lecture XV: Monday November 18th
-
- Partial linarization in Mirror Descent
- More about sub-gradient descent: decreasing step-size
- Primal-dual analysis of Mirror Descent: Dual Averaging
- Choice of the distance generating function
- Lecture XVI: Thursday November 21st
-
- Stochastic Gradient Descent and Stochastic Optimization.
- Overview of further results for first order methods: faster rates with strong convexity, faster rates with smoothness, acceleration with smoothness.
Section 14.1 of Nemirovksi's "Efficient Methods in Convex Programming"
- Lecture XVII: Monday December 2nd
-
Assignments: