# Seminars & Colloquia Calendar

## Recent Advances in Stochastic Gradient Methods: From Convex to Non-convex Optimization and Deep Learning

#### Mert Gurbuzbalaban (Rutgers)

Location: **CoRE 301**

Date & time: Wednesday, 06 November 2019, 11:00AM – 12:00PM

Abstract: For many large-scale optimization and machine learning problems, first-order methods and their momentum-based accelerated variants have been a leading approach for computing low-to-medium-accuracy solutions because of their cheap iterations and mild dependence on the problem dimension and data size. Even though the momentum-based accelerated gradient (AG) methods proposed by Nesterov for convex optimization converge provably faster than gradient descent (GD) in the absence of noise, the comparison is no longer clear when the gradients are stochastic, i.e., contain random gradient errors. In the first part of the talk, we focus on stochastic gradient (SGD) and accelerated stochastic gradient (ASG) methods for convex optimization when the gradient has random errors.
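The GD-versus-AG comparison described above can be sketched numerically. The following is a minimal illustration (not code from the talk): gradient descent and Nesterov-style momentum on a strongly convex quadratic, with an optional Gaussian perturbation on the gradients. The step size, momentum parameter, and problem instance are all illustrative choices.

```python
import numpy as np

# Illustrative quadratic f(x) = 0.5 * x^T A x with condition number L/mu = 10.
rng = np.random.default_rng(0)
A = np.diag([1.0, 10.0])
L, mu = 10.0, 1.0  # smoothness and strong-convexity constants

def noisy_grad(x, sigma):
    # Exact gradient plus additive Gaussian noise of scale sigma.
    return A @ x + sigma * rng.standard_normal(2)

def run_gd(x0, steps, sigma):
    # Plain gradient descent with step size 1/L.
    x = x0.copy()
    for _ in range(steps):
        x -= (1.0 / L) * noisy_grad(x, sigma)
    return x

def run_agd(x0, steps, sigma):
    # Nesterov momentum for strongly convex objectives:
    # constant momentum beta = (sqrt(L) - sqrt(mu)) / (sqrt(L) + sqrt(mu)).
    x, y = x0.copy(), x0.copy()
    beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))
    for _ in range(steps):
        x_new = y - (1.0 / L) * noisy_grad(y, sigma)
        y = x_new + beta * (x_new - x)
        x = x_new
    return x

x0 = np.array([5.0, 5.0])
# In the noiseless case, momentum reaches a much smaller error
# than GD within the same iteration budget.
err_gd = np.linalg.norm(run_gd(x0, 50, sigma=0.0))
err_agd = np.linalg.norm(run_agd(x0, 50, sigma=0.0))
```

Rerunning with `sigma > 0` shows the trade-off the talk addresses: the momentum iterates accumulate the injected gradient noise differently than plain GD, so the noiseless ranking need not persist.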

We study the trade-offs between convergence rate and robustness to gradient errors in designing a first-order algorithm and provide a systematic way of trading off these two in an optimal fashion. Our results show that stochastic momentum methods can achieve acceleration while being more robust to random gradient errors. Our framework also leads to "optimal" algorithms that can perform better than other state-of-the-art methods in the presence of random gradient noise. We also discuss extensions of our results and algorithms to distributed convex optimization problems.

In the second part of the talk, we focus on SGD for non-convex optimization and deep learning. The gradient noise (GN) in the SGD algorithm is often considered to be Gaussian in the large data regime by assuming that the classical central limit theorem (CLT) kicks in. This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion. We argue that the Gaussianity assumption might fail to hold in deep learning settings and hence render the Brownian motion-based analyses inappropriate. Inspired by non-Gaussian natural phenomena, we consider the GN in a more general context and invoke the generalized CLT (GCLT), which suggests that the GN converges to a heavy-tailed α-stable random variable.
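The heavy-tailed α-stable noise mentioned above can be illustrated by direct simulation. This is a hedged sketch, not code from the talk: symmetric α-stable samples are drawn via the standard Chambers–Mallows–Stuck construction, and a crude tail count separates them from Gaussian noise. The choice α = 1.5 and the threshold 5 are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def sas_sample(alpha, size):
    # Symmetric alpha-stable samples (skew beta = 0, unit scale)
    # via the Chambers-Mallows-Stuck method.
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    return (np.sin(alpha * U) / np.cos(U) ** (1 / alpha)
            * (np.cos((1 - alpha) * U) / W) ** ((1 - alpha) / alpha))

n = 200_000
stable = sas_sample(1.5, n)       # alpha = 1.5: heavy tails, infinite variance
gauss = rng.standard_normal(n)    # alpha = 2 recovers the Gaussian

# Fraction of samples with |x| > 5: orders of magnitude larger
# under the stable law than under the Gaussian.
tail_stable = np.mean(np.abs(stable) > 5)
tail_gauss = np.mean(np.abs(gauss) > 5)
```

For α < 2 the tail probability decays polynomially, like x^(-α), rather than at the Gaussian's squared-exponential rate, which is why rare large "jumps" dominate the dynamics of the associated SDE.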

Accordingly, we propose to analyze SGD as an SDE driven by a Lévy motion. Such SDEs can incur ‘jumps’, which force the SDE to transition from narrow minima to wider minima, as proven by existing metastability theory. To validate the α-stable assumption, we conduct extensive experiments on common deep learning architectures and show that in all settings, the GN is highly non-Gaussian and exhibits heavy tails. We further investigate the tail behavior across varying network architectures and sizes, loss functions, and datasets. Our results open up a different perspective and shed more light on the belief that SGD prefers wide minima.
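Investigating tail behavior empirically, as described above, requires estimating a tail index from samples. One standard diagnostic (shown here as a generic illustration, not the talk's specific estimator) is the Hill estimator, which recovers the index α from the largest order statistics; the sample size, `k`, and the Pareto test data are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def hill_estimator(x, k):
    # Hill estimate of the tail index alpha from the k largest
    # order statistics of |x|.
    s = np.sort(np.abs(x))[::-1]  # descending order
    return k / np.sum(np.log(s[:k] / s[k]))

# Sanity check on exact Pareto data with known tail index 1.5,
# i.e. P(X > x) = x**(-1.5) for x >= 1: the estimate should be near 1.5.
alpha_true = 1.5
x = (1.0 - rng.uniform(size=100_000)) ** (-1.0 / alpha_true)
alpha_hat = hill_estimator(x, k=1_000)
```

In practice one would apply such an estimator to the observed stochastic-gradient noise across layers, architectures, and datasets; an estimate well below 2 indicates non-Gaussian, heavy-tailed noise.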


## Special Note to All Travelers

Directions: see the department's map and driving directions. If you need information on public transportation, you may want to check the New Jersey Transit page.

*Unfortunately, cancellations do occur from time to time. Feel free to call our department at 848-445-6969 before embarking on your journey. Thank you.*