Various Facets of SGD: Minibatching, Acceleration and Dependent Data
Speaker: Praneeth Netrapalli
When: Nov 02, 2021, 04:00 PM to 05:00 PM
Where: Zoom
Abstract: Stochastic gradient descent (SGD) is the workhorse of modern machine learning. However, several "tweaks", such as minibatching and momentum, are important for its good practical performance. Despite their practical importance, these tweaks have only recently been thoroughly analyzed mathematically, even for the case of stochastic linear regression. In this talk, we will first present these recent results, focusing particularly on momentum, i.e., acceleration.
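As a point of reference for this part of the talk, the following is a minimal sketch (in Python with NumPy) of minibatch SGD with heavy-ball momentum on a synthetic stochastic linear regression problem; the dimension, batch size, step size, and momentum parameter are illustrative assumptions, not the settings or the tuned algorithm analyzed in the talk.

```python
# Minibatch SGD with heavy-ball momentum on synthetic stochastic linear
# regression (Gaussian design). All hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, n_steps, batch_size = 20, 2000, 8
lr, beta = 0.05, 0.9            # step size and momentum parameter (assumed)
w_star = rng.normal(size=d)     # ground-truth regression vector

w = np.zeros(d)                 # iterate
v = np.zeros(d)                 # momentum buffer
for t in range(n_steps):
    X = rng.normal(size=(batch_size, d))                  # fresh i.i.d. minibatch
    y = X @ w_star + 0.1 * rng.normal(size=batch_size)    # noisy responses
    grad = X.T @ (X @ w - y) / batch_size                 # least-squares gradient
    v = beta * v + grad                                    # heavy-ball update
    w = w - lr * v

print("parameter error:", np.linalg.norm(w - w_star))
```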
Next, we will consider SGD for dependent data, where the data is not independent but instead comes from a Markov chain. This arises in several important applications, such as time-series analysis and reinforcement learning. For this setting, we will present information-theoretic lower bounds showing that the performance of any algorithm degrades with dependent data compared to the independent data setting. In the important special case of realizable least squares regression, we show that SGD combined with a technique known as "reverse experience replay" can bridge the gap between the dependent and independent data settings.
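To fix ideas, here is a minimal sketch of SGD with reverse experience replay on Markovian data: consecutive samples from the chain are collected into a buffer and then replayed in reverse order for the SGD updates. The AR(1)-style chain, buffer size, and step size below are illustrative assumptions, not the exact construction or analysis from the talk.

```python
# SGD with reverse experience replay on data from a Markov chain
# (realizable linear regression; chain and hyperparameters are assumed).
import numpy as np

rng = np.random.default_rng(1)
d, n_buffers, buffer_size = 10, 500, 20
lr = 0.02                        # step size (assumed)
rho = 0.9                        # correlation (mixing) parameter of the chain
w_star = rng.normal(size=d)      # ground-truth regression vector

x = rng.normal(size=d)           # current state of the Markov chain
w = np.zeros(d)
for _ in range(n_buffers):
    buffer = []
    for _ in range(buffer_size):                          # collect consecutive samples
        x = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=d)
        y = x @ w_star + 0.1 * rng.normal()               # realizable up to independent noise
        buffer.append((x.copy(), y))
    for xi, yi in reversed(buffer):                       # replay the buffer in reverse order
        w = w - lr * (xi @ w - yi) * xi                   # single-sample SGD step

print("parameter error:", np.linalg.norm(w - w_star))
```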
The talk will assume basic familiarity with probability and linear algebra, such as the spectral theorem for symmetric matrices, but will otherwise be accessible.
Based on joint works with Naman Agarwal, Guy Bresler, Syomantak Chaudhuri, Prateek Jain, Sham M. Kakade, Rahul Kidambi, Suhas Kowshik, Dheeraj Nagaraj, Aaron Sidford and Carrie Wu.