
Various Facets of SGD: Minibatching, Acceleration and Dependent Data

Speaker: Praneeth Netrapalli

When: Nov 02, 2021, 04:00 PM to 05:00 PM

Where: Zoom meeting

Abstract

Stochastic gradient descent (SGD) is the workhorse of modern machine learning. However, several "tweaks", such as minibatching and momentum, are important for its good practical performance. Despite their practical importance, these "tweaks" have only recently been thoroughly analyzed mathematically, even in the case of stochastic linear regression. In this talk, we will first present these recent results, focusing particularly on momentum, also known as acceleration.
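To fix ideas, the following is a minimal sketch of minibatch SGD with heavy-ball momentum on a synthetic least-squares problem. It is illustrative only: the data model, step size, momentum parameter, and batch size below are assumptions, not the specific accelerated variant analyzed in the talk.

```python
# Minimal sketch (illustrative, not the exact algorithm from the talk):
# minibatch SGD with heavy-ball momentum on synthetic linear regression.
import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic setup: y = X @ w_star + noise.
n, d = 10_000, 20
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)

def minibatch_sgd_momentum(X, y, lr=0.02, beta=0.9, batch_size=32, steps=2000):
    """Heavy-ball SGD: v <- beta*v + grad, w <- w - lr*v, on random minibatches."""
    n, d = X.shape
    w = np.zeros(d)
    v = np.zeros(d)
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch_size)   # sample a minibatch
        Xb, yb = X[idx], y[idx]
        grad = Xb.T @ (Xb @ w - yb) / batch_size    # stochastic gradient
        v = beta * v + grad                         # momentum buffer
        w = w - lr * v                              # parameter update
    return w

w_hat = minibatch_sgd_momentum(X, y)
print("parameter error:", np.linalg.norm(w_hat - w_star))
```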

Next, we will consider SGD for dependent data, where the data are not independent but instead come from a Markov chain. This setting arises in several important applications such as time series analysis and reinforcement learning. Here, we will present information-theoretic lower bounds showing that the performance of any algorithm degrades with dependent data compared to the independent-data setting. In the important special case of realizable least squares regression, we show that SGD combined with a technique known as "reverse experience replay" can bridge the gap between the dependent and independent data settings.
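The following is a minimal sketch of the reverse-experience-replay idea: store consecutive samples from the chain in a buffer and apply SGD updates over the buffer in reverse order. The linear (AR(1)) data model, buffer size, and step size here are assumptions for illustration, not the exact setting analyzed in the talk.

```python
# Minimal sketch (illustrative): SGD with reverse experience replay on Markovian
# data from an assumed linear dynamical system x_{t+1} = A_star @ x_t + noise.
import numpy as np

rng = np.random.default_rng(1)

d = 5
A_star = 0.9 * np.eye(d)  # assumed true transition matrix

def markov_trajectory(T):
    """Generate a trajectory of the chain (realizable least-squares setting)."""
    xs = np.zeros((T + 1, d))
    for t in range(T):
        xs[t + 1] = A_star @ xs[t] + 0.1 * rng.normal(size=d)
    return xs

def sgd_reverse_replay(xs, lr=0.05, buffer_size=50):
    """Estimate A_star by SGD on pairs (x_t, x_{t+1}), replaying each buffer in reverse."""
    A = np.zeros((d, d))
    T = len(xs) - 1
    for start in range(0, T, buffer_size):
        buf = [(xs[t], xs[t + 1]) for t in range(start, min(start + buffer_size, T))]
        for x, x_next in reversed(buf):              # reverse-order replay
            grad = np.outer(A @ x - x_next, x)       # gradient of 0.5*||A x - x'||^2
            A = A - lr * grad
    return A

xs = markov_trajectory(20_000)
A_hat = sgd_reverse_replay(xs)
print("estimation error:", np.linalg.norm(A_hat - A_star))
```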

The talk will assume basic familiarity with probability and linear algebra, such as the spectral theorem for symmetric matrices, but will otherwise be accessible.

Based on joint works with Naman Agarwal, Guy Bresler, Syomantak Chaudhuri, Prateek Jain, Sham M. Kakade, Rahul Kidambi, Suhas Kowshik, Dheeraj Nagaraj, Aaron Sidford and Carrie Wu.


YouTube link to the recording
