Concentration Bounds for Stochastic Approximation with Applications to Reinforcement Learning

Speaker: Dr. Gugan Thoppe, Duke University, Durham, USA
When: Nov 16, 2018, 03:00 PM to 04:00 PM
Where: LH 006

Abstract: Stochastic Approximation (SA) refers to iterative algorithms that can be used to find optimal points or zeros of a function, given only noisy estimates of it. In this talk, I will review our recent advances in techniques for analysing SA methods. The talk has four major parts. In the first part, we will see a motivating application of SA to network tomography and, alongside, discuss the convergence of a novel stochastic Kaczmarz method. In the second part, we shall see a novel analysis approach for non-linear SA methods in the neighbourhood of an isolated solution. The main tools here include the Alekseev formula, which exactly compares the solutions of a non-linear ODE to those of its perturbation, and a novel concentration inequality for a sum of martingale differences. In the third part, we will extend the previous tool to the two-timescale but linear SA setting. Here, I will also present our ongoing work on obtaining tight convergence rates in this setup. In parallel, we will see how these results apply to gradient Temporal Difference (TD) methods such as GTD(0), GTD2, and TDC, which are used in reinforcement learning. For the analyses in the second and third parts to hold, the initial step size must be chosen sufficiently small, depending on unknown problem-dependent parameters; alternatively, one must use projections. In the fourth part, we shall discuss a trick to obviate this in the context of the one-timescale, linear TD(0) method. We strongly believe that this trick is generalizable. We also provide a novel expectation bound here. We shall end with some future directions.
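
To make the objects in the abstract concrete, the following is a minimal, purely illustrative sketch (not taken from the talk) of the generic SA recursion x_{n+1} = x_n + a_n (h(x_n) + M_{n+1}), specialised to the one-timescale, linear TD(0) method on a toy Markov chain. The feature matrix, transition kernel, rewards, discount factor, and the 1/n step-size schedule are all assumptions made for this example.

# Illustrative sketch: linear TD(0) as a stochastic approximation scheme.
# The toy chain, features, rewards, and step sizes below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

n_states, d = 5, 3
phi = rng.standard_normal((n_states, d))           # fixed feature matrix (assumed)
P = np.full((n_states, n_states), 1.0 / n_states)  # toy transition kernel (assumed)
r = rng.standard_normal(n_states)                  # per-state rewards (assumed)
gamma = 0.9                                        # discount factor (assumed)

theta = np.zeros(d)                                # value function V(s) ~ phi(s)^T theta
s = 0
for n in range(1, 10_000):
    a_n = 1.0 / n                                  # diminishing step size a_n = 1/n
    s_next = rng.choice(n_states, p=P[s])
    # TD error: delta_n = r(s) + gamma * phi(s')^T theta - phi(s)^T theta
    delta = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
    # SA update: theta <- theta + a_n * delta_n * phi(s)
    theta = theta + a_n * delta * phi[s]
    s = s_next

print("final theta:", theta)

In this sketch the martingale-difference noise M_{n+1} corresponds to the randomness in the sampled transition; the concentration and expectation bounds discussed in the talk concern how fast such iterates settle around the solution of the associated ODE.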
