Karolina Dziugaite (Element AI), gives the OxCSML Seminar on 26th February 2021.
Abstract: Deep learning approaches dominate in many application areas. Our understanding of generalization (relating empirical performance to future expected performance) is however lacking. In some applications, standard algorithms like stochastic gradient descent (SGD) reliably return solutions with low test error. In other applications, these same algorithms rapidly overfit. There is, as yet, no satisfying theory explaining what conditions are required for these common algorithms to work in practice. In this talk, I will discuss standard approaches to explaining generalization in deep learning using tools from statistical learning theory, and present some of the barriers these approaches face to explaining deep learning. I will then discuss my recent work (NeurIPS 2019, 2020) on information-theoretic approaches to understanding generalization of noisy, iterative learning algorithms, such as Stochastic Gradient Langevin Dynamics, a noisy version of SGD.