Abstract
In 1961, Minsky perceived a fundamental flaw in the burgeoning field of artificial neural networks. He doubted that such nonlinear systems could be trained effectively using gradient methods because, unless the “structure of the search space is special, the optimization may do more harm than good.” Fast forward to today, and we observe deep neural networks — far more complex than those envisioned at the field's inception — being trained successfully with methods akin to gradient descent. It has, indeed, become evident that the objective function possesses a highly benign structure that we are only beginning to comprehend. In this lecture, I aim to summarize our current understanding of this enigmatic optimization process. I will explore a diverse array of themes, including intrinsic dimensionality, the optimization landscape, and implicit regularization, and I will highlight key open questions, all within the context of residual networks and generative models.
Director of Center for Optimization and Statistical Learning
Walter P. Murphy Professor of Industrial Engineering and Management Sciences
Department of Engineering Sciences and Applied Mathematics
Recipient of the 2024 SIAM John von Neumann Prize