Adrian Lewis (Cornell University)
April 23, 2021 from 15:00 to 16:00 (Montreal/EST time)
To the dismay and irritation of the variational analysis community, practitioners of deep learning often implement gradient-based optimization via automatic differentiation and blithely apply the result to nonsmooth objectives. Worse, they then gleefully point out numerical convergence. In fact, as elegantly remarked by Bolte and Pauwels, automatic differentiation produces a novel generalized gradient: a conservative field with enough calculus to prove convergence of stochastic subgradient descent, as practiced in deep learning. I will sketch this interplay of analytic and algorithmic ideas, and explain how, for concrete objectives (typically semi-algebraic), this novel generalized gradient just slightly modifies Clarke's original notion.
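The following minimal sketch (not part of the talk) illustrates the phenomenon the abstract describes, using JAX as a stand-in for any automatic-differentiation system; the function f and the printed values are illustrative assumptions. The function f(x) = relu(x) - relu(-x) equals x everywhere, so its derivative, and Clarke generalized gradient, is 1 at every point; yet the value autodiff returns at 0 depends on the framework's convention for relu'(0) and can differ from 1, while still being a legitimate output of a conservative field in the sense of Bolte and Pauwels.

```python
# Illustrative sketch: autodiff on a nonsmooth (semi-algebraic) function
# can return a value outside the Clarke generalized gradient.
import jax
import jax.numpy as jnp

def f(x):
    # f(x) = relu(x) - relu(-x) is the identity function on the reals,
    # so its true derivative is 1 everywhere, including at x = 0.
    return jax.nn.relu(x) - jax.nn.relu(-x)

print(f(0.0))            # 0.0: f agrees with the identity map
print(jax.grad(f)(0.0))  # autodiff returns relu'(0) + relu'(0); with the common
                         # convention relu'(0) = 0 this is 0.0, not the true
                         # derivative 1.0 -- a conservative-field output, not a
                         # Clarke subgradient.
```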
Joint work with Tonghua Tian.