Notes on the Origin of Implicit Regularization in SGD
This article analyzes the concept of implicit regularization in Stochastic Gradient Descent (SGD), explaining how the optimization algorithm itself biases learning toward minima that generalize well. It discusses recent research moving beyond the neural tangent kernel framework to study SGD with finite learning rates and minibatches, providing a more practical view of why deep neural networks generalize effectively.
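To make the idea of finite-learning-rate implicit regularization concrete, here is a minimal sketch of the kind of "modified loss" analysis this line of work discusses: with a finite step size, minibatch SGD behaves as if an extra penalty proportional to the average squared norm of the per-minibatch gradients were added to the training loss. The quadratic toy problem, the specific constant (eps / 4), and all function names below are illustrative assumptions, not the article's code.

```python
import numpy as np

# Illustrative sketch (assumption, not the article's implementation):
# the backward-error-analysis view says that with learning rate eps,
# the mean SGD iterate approximately follows gradient flow on
#     L_tilde(w) = L(w) + (eps / 4) * mean_k || grad L_k(w) ||^2,
# where L_k are the per-minibatch losses.

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))      # toy inputs (assumed for illustration)
y = X @ rng.normal(size=5)        # toy targets from a linear teacher

def minibatch_grad(w, idx):
    """Gradient of the mean-squared-error loss on one minibatch."""
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)

def implicit_penalty(w, batches, eps):
    """The extra term the finite-step analysis adds to the loss:
    (eps / 4) * average squared norm of the per-minibatch gradients."""
    norms = [np.sum(minibatch_grad(w, b) ** 2) for b in batches]
    return eps / 4.0 * float(np.mean(norms))

eps = 0.1                                          # finite, not infinitesimal, learning rate
batches = np.array_split(rng.permutation(len(X)), 8)

w = np.zeros(5)
for epoch in range(100):                           # plain minibatch SGD
    for b in batches:
        w -= eps * minibatch_grad(w, b)

loss = float(np.mean((X @ w - y) ** 2))
print(f"training loss: {loss:.4e}, implicit penalty at w: {implicit_penalty(w, batches, eps):.4e}")
```

The penalty term is small where the per-minibatch gradients are small and well aligned, which is one way of reading the claim that SGD is biased toward flatter minima that generalize better.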