On Information Theoretic Bounds for SGD
This technical blog post discusses a theoretical approach to understanding the generalization of Stochastic Gradient Descent (SGD) using information theory. It explains a thought experiment linking the mutual information between the learned model parameters and the training dataset to generalization performance, and outlines how KL divergences are used to derive formal generalization bounds for SGD.
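To make the summary concrete, here is a minimal sketch of the two standard results this line of work builds on: the mutual-information generalization bound of Xu and Raginsky (2017), and the chain-rule step that converts per-iteration KL divergences into a bound on that mutual information for noisy iterative algorithms (Pensia, Jog, and Loh, 2018). The post's own derivation and constants may differ; here W denotes the learned parameters, S a training set of n i.i.d. samples, and the loss is assumed to be sigma-sub-Gaussian.

```latex
% Mutual-information generalization bound: if the loss \ell(w, z) is
% \sigma-sub-Gaussian in z for every fixed w, the expected gap between
% population risk L_\mu and empirical risk L_S satisfies
\bigl| \mathbb{E}\bigl[ L_{\mu}(W) - L_{S}(W) \bigr] \bigr|
  \;\le\; \sqrt{ \frac{2\sigma^{2}}{n} \, I(W; S) } .

% Chain-rule step for an iterative algorithm W_1, \dots, W_T, assuming
% each update is Markov (W_t depends only on W_{t-1}, S, and fresh noise):
% for any data-independent reference kernel Q_{W_t \mid W_{t-1}},
I(W_T; S)
  \;\le\; \sum_{t=1}^{T}
      \mathbb{E}\Bigl[ D_{\mathrm{KL}}\!\Bigl(
        P_{W_t \mid W_{t-1}, S} \,\Big\|\, Q_{W_t \mid W_{t-1}}
      \Bigr) \Bigr] ,
% so bounding each update's KL divergence to a data-free kernel bounds
% I(W_T; S), and hence, via the first inequality, the generalization gap.
```

For Gaussian-perturbed updates (as in SGLD-style analyses of SGD), each KL term is a divergence between two Gaussians and has a closed form, which is how concrete bounds in terms of step sizes, gradient norms, and noise scales typically emerge.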