The Effect of Mini-Batch Noise on the Implicit Bias of Adam
概要
arXiv:2602.01642v2 Announce Type: replace-cross Abstract: With limited high-quality data and growing compute, multi-epoch training is gaining back its importance across sub-areas of deep learning. Adam(W), versions of which are go-to optimizers for many tasks such as next token prediction, has two …