Pro-KLShampoo: Projected KL-Shampoo with Whitening Recovered by Orthogonalization
Summary
arXiv:2605.06316v1 (Announce Type: cross)
Abstract: Optimizers that exploit the matrix structure of gradients are central to modern LLM pre-training, with two distinct frontiers: explicit Kronecker-factored preconditioning, most recently KL-Shampoo, which estimates the preconditioner via KL diverge…
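The abstract is truncated and does not spell out the Pro-KLShampoo update, so the following is only a minimal sketch of the general idea behind "explicit Kronecker-factored preconditioning". It is classic Shampoo, not KL-Shampoo's KL-divergence estimator and not the authors' method. For a weight-matrix gradient G, Shampoo keeps a left factor L (accumulated G Gᵀ) and a right factor R (accumulated Gᵀ G) and uses the direction L^(-1/4) G R^(-1/4). The sketch also checks a well-known identity that connects whitening and orthogonalization: for a single full-rank gradient with no accumulation or damping, L^(-1/4) G R^(-1/4) equals U Vᵀ from the thin SVD G = U S Vᵀ. That orthogonal polar factor is what orthogonalization-based updates approximate. The function names (`inv_root`, `shampoo_direction`, `polar_factor`) and the eigenvalue clamp `eps` are hypothetical illustration choices.

```python
import numpy as np


def inv_root(mat: np.ndarray, p: int, eps: float = 1e-12) -> np.ndarray:
    """Return mat^(-1/p) for a symmetric PSD matrix via eigendecomposition.

    Hypothetical helper: eigenvalues are clamped at eps (an assumed choice)
    so that zero or round-off-negative eigenvalues do not blow up.
    """
    eigvals, eigvecs = np.linalg.eigh(mat)
    eigvals = np.maximum(eigvals, eps)
    # Scale each eigenvector column by lambda^(-1/p), then rebuild the matrix.
    return (eigvecs * eigvals ** (-1.0 / p)) @ eigvecs.T


def shampoo_direction(grad, left, right, eps=1e-12):
    """Classic Shampoo step (sketch): update both Kronecker factors, then
    precondition the gradient from the left and from the right."""
    left = left + grad @ grad.T      # L_t = L_{t-1} + G G^T   (m x m)
    right = right + grad.T @ grad    # R_t = R_{t-1} + G^T G   (n x n)
    direction = inv_root(left, 4, eps) @ grad @ inv_root(right, 4, eps)
    return direction, left, right


def polar_factor(grad):
    """Orthogonalize G: return U V^T from the thin SVD G = U S V^T."""
    u, _, vt = np.linalg.svd(grad, full_matrices=False)
    return u @ vt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.standard_normal((5, 3))
    m, n = g.shape
    # Single step from zero factors: no accumulation history, no damping.
    d, _, _ = shampoo_direction(g, np.zeros((m, m)), np.zeros((n, n)))
    print(np.allclose(d, polar_factor(g), atol=1e-6))  # prints True
```

The identity follows from L = U S² Uᵀ and R = V S² Vᵀ: the left factor turns G into U S^(1/2) Vᵀ, and the right factor removes the remaining S^(1/2). Once L and R accumulate several gradients, or once damping is added, the two directions no longer coincide in general.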