arXiv cs.AI by Synapse Flow Editorial Team

Pro-KLShampoo: Projected KL-Shampoo with Whitening Recovered by Orthogonalization

Summary

arXiv:2605.06316v1 (announce type: cross). Optimizers that exploit the matrix structure of gradients are central to modern LLM pre-training, with two distinct frontiers: explicit Kronecker-factored preconditioning, most recently KL-Shampoo, which estimates the preconditioner via KL diverge…
