Muon Dynamics as a Spectral Wasserstein Flow
概要
arXiv:2604.04891v2 Announce Type: replace-cross Abstract: Gradient normalization stabilizes deep-learning optimization, and spectral normalizations are especially natural for matrix-shaped parameter blocks; Muon is the motivating example. We study an idealized deterministic, continuous-time, vanish…