Normalized Architectures are Natively 4-Bit
Abstract
arXiv:2605.06067v1

Training large language models at 4-bit precision is critical for efficiency. We show that nGPT, an architecture that constrains weights and hidden representations to the unit hypersphere, is inherently more robust to low-precision arithmetic. This …
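As a rough illustration of why unit-hypersphere constraints play well with 4-bit formats, here is a minimal NumPy sketch. Everything in it is an assumption for demonstration (the `project_to_hypersphere` helper and the per-tensor symmetric int4 fake-quantizer are illustrative, not nGPT's actual training recipe); it is only a toy showing that row-normalized weights have bounded entries and therefore a tighter quantization grid than heavy-tailed raw weights.

```python
import numpy as np

def project_to_hypersphere(w: np.ndarray) -> np.ndarray:
    """L2-normalize each row so it lies on the unit hypersphere."""
    return w / np.linalg.norm(w, axis=-1, keepdims=True)

def fake_quantize_int4(x: np.ndarray) -> np.ndarray:
    """Per-tensor symmetric 4-bit fake quantization: snap each value
    to one of 16 levels in [-8, 7] * scale, then dequantize to float."""
    scale = np.abs(x).max() / 7.0
    return np.clip(np.round(x / scale), -8, 7) * scale

rng = np.random.default_rng(0)
# Heavy-tailed row scales mimic the outlier channels that hurt low-bit training.
w_raw = rng.normal(size=(256, 512)) * rng.lognormal(sigma=1.5, size=(256, 1))
w_unit = project_to_hypersphere(w_raw)  # every entry now bounded in [-1, 1]

for name, w in [("raw", w_raw), ("unit-norm", w_unit)]:
    rel_err = np.linalg.norm(w - fake_quantize_int4(w)) / np.linalg.norm(w)
    print(f"{name:>10}: relative int4 round-trip error = {rel_err:.3f}")
```

On the raw matrix, a few outlier rows stretch the per-tensor scale, so most entries land between quantization levels and the round-trip error is large; after projection onto the unit hypersphere every entry is bounded, and the same 16-level grid covers the tensor far more tightly.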