Structural Sensitivity in Compressed Transformers: Relative Error Propagation and Layer Removal
Abstract
arXiv:2603.20991v2 Announce Type: replace-cross Abstract: Compressing transformer weights makes large language models cheaper to deploy. But each layer's compression introduces an error. These errors accumulate as the signal passes through later layers, and how they accumulate is not well understoo…
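The accumulation the abstract describes can be illustrated with a minimal sketch: compress each layer's weight matrix (here with a truncated SVD as a hypothetical stand-in for any lossy scheme), run the same input through the exact and compressed stacks, and track the relative error of the hidden state after each layer. All names and parameters below (`low_rank`, the depth, width, and rank) are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

def low_rank(W, rank):
    # Truncated-SVD compression of a weight matrix -- an illustrative
    # stand-in for any lossy per-layer compression scheme.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

d, depth, rank = 64, 6, 8                      # assumed toy sizes
layers = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(depth)]
compressed = [low_rank(W, rank) for W in layers]

x = rng.standard_normal(d)
h, h_c = x.copy(), x.copy()
errs = []
for i, (W, Wc) in enumerate(zip(layers, compressed)):
    h = np.tanh(W @ h)        # exact forward pass
    h_c = np.tanh(Wc @ h_c)   # compressed forward pass
    rel = np.linalg.norm(h - h_c) / np.linalg.norm(h)
    errs.append(rel)
    print(f"layer {i}: relative error = {rel:.3f}")
```

Running this shows a nonzero relative error appearing at the first compressed layer and then compounding (or occasionally partially cancelling) as the signal moves through subsequent layers, which is the propagation behaviour the paper sets out to characterize.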