Catch Your Breath: Adaptive Computation for Self-Paced Sequence Production
Summary
arXiv:2510.13879v2. Abstract: Within the landscape of inference-time scaling methods for foundation models, a width-based approach to scaling, which inserts tokens into the input stream to delay model responses, offers a unique advantage by increasing…
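The token-insertion idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the `<pause>` token name, the `insert_pauses` helper, and the choice to append the delay tokens at the end of the prompt are all assumptions made purely for illustration.

```python
from typing import List

# Hypothetical delay token; the actual token used by the paper is not given here.
PAUSE = "<pause>"


def insert_pauses(tokens: List[str], position: int, count: int) -> List[str]:
    """Return a copy of `tokens` with `count` pause tokens inserted at `position`."""
    if count < 0:
        raise ValueError("count must be non-negative")
    if not 0 <= position <= len(tokens):
        raise ValueError("position out of range")
    # Widen the input: the model sees extra positions before it must respond.
    return tokens[:position] + [PAUSE] * count + tokens[position:]


prompt = ["What", "is", "2", "+", "2", "?"]
# Delay the response by three positions (an arbitrary illustrative choice).
delayed = insert_pauses(prompt, len(prompt), 3)
```

Here `delayed` is the original prompt followed by three pause tokens, so a model consuming it would process three additional input positions before producing its answer.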