Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models
Abstract
arXiv:2605.07721v1

Recurrent LLM architectures have emerged as a promising approach to improving reasoning, as they enable multi-step computation in the embedding space without generating intermediate tokens. Models such as Ouro perform reasoning by iteratively updating…
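To make the looped-computation idea concrete, below is a minimal sketch of a looped transformer: a single shared block is applied repeatedly to the hidden states, so effective depth (compute) grows with the loop count while parameter memory stays fixed. All names here (`LoopedTransformer`, `num_loops`, the layer sizes) are illustrative assumptions, not the actual Ouro or paper implementation.

```python
import torch
import torch.nn as nn

class LoopedTransformer(nn.Module):
    """Hypothetical sketch: one transformer block reused across loop iterations."""

    def __init__(self, d_model=256, n_heads=4, num_loops=4, vocab_size=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # A single shared block reused every iteration (weights tied across depth).
        self.block = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.num_loops = num_loops
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids):
        h = self.embed(input_ids)
        # Multi-step computation in the embedding space: no intermediate tokens
        # are generated; the hidden state is refined in place on each loop.
        for _ in range(self.num_loops):
            h = self.block(h)
        return self.lm_head(h)

model = LoopedTransformer()
logits = model(torch.randint(0, 1000, (2, 16)))  # (batch=2, seq=16, vocab)
```

Note the decoupling this structure exposes: raising `num_loops` adds compute without adding parameters, which is the compute-versus-memory axis the paper's title refers to.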