Adaptive Memory Decay for Log-Linear Attention
Abstract
arXiv:2605.06946v1 Announce Type: cross

Sequence models face a fundamental tradeoff between memory capacity and computational efficiency. Transformers achieve expressive context modeling at quadratic cost, while linear attention and state-space models run in linear time by compressing con…
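To make the stated tradeoff concrete, the sketch below contrasts standard softmax attention, whose T x T score matrix gives quadratic cost in sequence length, with a generic linear-attention recurrence that folds keys and values into a fixed-size d x d state. This is a minimal illustration of the general technique (in the style of kernelized linear attention), not the adaptive-decay method of the paper; the feature map `phi` and all function names are illustrative assumptions.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: materializes a T x T score matrix,
    # so time and memory grow quadratically with sequence length T.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Linear attention: compresses past keys/values into a fixed-size
    # d x d state S plus a d-dim normalizer z, so cost is linear in T.
    T, d = Q.shape
    S = np.zeros((d, d))   # running sum of phi(k) v^T
    z = np.zeros(d)        # running sum of phi(k) for normalization
    out = np.empty_like(V)
    for t in range(T):
        q, k, v = phi(Q[t]), phi(K[t]), V[t]
        S += np.outer(k, v)
        z += k
        out[t] = (q @ S) / (q @ z + 1e-6)
    return out

rng = np.random.default_rng(0)
T, d = 8, 4
Q, K, V = rng.normal(size=(3, T, d))
print(softmax_attention(Q, K, V).shape)  # (8, 4)
print(linear_attention(Q, K, V).shape)   # (8, 4)
```

The state update in `linear_attention` is causal: step t only sees keys and values up to t, which is what lets such models run in streaming fashion with constant memory per step.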