Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models
Abstract
arXiv:2605.06683v1 Announce Type: cross Abstract: Transformer-based large language models are in some respects limited by the quadratic time and space computational complexity of attention. We introduce the Toeplitz MLP Mixer (TMM), a transformer-like architecture that swaps attention for triangula…
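The abstract is cut off, but the named ingredient is a triangular Toeplitz structure in place of attention. As a hedged illustration (not the paper's implementation), the sketch below applies a causal, lower-triangular Toeplitz matrix to a token sequence: one learnable weight per relative offset gives n parameters instead of the n^2 of a full attention map, and the lower-triangular mask preserves causality. The function name `toeplitz_mix` and the kernel parameterization are assumptions for this sketch.

```python
import numpy as np

def toeplitz_mix(x, t):
    """Hypothetical causal Toeplitz token-mixing step.

    x: (seq_len, d_model) token embeddings
    t: (seq_len,) kernel, one weight per relative offset
    Returns T @ x where T[i, j] = t[i - j] for j <= i, else 0.
    """
    n = x.shape[0]
    offsets = np.arange(n)[:, None] - np.arange(n)[None, :]  # i - j
    # Mask out future positions (i - j < 0) to keep the mixing causal.
    T = np.where(offsets >= 0, t[np.clip(offsets, 0, n - 1)], 0.0)
    return T @ x

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
t = rng.normal(size=6)
y = toeplitz_mix(x, t)

# Causality check: perturbing the last token leaves earlier outputs unchanged.
x2 = x.copy()
x2[5] += 1.0
y2 = toeplitz_mix(x2, t)
assert np.allclose(y[:5], y2[:5])
```

Because a Toeplitz matrix-vector product is a convolution, this step can also be computed via FFT in O(n log n) time, which is where the "low complexity" claim in the title would come from.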