MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference
Summary
arXiv:2605.07363v1. Abstract: DeepSeek Sparse Attention (DSA) sets the state of the art for fine-grained inference-time sparse attention by introducing a learned token-wise indexer that scores every prefix token and selects the most relevant ones for the main attention. To remai…
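The score-then-select pattern the abstract attributes to DSA can be illustrated with a toy sketch. This is not the DSA or MISA implementation: the dot-product indexer, the function names (`indexer_scores`, `select_top_k`, `sparse_attention`), and the single-head, single-query setup are all assumptions made for illustration, written in plain Python without any learned weights.

```python
import math


def indexer_scores(index_query, index_keys):
    """Score every prefix token with a cheap dot product.

    Assumption: a real indexer is learned; a plain dot product stands in here.
    """
    return [sum(q * k for q, k in zip(index_query, key)) for key in index_keys]


def select_top_k(scores, k):
    """Return the positions of the k highest-scoring prefix tokens, in ascending order."""
    k = min(k, len(scores))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def sparse_attention(query, keys, values, selected):
    """Scaled dot-product softmax attention restricted to the selected positions."""
    if not selected:
        raise ValueError("at least one prefix token must be selected")
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, keys[i])) for i in selected]
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * values[i][d] for w, i in zip(weights, selected)) / total
        for d in range(dim)
    ]


# Hypothetical toy data: four prefix tokens with two-dimensional vectors.
index_keys = [[0.1, 0.0], [0.9, 0.0], [0.5, 0.0], [0.8, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

scores = indexer_scores([1.0, 0.0], index_keys)
selected = select_top_k(scores, k=2)
output = sparse_attention([1.0, 0.0], keys, values, selected)
```

In this sketch the main attention touches only the `k` selected tokens, while the indexer still scores every prefix token, so the indexer's cost grows with the full prefix length.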