arXiv cs.AI by Synapse Flow Editorial Team

Theoretically Optimal Attention/FFN Ratios in Disaggregated LLM Serving

Abstract

arXiv:2601.21351v2 Announce Type: replace-cross Abstract: Attention-FFN disaggregation (AFD) is an emerging architecture for LLM decoding that separates state-heavy, KV-cache-dominated Attention computation from stateless, compute-intensive FFN computation, connected by per-step communication. While…
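To make the architectural split concrete, here is a minimal toy sketch (not the paper's implementation) of one decode step under attention/FFN disaggregation: one worker holds the KV cache and does the state-heavy attention, a second stateless worker does the compute-intensive feed-forward block, and the activation passed between them stands in for the per-step communication. All sizes, the single-head attention, and the ReLU MLP are illustrative assumptions.

```python
import numpy as np

D = 8  # hidden size (illustrative)

def attention_worker(q, kv_cache):
    """State-heavy side: attends over the locally held KV cache."""
    K, V = kv_cache
    scores = K @ q / np.sqrt(D)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V  # activation shipped to the FFN worker each step

def ffn_worker(x, W1, W2):
    """Stateless side: compute-intensive feed-forward block (ReLU MLP)."""
    return W2 @ np.maximum(W1 @ x, 0.0)

rng = np.random.default_rng(0)
kv_cache = (rng.normal(size=(16, D)), rng.normal(size=(16, D)))
W1, W2 = rng.normal(size=(4 * D, D)), rng.normal(size=(D, 4 * D))

# One decode step: the attention output crosses from the attention
# node to the FFN node, modeling the per-step communication in AFD.
q = rng.normal(size=D)
h = attention_worker(q, kv_cache)   # runs on the attention node
out = ffn_worker(h, W1, W2)         # runs on the FFN node
```

In a real AFD deployment the two workers run on separate accelerator pools sized to their different bottlenecks (memory bandwidth for attention, FLOPs for FFN), which is what makes the ratio between them a tunable design parameter.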

Read the original article →
