Theoretically Optimal Attention/FFN Ratios in Disaggregated LLM Serving
Abstract
arXiv:2601.21351v2 Announce Type: replace-cross Abstract: Attention-FFN disaggregation (AFD) is an emerging architecture for LLM decoding that separates state-heavy, KV-cache-dominated Attention computation from stateless, compute-intensive FFN computation, connected by per-step communication. While…
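The split the abstract describes can be illustrated with a toy NumPy sketch: one worker holds the per-sequence KV cache and runs attention, the other is a pure stateless FFN, and each decode step hands an activation from one to the other. All class names, sizes, and the hand-off interface here are hypothetical, chosen only to make the state/stateless distinction concrete; this is not the paper's system.

```python
import numpy as np

D = 8          # toy hidden size
np.random.seed(0)

class AttentionWorker:
    """State-heavy side: owns the KV cache, which grows every decode step."""
    def __init__(self):
        self.keys, self.values = [], []
        self.Wq = np.random.randn(D, D) * 0.1
        self.Wk = np.random.randn(D, D) * 0.1
        self.Wv = np.random.randn(D, D) * 0.1

    def step(self, x):
        q, k, v = x @ self.Wq, x @ self.Wk, x @ self.Wv
        self.keys.append(k); self.values.append(v)      # cache grows with context
        K, V = np.stack(self.keys), np.stack(self.values)
        scores = K @ q / np.sqrt(D)
        w = np.exp(scores - scores.max()); w /= w.sum() # softmax over cached keys
        return w @ V                                    # activation shipped to FFN side

class FFNWorker:
    """Stateless, compute-intensive side: a pure function of its input."""
    def __init__(self):
        self.W1 = np.random.randn(D, 4 * D) * 0.1
        self.W2 = np.random.randn(4 * D, D) * 0.1

    def step(self, a):
        return np.maximum(a @ self.W1, 0.0) @ self.W2   # ReLU MLP, no state kept

attn, ffn = AttentionWorker(), FFNWorker()
x = np.random.randn(D)
for _ in range(4):            # four decode steps
    a = attn.step(x)          # attention on the stateful worker
    x = x + ffn.step(a)       # per-step hand-off, then stateless FFN
print(len(attn.keys))         # KV cache length equals decode steps taken
```

Because only the attention side accumulates state, the two sides can in principle be provisioned independently, which is what makes the ratio between them a tunable design parameter.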