Irminsul: MLA-Native Position-Independent Caching for Agentic LLM Serving
Abstract
arXiv:2605.05696v1

Agentic LLM workloads put bit-identical tokens at shifted positions every turn, voiding prefix caches at the first byte of divergence. Operators report cache-hit regressions ranging from moderate slowdowns to severe TTFT spikes of 10-16s on unchange…