arXiv cs.AI, by the Synapse Flow editorial team

Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live

Overview

arXiv:2511.02230v4 (Announce Type: replace-cross). Abstract: KV cache management is essential for efficient LLM inference. To maximize utilization, existing inference engines evict finished requests' KV cache when new requests are waiting. This policy breaks down for agentic workloads, which interleave LLM c…
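The abstract above describes eager eviction of finished requests' KV cache and, per the title, a time-to-live (TTL) alternative. The paper's actual scheduler is not shown in this excerpt; the following is only a minimal illustrative sketch of the general TTL idea, with all class and method names (`TTLKVCache`, `put`, `reuse`) being hypothetical:

```python
import time
from collections import OrderedDict


class TTLKVCache:
    """Minimal sketch (not the paper's algorithm): keep a finished
    request's KV blocks alive for a TTL so a follow-up turn of the
    same multi-turn agent can reuse them, instead of evicting them
    eagerly the moment new requests arrive."""

    def __init__(self, capacity, ttl_s):
        self.capacity = capacity          # total KV blocks available
        self.ttl_s = ttl_s                # retention window per request
        self.entries = OrderedDict()      # request_id -> (expiry, blocks)

    def put(self, request_id, kv_blocks, now=None):
        """Retain a finished request's KV blocks until its TTL expires."""
        now = time.monotonic() if now is None else now
        self._evict_expired(now)
        # Reclaim oldest retained entries only when space is actually needed.
        while self._used() + kv_blocks > self.capacity and self.entries:
            self.entries.popitem(last=False)
        if self._used() + kv_blocks > self.capacity:
            return False
        self.entries[request_id] = (now + self.ttl_s, kv_blocks)
        return True

    def reuse(self, request_id, now=None):
        """Cache hit if the request returns before its TTL; else None."""
        now = time.monotonic() if now is None else now
        entry = self.entries.get(request_id)
        if entry and entry[0] > now:
            return self.entries.pop(request_id)[1]
        return None

    def _evict_expired(self, now):
        for rid in [r for r, (exp, _) in self.entries.items() if exp <= now]:
            del self.entries[rid]

    def _used(self):
        return sum(blocks for _, blocks in self.entries.values())
```

Under this sketch, an agent's tool-call turn that returns within the TTL reuses its prefix KV blocks; one that returns late recomputes them, which is the trade-off the title's "Time-to-Live" refers to.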

