KV Cache Offloading for Context-Intensive Tasks
Abstract
arXiv:2604.08426v2
With the growing demand for long-context LLMs across a wide range of applications, the key-value (KV) cache has become a critical bottleneck for both latency and memory usage. Recently, KV-cache offloading has emerged as a promising approach…