Semantic Integrity Matters: Benchmarking and Preserving High-Density Reasoning in KV Cache Compression
Abstract
arXiv:2502.01941v3 Announce Type: replace-cross Abstract: While Key-Value (KV) cache compression is essential for efficient LLM inference, current evaluations disproportionately focus on sparse retrieval tasks, potentially masking the degradation of High-Density Reasoning where Chain-of-Thought (Co…