arXiv cs.AI by Synapse Flow Editorial Team

HeadQ: Model-Visible Distortion and Score-Space Correction for KV-Cache Quantization

Overview

arXiv:2605.03562v1 (cross-listed). KV-cache quantizers usually optimize storage-space reconstruction, even though attention reads keys through logits and values through an attention-weighted readout. We argue that persistent cache error should be measured in model-visible coordinates. F…
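
To make the distinction between storage-space error and model-visible error concrete, here is a minimal sketch in plain Python. It is not the paper's method, which is cut off in the abstract above. The query, the keys, and the two "dequantized" caches are hypothetical toy values chosen so that both caches have the same storage-space reconstruction error while only one of them changes the attention logits.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def logits(q, keys):
    # Scaled dot-product attention logits for one query.
    scale = 1.0 / math.sqrt(len(q))
    return [dot(q, k) * scale for k in keys]

def storage_error(keys, keys_hat):
    # Storage-space view: total L2 reconstruction error over cached keys.
    return sum(l2(k, kh) for k, kh in zip(keys, keys_hat))

def logit_error(q, keys, keys_hat):
    # Model-visible view: largest change in any attention logit.
    return max(abs(a - b) for a, b in zip(logits(q, keys), logits(q, keys_hat)))

# Hypothetical toy data (assumption, not from the paper).
q = [1.0, 0.0]
keys = [[1.0, 1.0], [0.5, -1.0]]

# Two hypothetical reconstructions, each perturbing the first key by 0.5:
keys_along_q = [[1.5, 1.0], [0.5, -1.0]]  # error parallel to the query
keys_orth_q = [[1.0, 1.5], [0.5, -1.0]]   # error orthogonal to the query

print(storage_error(keys, keys_along_q), logit_error(q, keys, keys_along_q))
print(storage_error(keys, keys_orth_q), logit_error(q, keys, keys_orth_q))
```

Both reconstructions look equally good to a storage-space metric, but only the error component aligned with the query is visible through the logits. That is the kind of mismatch the abstract points at for keys; the value side would need an analogous check on the attention-weighted readout.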
