When Quantization Is Free: An int4 KV Cache That Outruns fp16 on Apple Silicon
Abstract
arXiv:2605.05699v1 (Announce Type: cross)
KV-cache quantization is usually framed as a quality–latency trade-off. We show that on Apple Silicon's unified memory this trade-off is inverted: a single fused Metal kernel (sign-randomized FFT + per-channel $\lambda$ + per-group abs-max + int4 nibble pack),…
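To make the last two stages of that kernel concrete, here is a minimal Python sketch of per-group abs-max int4 quantization followed by nibble packing (two 4-bit codes per byte). It is illustrative only and is not the authors' fused Metal kernel. The sign-randomized FFT and the per-channel $\lambda$ stages are omitted because the abstract does not specify them. The group size (32), the symmetric [-7, 7] code range, the low-nibble-first byte layout, and every function name below are hypothetical choices.

```python
# Hypothetical sketch: per-group abs-max int4 quantization + nibble packing.
# Omits the sign-randomized FFT and per-channel lambda stages from the abstract.

GROUP_SIZE = 32  # assumption: the abstract does not state the group size


def quantize_group(values):
    """Symmetric abs-max int4: scale so the largest |v| maps to code 7."""
    amax = max(abs(v) for v in values)
    scale = amax / 7.0 if amax > 0 else 1.0  # avoid divide-by-zero on all-zero groups
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    return scale, codes


def pack_nibbles(codes):
    """Pack signed 4-bit codes two per byte (assumed layout: low nibble first)."""
    if len(codes) % 2:
        codes = codes + [0]  # pad odd-length groups
    out = bytearray()
    for i in range(0, len(codes), 2):
        lo = codes[i] & 0xF        # two's-complement low 4 bits
        hi = codes[i + 1] & 0xF
        out.append(lo | (hi << 4))
    return bytes(out)


def unpack_nibbles(packed, n):
    """Inverse of pack_nibbles: recover n signed codes in [-8, 7]."""
    codes = []
    for b in packed:
        for nib in (b & 0xF, b >> 4):
            codes.append(nib - 16 if nib >= 8 else nib)  # sign-extend 4 bits
    return codes[:n]


def quantize(values, group_size=GROUP_SIZE):
    """Quantize a flat vector group by group; returns per-group scales and packed bytes."""
    scales, packed = [], []
    for start in range(0, len(values), group_size):
        scale, codes = quantize_group(values[start:start + group_size])
        scales.append(scale)
        packed.append(pack_nibbles(codes))
    return scales, packed


def dequantize(scales, packed, n, group_size=GROUP_SIZE):
    """Reconstruct n values as code * group_scale."""
    out = []
    for g, (scale, data) in enumerate(zip(scales, packed)):
        count = min(group_size, n - g * group_size)
        out.extend(c * scale for c in unpack_nibbles(data, count))
    return out
```

Because each value is rounded to the nearest multiple of its group's scale, the reconstruction error is at most half that scale. With a group of 32, the codes occupy 16 bytes, compared with 64 bytes for the same 32 values in fp16. If each scale were also stored as fp16 (again an assumption), a group would take 18 bytes, roughly 3.6 times less memory traffic per cached value.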