arXiv cs.AI, by the Synapse Flow editorial team

QKVShare: Quantized KV-Cache Handoff for Multi-Agent On-Device LLMs

Summary

arXiv:2605.03884v1 (announce type: new)

Abstract: Multi-agent LLM systems on edge devices need to hand off latent context efficiently, but the practical choices today are expensive re-prefill or full-precision KV transfer. We study QKVShare, a framework for quantized KV-cache handoff between agents t…
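To make the trade-off in the abstract concrete, here is a minimal sketch of why a quantized handoff payload is smaller than a full-precision one. It is not QKVShare's method, which the truncated abstract does not describe. It assumes plain symmetric per-row int8 quantization, and every name in it (`quantize_rows`, `pack_payload`, the 16 x 64 block shape) is hypothetical.

```python
import random
import struct


def quantize_rows(rows):
    """Symmetric per-row int8 quantization (an assumed, generic scheme).

    Returns one scale per row plus the integer codes in [-127, 127].
    """
    scales, q_rows = [], []
    for row in rows:
        max_abs = max(abs(x) for x in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        scales.append(scale)
        q_rows.append([max(-127, min(127, round(x / scale))) for x in row])
    return scales, q_rows


def dequantize_rows(scales, q_rows):
    """Approximate reconstruction on the receiving agent's side."""
    return [[q * s for q in row] for s, row in zip(scales, q_rows)]


def pack_payload(scales, q_rows):
    """Serialize each row as one float32 scale followed by its int8 codes."""
    out = bytearray()
    for scale, row in zip(scales, q_rows):
        out += struct.pack("<f", scale)
        out += struct.pack(f"<{len(row)}b", *row)
    return bytes(out)


# Hypothetical KV block: 16 token positions x 64 channels of random values.
random.seed(0)
kv_block = [[random.gauss(0.0, 1.0) for _ in range(64)] for _ in range(16)]

scales, q_rows = quantize_rows(kv_block)
payload = pack_payload(scales, q_rows)

fp16_bytes = 16 * 64 * 2  # size of the same block sent as float16
restored = dequantize_rows(scales, q_rows)  # uses the in-memory scales
max_err = max(
    abs(a - b)
    for row_a, row_b in zip(kv_block, restored)
    for a, b in zip(row_a, row_b)
)

print(f"int8 payload: {len(payload)} bytes, fp16 payload: {fp16_bytes} bytes")
print(f"max reconstruction error: {max_err:.4f}")
```

Under these assumptions the int8 payload is 16 x (4 + 64) = 1088 bytes against 2048 bytes for float16, at the cost of a rounding error of at most half a quantization step per value. Re-prefill avoids the transfer entirely but pays for recomputing the cache on the receiving agent.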

