arXiv cs.AI by Synapse Flow Editorial Team

SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection

Summary

arXiv:2605.02888v2 (announce type: replace-cross)

Speculative decoding accelerates large language model (LLM) inference by using a small draft model to propose candidate tokens that a larger target model verifies. A critical hyperparameter in this process is the speculation length $\gamma$,…
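To make the propose-and-verify loop concrete, here is a minimal sketch of greedy speculative decoding with a fixed speculation length. It is not SpecKV's method: the abstract is truncated before it describes how $\gamma$ is selected, so `gamma` is simply a caller-supplied constant here. The toy models `toy_draft` and `toy_target` and the helper names are hypothetical, chosen only for illustration.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def toy_target(prefix: List[int]) -> int:
    """Toy 'large' model: counts upward modulo 10."""
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: List[int]) -> int:
    """Toy 'small' model: agrees with the target except after token 4."""
    return 0 if prefix[-1] == 4 else (prefix[-1] + 1) % 10


def speculative_step(prefix: List[int], draft: NextToken,
                     target: NextToken, gamma: int) -> List[int]:
    """Run one greedy step; returns between 1 and gamma + 1 new tokens."""
    # 1. The draft model proposes gamma tokens autoregressively.
    proposed: List[int] = []
    for _ in range(gamma):
        proposed.append(draft(prefix + proposed))

    # 2. The target model checks each proposal in order. A real system would
    #    score all positions in a single batched forward pass.
    accepted: List[int] = []
    for tok in proposed:
        expected = target(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every proposal matched, so the target contributes one extra token.
    accepted.append(target(prefix + accepted))
    return accepted


def generate(prompt: List[int], draft: NextToken, target: NextToken,
             gamma: int, max_new_tokens: int) -> List[int]:
    """Generate max_new_tokens tokens after a non-empty prompt."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(out, draft, target, gamma))
    return out[:len(prompt) + max_new_tokens]
```

In this greedy variant the output is identical to decoding with the target model alone, whatever value `gamma` takes. The speculation length only changes how many draft tokens are wasted on a mismatch versus how many are accepted per target check, which is why it matters as a tuning knob.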
