arXiv cs.AI by the Synapse Flow editorial team

Gated Subspace Inference for Transformer Acceleration

Overview

arXiv:2605.03109v1 (Announce Type: cross)

Abstract: A method is presented for accelerating inference in transformer language models by exploiting the low effective rank of the token activation manifold at each layer. The method decomposes each activation vector into a subspace component and a residua…
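The abstract is truncated, but the stated decomposition step can be sketched directly: project each activation vector onto a low-rank subspace and keep the remainder as a residual. The basis `U` and the function name below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def subspace_decompose(x, U):
    """Split x into its component inside the subspace spanned by the
    orthonormal columns of U, plus the leftover residual.
    This is a hypothetical sketch of the decomposition the abstract
    describes, not the paper's method."""
    coeffs = U.T @ x            # coordinates of x in the k-dim subspace
    subspace_part = U @ coeffs  # component lying inside the subspace
    residual = x - subspace_part
    return subspace_part, residual

rng = np.random.default_rng(0)
d, k = 16, 4
# Build an orthonormal basis for an arbitrary k-dim subspace via QR.
Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
x = rng.standard_normal(d)
s, r = subspace_decompose(x, Q)

# The two parts are orthogonal and reconstruct x exactly.
assert np.allclose(s + r, x)
assert abs(float(s @ r)) < 1e-8
```

The "gated" aspect of the title presumably decides when the residual can be skipped, but the truncated abstract does not specify the gating rule, so it is omitted here.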

Read the original article →
