arXiv cs.AI by the Synapse Flow Editorial Team

The Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and Dimension Disparity

Abstract

arXiv:2605.06611v1 Announce Type: cross Abstract: Despite the prevalence of the attention sink phenomenon in Large Language Models (LLMs), where initial tokens disproportionately monopolize attention scores, its structural origins remain elusive. This work provides a mechanistic explanation…
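To make the phenomenon concrete, here is a minimal sketch, not taken from the paper, of how an attention sink can be observed numerically: it builds toy random query/key vectors, artificially inflates the norm of the first key (loosely mimicking the large activations the paper attributes to super neurons), and measures how much softmax attention mass later tokens place on token 0. All names and magnitudes here are illustrative assumptions.

```python
# Minimal toy illustration (not the paper's method) of an attention sink:
# measure the fraction of softmax attention mass landing on the first token.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 8, 16

q = rng.normal(size=(seq_len, d))          # toy query vectors
k = rng.normal(size=(seq_len, d))          # toy key vectors

# Hypothetical "sink" condition: give the first key an outsized norm.
k[0] *= 5.0

scores = q @ k.T / np.sqrt(d)              # scaled dot-product scores
# Causal mask: token i may only attend to tokens 0..i.
mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
scores = np.where(mask, scores, -np.inf)

# Numerically stable row-wise softmax.
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)

# Average attention that each later token pays to token 0.
sink_mass = attn[1:, 0].mean()
print(f"mean attention on the first token: {sink_mass:.3f}")
```

With the inflated first key, the printed sink mass is far above the roughly uniform share one would expect under the causal mask, which is the disproportionate monopolization of attention the abstract describes.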

Read the original article →
