Cascade Token Selection for Transformer Attention Acceleration
Abstract
arXiv:2605.03110v1 Announce Type: cross

A method is presented for reducing the cost of representative token selection in transformer attention layers by exploiting the coherence of the representative set across depth. Activation Decorrelation Attention (ADA) selects $r \ll T$ representati…
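The excerpt describes attending over $r \ll T$ representative tokens and reusing that representative set across layers rather than recomputing it at every depth. A minimal NumPy sketch of this idea follows; the selection criterion (`select_representatives`, here a simple key-norm heuristic) and the layer loop are illustrative assumptions, not the paper's actual ADA algorithm.

```python
import numpy as np

def select_representatives(keys, r):
    # Hypothetical scoring: keep the r tokens with the largest key norm.
    # The paper's actual selection rule is not specified in this excerpt.
    scores = np.linalg.norm(keys, axis=-1)
    return np.argsort(scores)[-r:]

def sparse_attention(q, k, v, idx):
    # Standard softmax attention restricted to the representative tokens idx.
    kr, vr = k[idx], v[idx]
    logits = q @ kr.T / np.sqrt(q.shape[-1])
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ vr

rng = np.random.default_rng(0)
T, d, r, n_layers = 64, 16, 8, 4
x = rng.normal(size=(T, d))

idx = None
for layer in range(n_layers):
    q = x @ rng.normal(size=(d, d)) / np.sqrt(d)
    k = x @ rng.normal(size=(d, d)) / np.sqrt(d)
    v = x @ rng.normal(size=(d, d)) / np.sqrt(d)
    if idx is None:
        # Select representatives once at the first layer ...
        idx = select_representatives(k, r)
    # ... and reuse the cached set at deeper layers, exploiting its
    # coherence across depth to avoid per-layer selection cost.
    x = x + sparse_attention(q, k, v, idx)

print(x.shape)  # (64, 16)
```

Reusing `idx` drops the per-layer selection cost from every layer to once per forward pass; each attention call is $O(T r d)$ instead of $O(T^2 d)$.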