Feature Starvation as Geometric Instability in Sparse Autoencoders
Abstract
arXiv:2605.05341v1 (announce type: cross)

Sparse autoencoders (SAEs) are used to disentangle the dense, polysemantic internal representations of large language models (LLMs) into interpretable, monosemantic concepts. However, standard $\ell_1$-regularized SAEs suffer from feature starvation…
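
For orientation, the "standard $\ell_1$-regularized SAE" named in the abstract is commonly written as a reconstruction term plus an $\ell_1$ penalty on the feature activations. The NumPy sketch below illustrates that generic objective only; it is not taken from the paper. The function name `sae_loss`, the parameter names, the ReLU encoder, the shapes, and the coefficient `l1_coeff` are all assumptions made for the example.

```python
import numpy as np


def sae_loss(x, W_enc, b_enc, W_dec, b_dec, l1_coeff=1e-3):
    """Generic l1-regularized SAE objective (illustrative, not the paper's code).

    x has shape (n, d); W_enc has shape (d, m); W_dec has shape (m, d).
    Returns (total, recon, sparsity, f), where f holds the feature activations.
    """
    # Encoder: the ReLU keeps feature activations non-negative.
    f = np.maximum(0.0, x @ W_enc + b_enc)
    # Decoder: linear reconstruction of the input from the features.
    x_hat = f @ W_dec + b_dec
    # Mean squared reconstruction error per example.
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    # l1 penalty: mean absolute activation mass per example.
    sparsity = np.mean(np.sum(np.abs(f), axis=1))
    return recon + l1_coeff * sparsity, recon, sparsity, f


# Hypothetical sizes: 16 inputs of dimension 8, dictionary of 32 features.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))
W_enc = rng.normal(size=(8, 32)) * 0.1
W_dec = rng.normal(size=(32, 8)) * 0.1
b_enc = np.zeros(32)
b_dec = np.zeros(8)

total, recon, sparsity, f = sae_loss(x, W_enc, b_enc, W_dec, b_dec)
```

The sketch returns the two terms separately so that the trade-off controlled by `l1_coeff` can be inspected directly.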