Structural Instability of Feature Composition
Summary
arXiv:2605.05223v1 (cross-listed). Sparse Autoencoders (SAEs) have emerged as a powerful paradigm for disentangling feature superposition in transformer-based architectures, enabling precise control via activation steering. However, the theoretical foundations of compositional steeri…