Mechanistic Interpretability with Sparse Autoencoder Neural Operators
Summary
arXiv:2509.03738v4 (announce type: replace-cross)
Abstract: We introduce sparse autoencoder neural operators (SAE-NOs), a new class of sparse autoencoders that operate in function spaces rather than fixed-dimensional Euclidean representations. We formalize the functional representation hypothesis, wh…