Architectural Observability Collapse in Transformers
Summary
arXiv:2604.24801v2
Announce Type: replace-cross
Abstract: Activation monitoring can catch confident errors in autoregressive transformers only if training preserved an internal decision-quality signal that output confidence does not expose. Monitorability is an architectural property before it is a…