Probe-Geometry Alignment: Erasing the Cross-Sequence Memorization Signature Below Chance
arXiv:2605.01699v3 Announce Type: replace-cross
Abstract
Recent attacks show that behavioural unlearning of large language models leaves internal traces recoverable by adversarial probes. We characterise where this retention lives and show it can be surgically removed without measurable capability…