ANCORA: Learning to Question via Manifold-Anchored Self-Play for Verifiable Reasoning
Summary
arXiv:2604.27644v2. We propose a paradigm shift toward open-ended curriculum self-play: rather than learning to answer on a fixed prompt set, a unified policy learns to question by generating verifiable problems, solving them, and turning verifier feedback into s…