arXiv cs.AI by Synapse Flow Editorial Team

ANCORA: Learning to Question via Manifold-Anchored Self-Play for Verifiable Reasoning

Summary

arXiv:2604.27644v2 (Announce Type: replace-cross)

Abstract: We propose a paradigm shift toward open-ended curriculum self-play: rather than learning to answer on a fixed prompt set, a unified policy learns to question: generating verifiable problems, solving them, and turning verifier feedback into s…
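The abstract describes a loop in which a policy generates verifiable problems, solves them, and makes use of verifier feedback. The toy Python sketch below illustrates only that generic generate, solve, verify cycle, under stated assumptions. The arithmetic task, the `propose`/`solve`/`verify` helpers, the simulated `skill` parameter and the difficulty-adjustment rule are all hypothetical. The abstract's single unified policy is stubbed here as two separate functions, and the manifold anchoring that gives ANCORA its name is not modeled. This is not the paper's method.

```python
import random
from dataclasses import dataclass


@dataclass
class Problem:
    a: int
    b: int

    def answer(self) -> int:
        # Ground truth is computed programmatically, so every problem is verifiable.
        return self.a * self.b


def propose(rng: random.Random, difficulty: int) -> Problem:
    # Hypothetical "questioning" step: larger difficulty means larger operands.
    hi = 10 ** difficulty
    return Problem(rng.randint(1, hi), rng.randint(1, hi))


def solve(problem: Problem, rng: random.Random, skill: float) -> int:
    # Hypothetical noisy solver: success probability shrinks as operands get longer.
    digits = len(str(max(problem.a, problem.b)))
    if rng.random() < skill / digits:
        return problem.answer()
    return problem.answer() + 1  # deliberately wrong answer


def verify(problem: Problem, answer: int) -> bool:
    # Exact verifier: compares the proposed answer with the computed ground truth.
    return answer == problem.answer()


def self_play(steps: int, seed: int = 0, skill: float = 1.5,
              target: float = 0.5, batch_size: int = 8) -> list[tuple[int, float]]:
    """Run a toy curriculum loop; returns (difficulty, pass_rate) per step."""
    rng = random.Random(seed)
    difficulty = 1
    history: list[tuple[int, float]] = []
    for _ in range(steps):
        batch = [propose(rng, difficulty) for _ in range(batch_size)]
        passed = sum(verify(p, solve(p, rng, skill)) for p in batch)
        rate = passed / batch_size
        history.append((difficulty, rate))
        # Assumed curriculum rule: raise difficulty when problems are too easy,
        # lower it (but never below 1) when they are too hard.
        if rate > target:
            difficulty += 1
        elif rate < target and difficulty > 1:
            difficulty -= 1
    return history


if __name__ == "__main__":
    for step, (difficulty, rate) in enumerate(self_play(steps=6)):
        print(f"step {step}: difficulty={difficulty} pass_rate={rate:.2f}")
```

Because each answer is computed programmatically, the verifier is exact; the target pass rate of 0.5 is an arbitrary assumption meant to keep generated problems neither trivial nor impossible for the current solver.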

Read the original article →
