arXiv cs.AI by Synapse Flow Editorial Team

Correct Is Not Enough: Training Reasoning Planners with Executor-Grounded Rewards

Summary

arXiv:2605.03862v2 (announce type: new)

Reinforcement learning with verifiable rewards has become a common way to improve explicit reasoning in large language models, but final-answer correctness alone does not reveal whether the reasoning trace is faithful, reliable, or useful to the model…
