arXiv cs.AI by Synapse Flow Editorial Team

Learning Reasoning Rewards from Expert Demonstrations with Inverse Reinforcement Learning

Summary

arXiv:2510.01857v4. Teaching large language models (LLMs) to reason during post-training typically relies on reinforcement learning with explicit outcome- or process-based reward functions. However, in many real-world settings, obtaining or defining such reward funct…

