arXiv cs.AI by Synapse Flow Editorial Team

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

Summary

arXiv:2605.06638v1 (announce type: new)

Reinforcement learning (RL) has been applied to improve large language model (LLM) reasoning, yet the systematic study of how training scales with task difficulty has been hampered by the lack of controlled, scalable environments. We introduce ScaleLo…

