Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key
Abstract
arXiv:2605.06638v1

Reinforcement learning (RL) has been applied to improve large language model (LLM) reasoning, yet the systematic study of how training scales with task difficulty has been hampered by the lack of controlled, scalable environments. We introduce ScaleLo…