Goldilocks RL: Tuning Task Difficulty to Escape Sparse Rewards for Reasoning
Summary
arXiv:2602.14868v2 (announce type: replace-cross)
Abstract: Reinforcement learning has emerged as a powerful paradigm for unlocking reasoning capabilities in language models. However, relying on sparse rewards makes this process highly sample-inefficient, as models must navigate vast search spaces wi…