Schedule-and-Calibrate: Utility-Guided Multi-Task Reinforcement Learning for Code LLMs
Summary
arXiv:2605.06111v1 | Announce Type: cross
Abstract: Reinforcement learning (RL) with verifiable rewards has proven effective at post-training LLMs for coding, yet deploying separate task-specific specialists incurs costs that scale with the number of tasks, motivating a unified multi-task RL (MTRL) a…