Alternating Reinforcement Learning with Contextual Rubric Rewards: Beyond the Scalarization Strategy
Abstract
arXiv:2603.15646v2. Reinforcement Learning with Rubric Rewards (RLRR) is a framework that extends conventional reinforcement learning from human feedback (RLHF) and reinforcement learning with verifiable rewards (RLVR) by replacing scalar preference signals with structured, multi-dimensio…