Can You Break RLVER? Probing Adversarial Robustness of RL-Trained Empathetic Agents
Summary
arXiv:2605.07138v1 (Announce Type: new). Abstract: Reinforcement learning from verifiable emotion rewards (RLVER) has produced language models with strong empathetic performance, as evaluated on benchmarks that assume cooperative, honest users. Yet real emotional interactions systematically violate this as…