Same Words, Different Judgments: How Preferences Vary Across Modalities
概要
arXiv:2602.22710v2 Announce Type: replace-cross Abstract: Preference-based reinforcement learning (PbRL) is the dominant framework for aligning AI systems to human preferences. However, evaluation protocols for such data were designed for text and have not been validated for speech. We present the …