MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
Summary
arXiv:2507.21183v5 (Announce Type: replace-cross)

Abstract: As the era of large language models (LLMs) unfolds, Preference Optimization (PO) methods have become a central approach to aligning LLMs with human preferences and improving performance. We propose Maximum a Posteriori Preference Optimizatio…