Rubric-based On-policy Distillation
Summary
arXiv:2605.07396v1 Announce Type: cross
Abstract: On-policy distillation (OPD) is a powerful paradigm for model alignment, yet its reliance on teacher logits restricts its application to white-box scenarios. We contend that structured semantic rubrics can serve as a scalable alternative to teacher …