Conversation for Non-verifiable Learning: Self-Evolving LLMs through Meta-Evaluation
Abstract
arXiv:2601.21464v2
Training large language models (LLMs) for non-verifiable tasks, such as creative writing, dialogue, and ethical reasoning, remains challenging due to the absence of ground-truth labels. While LLM-as-Judge approaches offer a scalable alternat…