Knowledge-Level Consistency Reinforcement Learning: Dual-Fact Alignment for Long-Form Factuality
Abstract
arXiv:2509.23765v3 | Announce Type: replace-cross

Hallucination in large language models (LLMs) during long-form generation remains difficult to address under existing reinforcement learning from human feedback (RLHF) frameworks, as their preference rewards often overlook the model's own kn…
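As background for the critique above, here is a minimal sketch of the standard Bradley-Terry pairwise objective commonly used to train RLHF reward models; the function and variable names are illustrative, not from the paper. Note that the objective fits only human preference rankings, with no term that consults the policy model's own knowledge.

```python
import torch
import torch.nn.functional as F

def preference_reward_loss(r_chosen: torch.Tensor,
                           r_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry pairwise loss for training an RLHF reward model.

    The reward model is fit only to human preference rankings between
    response pairs; nothing here checks generated claims against the
    model's own knowledge, which is the gap the abstract points to.
    """
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: scalar reward-model scores for preferred vs. rejected responses.
r_chosen = torch.tensor([1.2, 0.7])
r_rejected = torch.tensor([0.3, 0.9])
print(preference_reward_loss(r_chosen, r_rejected))
```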