arXiv cs.AI, by the Synapse Flow editorial team

GR-Ben: A General Reasoning Benchmark for Evaluating Process Reward Models

Summary

arXiv:2605.01203v2. Currently, process reward models (PRMs) have exhibited remarkable potential for test-time scaling. Since large language models (LLMs) regularly generate flawed intermediate reasoning steps when tackling a broad spectrum of reasoning and decision-m…

