GR-Ben: A General Reasoning Benchmark for Evaluating Process Reward Models
Abstract
arXiv:2605.01203v2 Announce Type: replace Abstract: Process reward models (PRMs) have shown remarkable potential for test-time scaling. Since large language models (LLMs) regularly generate flawed intermediate reasoning steps when tackling a broad spectrum of reasoning and decision-m…