CoCoReviewBench: A Completeness- and Correctness-Oriented Benchmark for AI Reviewers
Summary
arXiv:2605.07905v1 Announce Type: cross
Abstract: Despite the rapid development of AI reviewers, evaluating such systems remains challenging: existing metrics reward overlap with human reviews rather than correctness. However, since human reviews often cover only a subset of salient issues and sometimes contain mis…