Towards Apples to Apples for AI Evaluations: From Real-World Use Cases to Evaluation Scenarios
Summary
arXiv:2605.07986v1. Announce type: cross.
Abstract: AI measurement science has a wide variety of methodologies and measurements for comparing AI systems, resulting in what often appear to be "apples-to-oranges" comparisons across AI evaluations. To move toward "apples-to-apples" comparisons in real-w…