arXiv cs.AI, via the Synapse Flow editorial team

HWE-Bench: Benchmarking LLM Agents on Real-World Hardware Bug Repair Tasks

Abstract

arXiv:2604.14709v3 (announce type: replace)

Existing benchmarks for hardware design primarily evaluate Large Language Models (LLMs) on isolated, component-level tasks such as generating HDL modules from specifications, leaving repository-scale evaluation unaddressed. We introduce HWE-Bench,…

Read the original article →
