EnvSimBench: A Benchmark for Evaluating and Improving LLM-Based Environment Simulation
Summary
arXiv:2605.07247v1. Scalable training of AI agents relies on interactive environments that faithfully simulate the consequences of agent actions. Manually crafted environments are expensive to build, brittle to extend, and fundamentally limited in diversity. A promising dir…