Benchmarking World-Model Learning with Environment-Level Queries
Abstract
arXiv:2510.19788v4

World models are central to building AI agents capable of flexible reasoning and planning. Yet current evaluations (i) test only properties measurable from observed interactions, such as next-frame prediction or task return, and (ii) do not test w…