Think Before You Drive: World Model-Inspired Multimodal Grounding for Autonomous Vehicles
Abstract
arXiv:2512.03454v4
Interpreting natural-language commands to localize target objects is critical for autonomous driving (AD). Existing visual grounding (VG) methods for autonomous vehicles (AVs) typically struggle with ambiguous, context-dependent instructions…