What You Think is What You See: Driving Exploration in VLM Agents via Visual-Linguistic Curiosity
Summary
arXiv:2605.03782v1 (announce type: new)

Abstract: To navigate partially observable visual environments, recent VLM agents increasingly internalize world-modeling capabilities into their policies via explicit chain-of-thought (CoT) reasoning, enabling them to mentally simulate futures before acting. However, relying sol…