arXiv cs.AI by Synapse Flow Editorial Team

What You Think is What You See: Driving Exploration in VLM Agents via Visual-Linguistic Curiosity

Summary

arXiv:2605.03782v1. To navigate partially observable visual environments, recent VLM agents increasingly internalize world modeling capabilities into their policies via explicit CoT reasoning, enabling them to mentally simulate futures before acting. However, relying sol…
