GazeVLM: Active Vision via Internal Attention Control for Multimodal Reasoning
Abstract
arXiv:2605.07817v1 (announce type: cross)
Human visual reasoning is governed by active vision, a process in which metacognitive control drives top-down, goal-directed attention, dynamically routing foveal focus toward task-relevant details while maintaining peripheral awareness of the global sc…