Causal Probing for Internal Visual Representations in Multimodal Large Language Models
Summary
arXiv:2605.05593v1 (announce type: new)
Abstract: Despite the remarkable success of Multimodal Large Language Models (MLLMs) across diverse tasks, the internal mechanisms governing how they encode and ground distinct visual concepts remain poorly understood. To bridge this gap, we propose a causal fr…