Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs
Abstract
arXiv:2605.00814v2. While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Dilution" phenomenon, where the accumulation of textual history expands the attention partition func…
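
As a rough illustration of the dilution effect the abstract names, the sketch below computes the share of softmax attention that falls on visual tokens as the text history grows. It shows only the generic behavior of a softmax normalizer, not the paper's method. The function name `visual_attention_mass`, the token counts, and the equal logits are all hypothetical choices made for this example.

```python
import math


def visual_attention_mass(visual_logits, text_logits):
    """Return the fraction of softmax attention assigned to visual tokens.

    Assumption: a single query attends over one flat list of visual and
    text key logits. This is a toy setup, not the paper's architecture.
    """
    all_logits = visual_logits + text_logits
    m = max(all_logits)  # subtract the max for numerical stability
    visual = sum(math.exp(x - m) for x in visual_logits)
    text = sum(math.exp(x - m) for x in text_logits)
    # The denominator is the softmax normalizer (partition function);
    # every additional text token adds a positive term to it.
    return visual / (visual + text)


# Hypothetical example: 4 visual tokens, text history of 4, 12, then 36 tokens,
# with all logits equal so that only the token counts matter.
visual_logits = [0.0] * 4
masses = [visual_attention_mass(visual_logits, [0.0] * n) for n in (4, 12, 36)]

for n, mass in zip((4, 12, 36), masses):
    print(f"text tokens={n:2d}  visual attention mass={mass:.3f}")
```

With the visual logits held fixed, the visual share shrinks only because the normalizer grows with each generated text token, which is the sense in which a longer textual history can dilute the visual signal.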