LensVLM: Selective Context Expansion for Compressed Visual Representation of Text
Summary
arXiv:2605.07019v1 (announce type: cross)
Abstract: Vision Language Models (VLMs) offer the exciting possibility of processing text as rendered images, bypassing the need to tokenize the text into long token sequences. Since VLM image encoders map fixed-size images to a fixed number of visual toke…