arXiv cs.AI by Synapse Flow Editorial Team

Adaptive Greedy Frame Selection for Long Video Understanding

Summary

arXiv:2603.20180v2 (announce type: replace-cross)

Large vision-language models (VLMs) are increasingly applied to long-video question answering, yet inference is often bottlenecked by the number of input frames and resulting visual tokens. Naive sparse sampling can miss decisive moments, w…

