arXiv cs.AI by Synapse Flow Editorial Team

Reasoning-Guided Grounding: Elevating Video Anomaly Detection through Multimodal Large Language Models

Summary

arXiv:2605.02912v1 (announce type: cross)

Abstract: Video Anomaly Detection (VAD) has traditionally been framed as binary classification or outlier detection, providing neither interpretable reasoning nor precise spatial localization of anomalous events. While Vision-Language Models (VLMs) offer rich…
