Reasoning-Guided Grounding: Elevating Video Anomaly Detection through Multimodal Large Language Models
Summary
arXiv:2605.02912v1 (announce type: cross)
Video Anomaly Detection (VAD) has traditionally been framed as binary classification or outlier detection, providing neither interpretable reasoning nor precise spatial localization of anomalous events. While Vision-Language Models (VLMs) offer rich…