Do Joint Audio-Video Generation Models Understand Physics?
Summary
arXiv:2605.07061v1
Announce Type: cross
Abstract: Joint audio-video generation models are rapidly approaching professional production quality, raising a central question: do they understand audio-visual physics, or merely generate plausible sounds and frames that violate real-world consistency? We …