Steering Visual Generation in Unified Multimodal Models with Understanding Supervision
Abstract
arXiv:2605.05781v1 (Announce Type: cross)

Unified multimodal models are envisioned to bridge the gap between understanding and generation. Yet, to achieve competitive performance, state-of-the-art models adopt largely decoupled understanding and generation components. This design, while eff…