SCOPE: Structured Decomposition and Conditional Skill Orchestration for Complex Image Generation
Abstract
arXiv:2605.08043v1 (announce type: cross)
While text-to-image models have made strong progress in visual fidelity, faithfully realizing complex visual intents remains challenging because many requirements must be tracked across grounding, generation, and verification. We refer to these requ…