Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement
Overview
arXiv:2605.06298v1 Announce Type: cross Abstract: Training world models on vast quantities of unlabelled videos is a critical step toward fully autonomous intelligence. However, the prevailing paradigm of encoding raw pixels into opaque latent spaces and relying on heavy decoders for reconstruction…