ForgeVLA: Federated Vision-Language-Action Learning without Language Annotations
Summary
arXiv:2605.07474v1 (announce type: cross)

Vision-Language-Action (VLA) models hold great promise for general-purpose robotic intelligence, yet scaling up such models is severely bottlenecked by the high cost of acquiring annotated training data. Fortunately, vision-equipped robots deployed …