arXiv cs.AI, May 7, 2026, 13:00, by Synapse Flow Editorial Team

Descent-Guided Policy Gradient for Scalable Cooperative Multi-Agent Learning

Summary

arXiv:2602.20078v3 (announce type: replace-cross)

Scaling cooperative multi-agent reinforcement learning (MARL) is fundamentally limited by cross-agent noise. When agents share a common reward, each agent's learning signal is computed from a shared return that depends on all agents, so the …
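
To make the notion of cross-agent noise concrete, here is a minimal toy sketch. It is not the paper's method, and every modelling choice in it is an assumption made for illustration: each agent has a Bernoulli policy with parameter `p = 0.5`, the shared return is simply the sum of all agents' actions, and agent 0 uses a plain score-function (REINFORCE-style) gradient estimate with no baseline. The function names `grad_samples` and `grad_variance` are hypothetical. Under these assumptions the expected gradient for agent 0 stays at 1 regardless of team size, while the variance of the estimate grows as more agents contribute to the shared return.

```python
import random
import statistics


def grad_samples(n_agents, n_samples=20000, seed=0):
    """Score-function gradient samples for agent 0 under a toy shared return."""
    rng = random.Random(seed)
    p = 0.5  # assumed: every agent picks action 1 with probability p
    samples = []
    for _ in range(n_samples):
        actions = [1 if rng.random() < p else 0 for _ in range(n_agents)]
        # Assumed toy shared return: depends on every agent's action.
        shared_return = sum(actions)
        # Derivative of log-probability of agent 0's action with respect to p
        # for a Bernoulli(p) policy: (a - p) / (p * (1 - p)).
        score = (actions[0] - p) / (p * (1 - p))
        samples.append(score * shared_return)
    return samples


def grad_variance(n_agents, n_samples=20000, seed=0):
    """Empirical variance of agent 0's gradient estimate."""
    return statistics.pvariance(grad_samples(n_agents, n_samples, seed))


if __name__ == "__main__":
    for n in (1, 2, 10, 50):
        g = grad_samples(n)
        print(n, round(statistics.fmean(g), 3), round(statistics.pvariance(g), 1))
```

In this toy setting the other agents' actions carry no information about agent 0's gradient, yet they enter its estimate through the shared return, which is the effect the abstract refers to as cross-agent noise.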
