Low-Rank Adaptation for Critic Learning in Off-Policy Reinforcement Learning
Summary
arXiv:2604.18978v2 (Announce Type: replace-cross)
Abstract: Scaling critic capacity is a promising direction for improving off-policy reinforcement learning (RL). However, recent work shows that larger critics are prone to overfitting and instability in replay-based bootstrapped training. In this pap…
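The excerpt is cut off before it describes the method, so how the paper applies low-rank adaptation to the critic is not stated here. For reference, the sketch below shows only the standard LoRA parameterization that the title names. A frozen weight matrix W0 is replaced by W0 + (alpha / r) * B A. The factors A (r x d_in) and B (d_out x r) are the only trainable parameters, and B starts at zero so the layer initially reproduces W0 exactly. The class name `LoRALinear`, the helper functions, and the initialization scale are illustrative assumptions, not the authors' implementation. The code uses plain Python lists so it has no dependencies.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Hypothetical LoRA-adapted linear layer: y = (W0 + (alpha / r) * B @ A) x.

    W0 (d_out x d_in) is treated as frozen; only A (r x d_in) and
    B (d_out x r) would be updated during training.
    """

    def __init__(self, W0, r, alpha=1.0, seed=0):
        rng = random.Random(seed)
        self.W0 = W0
        d_out, d_in = len(W0), len(W0[0])
        self.r = r
        self.scale = alpha / r
        # A gets small random values; B starts at zero, so the
        # low-rank update B @ A is exactly zero at initialization.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def effective_weight(self):
        """Return W0 + scale * (B @ A); the added term has rank at most r."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.W0, delta)]

    def __call__(self, x):
        return matvec(self.effective_weight(), x)

    def trainable_params(self):
        """Number of adapter parameters: r * (d_out + d_in)."""
        return self.r * (len(self.W0) + len(self.W0[0]))


if __name__ == "__main__":
    W0 = [[1.0, 2.0], [3.0, 4.0]]
    layer = LoRALinear(W0, r=1)
    print(layer([1.0, 1.0]))  # equals W0 @ x at init, since B is zero
```

For scale, a 256 x 256 layer has 65,536 weights, but a rank-8 adapter on it trains only 8 * (256 + 256) = 4,096 parameters. This reduction in trainable capacity is the usual motivation for low-rank updates. Whether the paper uses it for that reason is not stated in the excerpt.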