Frictional Q-Learning
概要
arXiv:2509.19771v4 Announce Type: replace-cross Abstract: Off-policy reinforcement learning suffers from extrapolation errors when a learned policy selects actions that are weakly supported in the replay buffer. In this study, we address this issue by drawing an analogy to static friction. From thi…