Beyond State-Wise Mirror Descent: Offline Policy Optimization with Parametric Policies
Summary
arXiv:2602.23811v4
We investigate the theoretical aspects of offline reinforcement learning (RL) under general function approximation. While prior works (e.g., Xie et al., 2021) have established the theoretical foundations of learning a good policy from offlin…