Offline Policy Optimization with Posterior Sampling
Summary
arXiv:2605.07393v1
Abstract: A fundamental challenge in model-based offline reinforcement learning (RL) lies in the trade-off between generalization and robustness against exploitation errors in out-of-distribution (OOD) regions. While OOD samples may capture valid underlying phy…