SB-TRPO: Towards Safe Reinforcement Learning with Hard Constraints
Abstract
arXiv:2512.23770v3
In safety-critical domains, reinforcement learning (RL) agents must often satisfy strict, zero-cost safety constraints while accomplishing tasks. Existing model-free methods frequently either fail to achieve near-zero safety violations or be…