A Systematic Investigation of The RL-Jailbreaker in LLMs
Summary
arXiv:2605.07032v1 (announce type: cross). The evolution of generative models from next-token predictors to autonomous engines of complex systems necessitates rigorous safety hardening. Adversarial jailbreaking, the strategic manipulation of models to elicit harmful output, remains a primary…