InvThink: Premortem Reasoning for Safer Language Models
Summary
arXiv:2510.01569v3
Abstract: We present InvThink, a training and prompting framework that requires the model to enumerate, analyze, and constrain potential failures before generating its final response. Unlike existing safety alignment methods that optimize only for safe fina…
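
The enumerate, analyze, constrain, then respond ordering described in the abstract could be sketched as a prompt template along the following lines. This is only an illustrative sketch, not the authors' prompt: the stage wording, the `STAGES` list, and the `build_premortem_prompt` helper are all hypothetical names introduced here, and the paper's training component is not covered.

```python
from typing import List, Tuple

# Hypothetical stage wording, paraphrased from the abstract's
# "enumerate, analyze, and constrain potential failures".
# None of this text is taken from the paper itself.
STAGES: List[Tuple[str, str]] = [
    ("Enumerate", "List the ways a response to this request could cause harm."),
    ("Analyze", "For each listed failure, explain how it could come about."),
    ("Constrain", "State the constraints the final response must satisfy to avoid those failures."),
]


def build_premortem_prompt(user_request: str) -> str:
    """Build a prompt that asks for failure analysis before the final response."""
    lines = ["Before answering, work through the following steps in order."]
    # Number the premortem stages starting from 1.
    for i, (name, instruction) in enumerate(STAGES, start=1):
        lines.append(f"{i}. {name}: {instruction}")
    # The final response always comes last, after every premortem stage.
    lines.append(f"{len(STAGES) + 1}. Respond: Give the final response, respecting the constraints.")
    lines.append("")
    lines.append(f"Request: {user_request}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_premortem_prompt("How do I dispose of old batteries?"))
```

Keeping the stages in a list makes the ordering explicit and easy to vary when experimenting; the resulting string can be passed to whatever model interface is in use.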