arXiv cs.AI by Synapse Flow 編集部

Jailbroken Frontier Models Retain Their Capabilities

概要

arXiv:2605.00267v2 Announce Type: replace-cross Abstract: As language model safeguards become more robust, attackers are pushed toward developing increasingly complex jailbreaks. Prior work has found that this complexity imposes a "jailbreak tax" that degrades the target model's task performance. W…

元記事を読む →

関連記事