Safe, or Simply Incapable? Rethinking Safety Evaluation for Phone-Use Agents
Abstract
arXiv:2605.07630v1 (Announce Type: cross)
When a phone-use agent avoids harm, does that show safety, or simply an inability to act? Existing evaluations often cannot tell. A harmful outcome may be avoided because the agent recognized the risk and chose the safe action, or because it failed to …