Direction-Flipped Influence Audits Reveal Hidden Structure in Moral Choices of LLMs
arXiv:2602.22831v2 Announce Type: replace-cross

Abstract: Moral benchmarks for LLMs typically score models on context-free prompts, implicitly treating the measured choice rate as stable. We test this assumption with a direction-flipped influence audit: for each scenario, we compare a baseline prom…