Epistemic Observability in Language Models
Summary
arXiv:2603.20531v2. Abstract: We find that models report highest confidence precisely when they are fabricating. Across four model families (OLMo-3, Llama-3.1, Qwen3, Mistral), self-reported confidence inversely correlates with accuracy, with AUC ranging from 0.28 to 0.3…
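
To make the AUC figure concrete, here is a minimal sketch of how an AUC below 0.5 arises when confidence is used to predict correctness. It uses the standard pairwise (Mann-Whitney) definition of AUC; the abstract does not say how the authors computed theirs, so the function name `auc` and the toy data are illustrative assumptions, not the paper's method or results.

```python
def auc(confidences, correct):
    """Pairwise AUC: the probability that a randomly chosen correct answer
    carries higher confidence than a randomly chosen incorrect one.
    Ties count as half a win. 0.5 is chance; below 0.5 means confidence
    ranks wrong answers above right ones."""
    pos = [c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example (not data from the paper): the most confident
# answers are the wrong ones.
confidences = [0.95, 0.90, 0.85, 0.60, 0.55, 0.40]
correct = [False, False, True, False, True, True]

score = auc(confidences, correct)
print(f"AUC = {score:.3f}")  # AUC = 0.111
```

In this toy set only one of the nine correct/incorrect pairs is ordered the right way, so the score falls well below the 0.5 chance level. That is the sense in which an AUC in the range the abstract reports indicates an inverse relationship between confidence and accuracy.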