Beyond Confidence: Rethinking Self-Assessments for Performance Prediction in LLMs
Abstract
arXiv:2605.07806v1 Announce Type: cross

Large Language Models (LLMs) are increasingly used in settings where reliable self-assessment is critical. Assessing model reliability has evolved from using probabilistic correctness estimates to, more recently, eliciting verbalized confidence. Con…