arXiv cs.AI by Synapse Flow 編集部

Measuring Evaluation-Context Divergence in Open-Weight LLMs: A Paired-Prompt Protocol with Pilot Evidence of Alignment-Pipeline-Specific Heterogeneity

概要

arXiv:2605.06327v1 Announce Type: cross Abstract: Safety benchmarks are routinely treated as evidence about how a language model will behave once deployed, but this inference is fragile if behavior depends on whether a prompt looks like an evaluation. We define evaluation-context divergence as an o…

元記事を読む →

関連記事