arXiv cs.AI, by the Synapse Flow editorial team

Attributions All the Way Down? The Metagame of Interpretability

Summary

arXiv:2605.06295v1 (announce type: cross). Abstract: We introduce the metagame, a conceptual framework for quantifying second-order interaction effects of model explanations. For any first-order attribution $\phi(f)$ explaining a model $f$, we measure the directional influence of feature $j$ on the at…
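For a concrete picture of what a first-order attribution $\phi(f)$ can look like, here is a minimal sketch that computes exact Shapley values for a small model. Shapley values are an illustrative assumption: the abstract says the framework applies to any first-order attribution, and the definition of the metagame itself is cut off in this summary, so it is not reproduced here. The names `shapley_values` and `toy_model`, and the single-baseline value function, are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(f: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> list[float]:
    """Exact Shapley attributions of f at x relative to one baseline point.

    Hypothetical value function: a coalition S is scored by evaluating f with
    the features in S taken from x and all other features taken from baseline.
    """
    n = len(x)

    def value(coalition: frozenset[int]) -> float:
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        total = 0.0
        # Sum feature i's weighted marginal contribution over all coalitions
        # that exclude i.
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy model with an interaction term between features 0 and 1.
def toy_model(z: Sequence[float]) -> float:
    return 2.0 * z[0] + z[1] + 3.0 * z[0] * z[1] + 0.5 * z[2]


phi = shapley_values(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print(phi)  # approximately [3.5, 2.5, 0.5]
```

A first-order attribution assigns one number per feature. Here the interaction term $3 z_0 z_1$ does not appear as its own entry. Instead it is split evenly between features 0 and 1 (1.5 each), and the three values sum to $f(x) - f(\text{baseline}) = 6.5$.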

