When Stored Evidence Stops Being Usable: Scale-Conditioned Evaluation of Agent Memory
概要
arXiv:2605.07313v1 Announce Type: new Abstract: Memory-agent evaluations report fixed-snapshot accuracy or retrieval quality, but these scores do not show whether evidence remains usable as irrelevant sessions (sessions not annotated as task-relevant evidence for the query) accumulate. We present a…