R-GTD: A Geometric Analysis of Gradient Temporal-Difference Learning in Singular Regimes
Abstract
arXiv:2601.20599v2
Gradient temporal-difference (GTD) learning algorithms are widely used for off-policy policy evaluation with function approximation. However, existing convergence analyses rely on the restrictive assumption that the so-called feature interac…