arXiv cs.AI by Synapse Flow Editorial Team

Beyond the Bellman Fixed Point: Geometry and Fast Policy Identification in Value Iteration

Summary

arXiv:2604.17457v4 (announce type: replace-cross). Q-value iteration (Q-VI) is usually analyzed through the \(\gamma\)-contraction of the Bellman operator. This argument proves convergence to \(Q^*\), but it gives only a coarse account of when the induced greedy policy becomes optimal. We st…
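To make the gap between value convergence and policy identification concrete, here is a minimal sketch of Q-VI on a toy problem. The two-state deterministic MDP, the function names, and the tolerances are all hypothetical choices for illustration; none of them come from the paper, and the sketch does not implement the paper's geometric analysis. It only records the first sweep at which the greedy policy matches the optimal one and the first sweep at which \(Q_k\) is within a tolerance of \(Q^*\) in the sup norm.

```python
def bellman_update(Q, next_state, reward, gamma):
    """One synchronous Q-VI sweep for a deterministic MDP."""
    return [
        [reward[s][a] + gamma * max(Q[next_state[s][a]]) for a in range(len(Q[s]))]
        for s in range(len(Q))
    ]


def greedy(Q):
    """Greedy policy: index of the largest Q-value in each state (first on ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in Q]


def sup_dist(Q1, Q2):
    """Sup-norm distance between two Q-tables."""
    return max(abs(x - y) for r1, r2 in zip(Q1, Q2) for x, y in zip(r1, r2))


def compare_stopping_times(next_state, reward, gamma, tol=1e-6, max_iters=10_000):
    n_states, n_actions = len(reward), len(reward[0])

    # Pass 1: approximate Q* by iterating until successive sweeps barely differ.
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iters):
        Q_next = bellman_update(Q, next_state, reward, gamma)
        converged = sup_dist(Q_next, Q) < 1e-12
        Q = Q_next
        if converged:
            break
    Q_star, pi_star = Q, greedy(Q)

    # Pass 2: restart from zero and record the two stopping times.
    Q = [[0.0] * n_actions for _ in range(n_states)]
    k_policy = k_value = None
    for k in range(1, max_iters + 1):
        Q = bellman_update(Q, next_state, reward, gamma)
        if k_policy is None and greedy(Q) == pi_star:
            k_policy = k  # first match only; in general the policy may change again later
        if k_value is None and sup_dist(Q, Q_star) < tol:
            k_value = k
        if k_policy is not None and k_value is not None:
            break
    return pi_star, k_policy, k_value


# Hypothetical toy MDP: next_state[s][a] and reward[s][a] for 2 states, 2 actions.
next_state = [[0, 1], [0, 1]]
reward = [[0.0, 1.0], [0.0, 2.0]]
pi_star, k_policy, k_value = compare_stopping_times(next_state, reward, gamma=0.9)
print("optimal greedy policy:", pi_star)
print("first sweep with optimal greedy policy:", k_policy)
print("first sweep with sup-norm error below tol:", k_value)
```

On this toy instance the greedy policy is already optimal after the first sweep, while the contraction argument only guarantees the value error shrinks by a factor of \(\gamma\) per sweep, so reaching a small tolerance takes far longer. That is the kind of gap the abstract describes as a "coarse account".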
