POETS: Uncertainty-Aware LLM Optimization via Compute-Efficient Policy Ensembles
Abstract
arXiv:2605.07775v1 (cross-listed)
Balancing exploration and exploitation is a core challenge in sequential decision-making and black-box optimization. We introduce POETS ($\textbf{Po}$licy $\textbf{E}$nsembles for $\textbf{T}$hompson $\textbf{S}$ampling), a novel framework that brid…