Emergent Slow Thinking in LLMs as Inverse Tree Freezing
Abstract
arXiv:2509.23629v3. Reinforcement learning with verifiable rewards (RLVR) enables large language models to acquire slow, multi-step reasoning from sparse final-answer signals. We provide a statistical-physics picture of this emergence. We show that an autoregressive …