Implicit Compression Regularization: Concise Reasoning via Internal Shorter Distributions in RL Post-Training
Abstract
arXiv:2605.07316v1
Reinforcement learning with verifiable rewards improves LLM reasoning but often induces overthinking, where models generate unnecessarily long reasoning traces. Existing methods mainly rely on length penalties or early-exit strategies; however, the fo…