Policy-Guided Stepwise Model Routing for Cost-Effective Reasoning
Abstract
arXiv:2605.06116v1 (announce type: new). Inference-time computation has greatly enhanced the performance of large language models (LLMs) on challenging reasoning tasks, but this strategy can incur high inference costs. One solution is to route intermediate chain-of-thought (CoT) states to la…