Regulating Branch Parallelism in LLM Serving
Summary
arXiv:2605.06914v1 (announce type: cross)

Recent methods expose intra-request parallelism in LLM outputs, allowing independent branches to decode concurrently. Existing serving systems execute these branches eagerly or under fixed caps. We show that both are brittle: eager admission inflate…