How to Compress KV Cache in RL Post-Training? Shadow Mask Distillation for Memory-Efficient Alignment
Abstract
arXiv:2605.06850v1 (cross-listed)

Reinforcement Learning (RL) has emerged as a crucial paradigm for unlocking the advanced reasoning capabilities of Large Language Models (LLMs), encompassing frameworks such as RLHF and RLAIF. Regardless of the specific optimization algorithm (e.g., PP…