Saliency-Aware Regularized Quantization Calibration for Large Language Models
Abstract
arXiv:2605.05693v1. Post-training quantization (PTQ) is an effective approach for deploying large language models (LLMs) under memory and latency constraints. Most existing PTQ methods determine quantization parameters by minimizing a layer-wise reconstruction error on a…
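To make the layer-wise calibration objective concrete, the sketch below shows one common instantiation: choosing a uniform quantization scale for a layer's weights by searching for the scale that minimizes the reconstruction error ||WX − Q(W)X||² on calibration activations X. This is a generic illustration of reconstruction-based PTQ calibration, not the method proposed in the paper; the function names and grid-search procedure are illustrative assumptions.

```python
import numpy as np

def quantize(w, scale, bits=4):
    # Uniform symmetric quantization of weights to `bits` bits,
    # then de-quantization back to floating point.
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def calibrate_scale(W, X, bits=4, n_grid=80):
    # Grid-search the per-layer scale that minimizes the layer-wise
    # reconstruction error ||W X - Q(W) X||_F^2 on calibration data X.
    base = np.abs(W).max() / (2 ** (bits - 1) - 1)  # naive max-abs scale
    candidates = np.concatenate([[base], np.linspace(0.5, 1.2, n_grid) * base])
    best_scale, best_err = base, np.inf
    for s in candidates:
        err = np.linalg.norm(W @ X - quantize(W, s, bits) @ X) ** 2
        if err < best_err:
            best_scale, best_err = s, err
    return best_scale

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 32))   # toy layer weight matrix
X = rng.normal(size=(32, 64))   # toy calibration activations
s = calibrate_scale(W, X)
err_calib = np.linalg.norm(W @ X - quantize(W, s) @ X)
err_naive = np.linalg.norm(W @ X - quantize(W, np.abs(W).max() / 7) @ X)
```

Because the naive max-abs scale is included among the candidates, the calibrated scale can never do worse than the naive choice on the calibration set; saliency-aware methods refine this idea by weighting the error toward the most important weights or activations.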