04-08 SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models — In-Depth Technical Review
04-03 AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration — In-Depth Technical Review