04-16 PipeDream: Turning Pipeline Parallelism into a Practical Training System — Deep Technical Review
04-14 Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts — Deep Technical Review
04-11 Language Agent Tree Search (LATS): Unifying Reasoning, Acting, and Planning in Language Models — Deep Technical Review
04-10 SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression — Deep Technical Review
04-09 DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving — Deep Technical Review
04-08 SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models — In-Depth Technical Review
04-03 AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration — In-Depth Technical Review
03-27 GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection — In-Depth Technical Review
03-25 GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers — In-Depth Technical Review
03-09 Generative Agents: 25 AI Characters Living in a Simulated Town — Believable Human Behavior from LLMs
02-19 vLLM and PagedAttention: Efficient Memory Management for Large Language Model Serving — Technical Review
02-17 Direct Preference Optimization: Your Language Model Is Secretly a Reward Model — Technical Review