Archive | Zhongzhu's Blog

Good! 106 posts in total. Keep on posting.

2026

05-11

MASPO: Joint Prompt Optimization for LLM Multi-Agent Systems

05-11

MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems

05-10

Tutti: Making SSD-Based KV Cache Truly Practical for Long-Context LLM Serving

05-10

Tutti: Making SSD-Backed KV Cache Practical for Long-Context LLM Serving

05-09

Queueing Stability for LLM Inference with KV Cache Memory Constraints

05-08

Swift-SVD: Activation-Aware Low-Rank Compression for LLM Weights and KV Cache

05-07

Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism

05-01

Low-Rank Optimization Trajectories for LLM RLVR Acceleration: A Technical Review of NExt

04-29

FEPLB Technical Review: Nearly Free MoE Load Balancing with the NVLink Copy Engine

04-27

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond — Technical Review

04-26

OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning

04-24

Generalization at the Edge of Stability: A Random Dynamical Systems Perspective

04-24

FEPLB: Zero-Cost MoE Load Balancing via NVLink Copy Engine

04-22

SAGE: Training-Free Semantic Evidence Composition for Edge-Cloud Inference Under Hard Uplink Budgets

04-19

SpecGuard: Verification-Aware Speculative Decoding for Efficient Multi-Step Reasoning

04-19

SpecGuard: Verification-Aware Speculative Decoding for Multi-Step Reasoning

04-17

GRASP Technical Review: Replacing Redundant LLM Layers with Adaptive Singular Parameters

04-16

PipeDream: Turning Pipeline Parallelism into a Practical Training System — Deep Technical Review

04-16

PipeDream: Turning Pipeline Parallelism into a Truly Trainable System (In-Depth Reading Notes)

04-15

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding — Deep Technical Review

04-15

LayerSkip: Making "Early Exit + Self-Verifying Inference" a Deployable Solution for Large Models (In-Depth Reading Notes)

04-14

Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts — Deep Technical Review

04-14

ArmoRM: Interpretable Preference Learning with "Multi-Objective Reward Modeling + Mixture-of-Experts Gating" (In-Depth Reading Notes)

04-13

Toolformer: Language Models Can Teach Themselves to Use Tools — Deep Technical Review

04-13

Toolformer: Letting Language Models Teach Themselves "When to Call Tools" (In-Depth Reading Notes)

04-12

Voyager: An Open-Ended Embodied Agent with Large Language Models — Deep Technical Review

04-12

Voyager: An LLM Embodied Agent That Keeps Growing in Minecraft (In-Depth Reading Notes)

04-11

Language Agent Tree Search (LATS): Unifying Reasoning, Acting, and Planning in Language Models — Deep Technical Review

04-11

LATS (Language Agent Tree Search): Unifying Reasoning, Acting, and Planning in a Single Language Model Agent Framework (In-Depth Reading Notes)

04-10

SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression — Deep Technical Review

04-10

SVD-LLM: A "Truncation-Aware" Singular Value Decomposition Method for Large Language Model Compression (In-Depth Reading Notes)

04-09

DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving — Deep Technical Review

04-09

DistServe: Goodput-Oriented Large Model Serving Optimization via Prefill/Decoding Disaggregation (In-Depth Reading Notes)

04-08

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models — In-Depth Technical Review

04-08

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models (In-Depth Reading Notes)

04-07

ORPO: Monolithic Preference Optimization without Reference Model — In-Depth Technical Review

04-07

ORPO: Monolithic Preference Optimization Without a Reference Model (In-Depth Reading Notes)

04-04

Switch Transformers: Scaling to Trillion-Parameter Sparse Models — In-Depth Technical Review

04-04

Switch Transformers: Scaling to Trillion-Parameter Models with Simple and Efficient Sparsity (In-Depth Reading Notes)

04-03

AWQ: Activation-Aware Weight Quantization for Large Model Compression and Acceleration (In-Depth Reading Notes)

04-03

AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration — In-Depth Technical Review

04-02

GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism — In-Depth Technical Review

04-02

GPipe: Large-Scale Model Training with Micro-Batch Pipeline Parallelism (In-Depth Reading Notes)

04-01

Layer Pruning for Efficient Large Language Models — In-Depth Technical Review

03-31

Constitutional AI: Harmlessness from AI Feedback — In-Depth Technical Review

03-30

Chain-of-Thought Prompting Elicits Reasoning in LLMs — In-Depth Technical Review

03-29

Ring Attention: Blockwise Transformers for Near-Infinite Context — In-Depth Technical Review

03-28

Mamba: Linear-Time Sequence Modeling with Selective State Spaces — In-Depth Technical Review

03-27

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection — In-Depth Technical Review

03-26

Alpa: Automating Inter- and Intra-Operator Parallelism — In-Depth Technical Review

03-25

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers — In-Depth Technical Review

03-24

Proximal Policy Optimization Algorithms — In-Depth Technical Review

03-24

Proximal Policy Optimization Algorithms (PPO): In-Depth Reading Notes

03-23

MiRA: A Subgoal-driven Framework for Improving Long-Horizon LLM Agents — Technical Review

03-22

Attention Is All You Need: The Transformer — In-Depth Technical Review

03-21

BitNet: Scaling 1-bit Transformers for Large Language Models — In-Depth Technical Review

03-19

ZeRO: Shattering the Memory Wall — How DeepSpeed Trains Trillion-Parameter Models

03-16

MetaGPT: When LLM Agents Form a Software Company — Multi-Agent Collaboration Done Right

03-14

FlashAttention: The IO-Aware Algorithm That Made Transformers Actually Fast

03-13

LoRA: Fine-Tuning Giant Models with Pocket Change — The Low-Rank Revolution

03-12

Megatron-LM: NVIDIA's Blueprint for Training Billion-Parameter Models at Scale

03-12

PaRO: Smarter Partitioning for Distributed Training — Beyond ZeRO's One-Size-Fits-All

03-11

Speculative Decoding: Making LLM Inference 2-3× Faster Without Losing a Single Token

03-10

InstructGPT: The RLHF Recipe That Turned GPT-3 Into a Helpful Assistant

03-09

AutoGen: Microsoft's Framework for Building Multi-Agent Conversations That Actually Work

03-09

Generative Agents: 25 AI Characters Living in a Simulated Town — Believable Human Behavior from LLMs

03-06

SWE-agent: Turning LLMs Into Autonomous Software Engineers That Fix Real GitHub Issues

02-23

Self-Refine: Teaching LLMs to Critique and Improve Their Own Output — No Extra Training Needed

02-20

DeepSeekMath: How 120B Tokens of Math Data and GRPO Rival GPT-4 on Competition Problems

02-20

Reflexion: LLM Agents That Learn from Failure Through Verbal Self-Reflection

02-19

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning — Technical Review

02-19

vLLM and PagedAttention: Efficient Memory Management for Large Language Model Serving — Technical Review

02-18

GLM-5 Technical Review: From Vibe Coding to Agentic Engineering

02-18

DeepSeek-V2: Multi-head Latent Attention and DeepSeekMoE — Technical Review

02-17

Direct Preference Optimization: Your Language Model Is Secretly a Reward Model — Technical Review

02-16

Tree of Thoughts: Deliberate Problem Solving with Large Language Models — Technical Review

02-09

ReAct Technical Review: From Reasoning Ability to Executable Reasoning

2023

04-29

ComputerArchitecture-Day1

2022

02-03

Reinforcement Learning-Principle-Day12

01-28

极路由 S1 (HiWiFi S1): A Step-by-Step Guide Without an Official Unlock Path, and a Painful Firmware-Flashing Journey

2021

11-29

Modern Operating Systems: Principles and Implementation-Chen Haibo-Day 1

11-22

Intel mac to M1 chip mac

11-14

Reinforcement Learning-Principle-Day11

11-07

Reinforcement Learning-Principle-Day10

11-04

MetaLearning-Stanford-Lecture5

10-31

Reinforcement Learning-Principle-Day9

10-20

Reinforcement Learning-Principle-Day8

10-13

Reinforcement Learning-Principle-Day7

09-29

Operating System Memory Address

09-29

Reinforcement Learning-Principle-Day6

07-21

Reinforcement Learning-Principle-Day5

04-14

MetaLearning-Stanford-Lecture4

03-05

Reinforcement Learning Principle Day4

2020

12-02

Reinforcement Learning-Principle-Day3

11-25

Confusion Caused by the HHKB's BS and Delete Keys

11-12

MetaLearning-Stanford-Lecture3

11-04

MetaLearning-Stanford-Lecture2

10-30

Reinforcement Learning-Principle-Day2

09-25

09-09

09-05

09-05

09-04

08-23

Reinforcement Learning-Principle-Day1

2019

11-26

TensorFlow-Day1-DNN Explained

11-24

Reinforcement Learning_WatermelonBook_Summary