MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems
Tutti: Making SSD-Backed KV Cache Practical for Long-Context LLM Serving
Queueing Stability for LLM Inference with KV Cache Memory Constraints
Swift-SVD: Activation-Aware Low-Rank Compression for LLM Weights and KV Cache
Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism
Low-Rank Optimization Trajectories for LLM RLVR Acceleration: A Technical Review of NExt
FEPLB Technical Review: Nearly Free MoE Load Balancing with the NVLink Copy Engine
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond — Technical Review
Author: Zhongzhu Zhou
Paper: Chu et al., 2026. arXiv:2604.22748 [cs.AI]
Date: April 27, 2026
Direction: Monday, April 27 — Agent/LLM Quality Generation
Pages: 10
Executive Summary
As AI systems evolve from text generators to goal-achieving agents that interact with complex environments, predicting environment dynamics has become the central bottleneck. This comprehensive survey paper provides a unified framework for understanding world models—internal representations that agents use to anticipate consequences of their actions and plan accordingly.
The paper introduces an elegant "levels × laws" taxonomy:
- Three capability levels (L1 Predictor → L2 Simulator → L3 Evolver) define what a world model can do
- Four governing-law regimes (physical, digital, social, scientific) define the constraints it must satisfy
By synthesizing over 400 papers across model-based RL, video generation, web/GUI agents, multi-agent simulation, and AI-driven science, the authors reveal a fragmented landscape where "world model" means different things to different communities. Their framework provides the common language needed to align these communities.
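The levels × laws taxonomy above can be sketched as a small data structure. This is an illustrative sketch only: the one-line glosses on each level are my reading of the level names, not the paper's definitions, and `WorldModelProfile` is a hypothetical name.

```python
from dataclasses import dataclass
from enum import Enum

class Level(Enum):
    L1_PREDICTOR = 1  # gloss (assumption): predicts next state given state + action
    L2_SIMULATOR = 2  # gloss (assumption): rolls out counterfactual trajectories
    L3_EVOLVER = 3    # gloss (assumption): revises its own dynamics over time

class LawRegime(Enum):
    PHYSICAL = "physical"
    DIGITAL = "digital"
    SOCIAL = "social"
    SCIENTIFIC = "scientific"

@dataclass(frozen=True)
class WorldModelProfile:
    """A cell in the levels x laws grid: what the model can do,
    and which regime's constraints it must satisfy."""
    level: Level
    regime: LawRegime

# e.g. a web/GUI agent's world model: simulates page transitions
# under digital-environment rules
gui_agent = WorldModelProfile(Level.L2_SIMULATOR, LawRegime.DIGITAL)
```

Framing the survey's grid this way makes the cross-community mapping concrete: each surveyed system occupies one (level, regime) cell.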
OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning
1. Executive Summary
OGER (Offline-Guided Exploration Reward) introduces a framework for enhancing Large Language Model (LLM) reasoning by integrating offline teacher trajectories with online reinforcement learning. The key innovation is to treat offline data as a semantic reference point for computing auxiliary exploration rewards, rather than as additional training samples.
The framework addresses two limitations of current RLVR (Reinforcement Learning with Verifiable Rewards) approaches: the "echo chamber" effect, where models collapse onto dominant pre-existing distributions, and entropy collapse, which prevents the discovery of novel solutions. By computing divergence-based exploration rewards and refining them through entropy-aware modulation, OGER achieves improvements of 4-7.9% across mathematical and general reasoning benchmarks.
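The two ingredients named above (a divergence-based exploration reward against an offline reference, damped by an entropy-aware gate) can be sketched as follows. This is a minimal plausible form, not OGER's actual formula; the function name, the choice of KL divergence, and the linear entropy gate are all assumptions for illustration.

```python
import math

def exploration_reward(policy_probs, teacher_probs, entropy_target=1.0):
    """Sketch (assumed form, not the paper's): reward divergence from the
    offline teacher reference, modulated by the policy's own entropy."""
    # KL(policy || teacher): how far the online policy strays from the
    # offline teacher distribution over the same support
    kl = sum(p * math.log(p / t) for p, t in zip(policy_probs, teacher_probs))
    # policy entropy; the gate damps the bonus as entropy collapses,
    # so a collapsed policy cannot farm the exploration reward
    entropy = -sum(p * math.log(p) for p in policy_probs)
    gate = min(1.0, entropy / entropy_target)
    return gate * kl

r = exploration_reward([0.7, 0.2, 0.1], [0.5, 0.3, 0.2])  # positive bonus
```

A policy identical to the teacher earns zero bonus under this sketch, which matches the summary's intent: the reward pays only for genuinely novel exploration away from the dominant distribution.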
Generalization at the Edge of Stability: A Random Dynamical Systems Perspective
1. What This Paper Does
Core Problem
The edge of stability (EoS) phenomenon, discovered by Cohen et al. (2021), presents a theoretical puzzle: when training with a sufficiently large learning rate η, the largest Hessian eigenvalue λ₁ frequently exceeds the stability threshold 2/η, which classical optimization theory says should cause divergence. Yet empirically:
- Training loss continues to decrease
- Model generalization often improves in this regime
- The optimizer doesn't settle at a point but explores a bounded, chaotic set
Prior explanations relying on pointwise properties (Hessian trace, spectral norm) fail to capture this phenomenon because they ignore the ensemble behavior of the attractor set.
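The classical 2/η threshold the puzzle turns on is easy to verify on a quadratic, where it holds exactly. A minimal sketch (gradient descent on 0.5·λ·x², so the Hessian eigenvalue is λ): below the threshold the iterate contracts, above it the iterate blows up. The EoS observation is that neural training violates this threshold yet stays bounded.

```python
def gd(grad, x0, eta, steps=100):
    # plain gradient descent: x <- x - eta * grad(x)
    x = x0
    for _ in range(steps):
        x -= eta * grad(x)
    return x

eta = 0.1  # classical stability requires lambda_1 < 2/eta = 20
below = gd(lambda x: 15.0 * x, x0=1.0, eta=eta)  # lambda = 15 < 20
above = gd(lambda x: 25.0 * x, x0=1.0, eta=eta)  # lambda = 25 > 20
# below: |x| shrinks by 0.5 per step -> converges toward 0
# above: |x| grows by 1.5 per step -> diverges, as classical theory predicts
```

On a quadratic the map is x ↦ (1 − ηλ)x, so |1 − ηλ| < 1 is exactly λ < 2/η; the nonlinearity of neural losses is what lets real training sit past this point on a bounded chaotic set instead.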
Main Contribution
The paper's central insight: characterize generalization through the geometric properties of the random attractor itself, not individual solutions.
They prove that:
- Sharpness Dimension (SD) < ambient dimension d with high probability at EoS
- Worst-case generalization error depends on SD, not parameter count d
- The complete Hessian spectrum structure matters, not just the trace or largest eigenvalue
- The attractor forms a fractal set whose intrinsic dimension is strictly smaller than the ambient parameter dimension d
This explains why overparameterized models generalize: the training dynamics naturally compress into a lower-dimensional manifold despite the high-dimensional parameter space.
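The claim that trajectories occupy a set of dimension strictly below the ambient d can be probed empirically. Below is a generic box-counting dimension estimator, which is an assumption for illustration and not the paper's Sharpness Dimension; it recovers dimension ≈ 1 for a one-dimensional curve embedded in a two-dimensional ambient space.

```python
import math

def box_counting_dimension(points, eps_values):
    """Estimate the box-counting dimension of a point set: the slope of
    log N(eps) against log(1/eps), where N(eps) counts the eps-sized
    grid cells the set touches."""
    xs, ys = [], []
    for eps in eps_values:
        cells = {tuple(int(c // eps) for c in p) for p in points}
        xs.append(math.log(1.0 / eps))
        ys.append(math.log(len(cells)))
    # least-squares slope of (xs, ys)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# a 1-D curve embedded in 2-D ambient space: intrinsic dimension ~1 < d = 2
curve = [(t / 10000, (t / 10000) ** 2) for t in range(10000)]
dim = box_counting_dimension(curve, [0.1, 0.05, 0.02, 0.01])
```

Applied to iterates collected late in training, an estimate well below the parameter count would be consistent with the paper's picture of dynamics compressing onto a low-dimensional attractor.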