02-19 vLLM and PagedAttention: Efficient Memory Management for Large Language Model Serving — Technical Review