1. Why This Paper Matters
If you only remember one sentence from this review, I want it to be this:
SmoothQuant is important because it turns a seemingly annoying numerical issue—activation outliers—into a clean systems trick that real hardware can actually use.
Large language models are expensive to run for two reasons:
- they store a huge amount of weights, and
- they repeatedly move those weights and activations through matrix multiplications.
That means memory footprint, memory bandwidth, and integer-kernel friendliness are not side details. They are central engineering constraints.