DCT and Quantization

Instead of encoding each pixel of the image the way a bitmap file would, some codecs transform the pixels into the frequency domain using the Discrete Cosine Transform (DCT), which produces coefficients. They then quantize those coefficients and encode the quantized values.

On the decoder side, those quantized coefficients are dequantized and then transformed from the frequency domain back into pixels using the IDCT (the Inverse DCT).

DCT

Note: You don’t really need to understand what the DCT is. Just think of it like some kind of magical box that takes pixels as input and creates coefficients as output. But if you want at least a small introduction, keep reading this section.

Mathematical!

Remember that pixels are just numbers? Well, that’s pretty convenient, because the DCT works with numbers. This is the result of running Mario’s face through the magical box of DCT:

The numbers at the end are the DCT coefficients. The DCT coefficients are also not some magical entity that defies human comprehension. DCT coefficients are just… numbers.

It might be a little hard to read the coefficients table here. The table is colored using a false-color spectrum, where positive numbers are red and negative numbers are blue (no need to understand this).

If we were to run those coefficients through the IDCT (the Inverse DCT), we would get the same pixel values back, as you can see here:
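If you'd like to peek inside the magical box, here is a minimal sketch of a 2-D DCT and its inverse in plain Python. It assumes an 8x8 block and the orthonormal DCT-II scaling (the scaling is an assumption here, chosen because it gives a DC coefficient equal to 8 times the average pixel value). The pixel values are a made-up test pattern, not Mario's face, and the function names are only illustrative.

```python
import math

N = 8  # block size assumed for this sketch

def scale(k):
    # Orthonormal DCT-II scaling factor for frequency index k.
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def cos_term(n, k):
    # Cosine basis value for sample position n and frequency index k.
    return math.cos((2 * n + 1) * k * math.pi / (2 * N))

def dct2(block):
    """Forward 2-D DCT: pixels in, coefficients out."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += block[x][y] * cos_term(x, u) * cos_term(y, v)
            out[u][v] = scale(u) * scale(v) * total
    return out

def idct2(coeffs):
    """Inverse 2-D DCT: coefficients in, pixels out."""
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (scale(u) * scale(v) * coeffs[u][v]
                              * cos_term(x, u) * cos_term(y, v))
            out[x][y] = total
    return out

# Made-up 8-bit test pattern (hypothetical data for illustration only).
pixels = [[(x * 31 + y * 17) % 256 for y in range(N)] for x in range(N)]

coeffs = dct2(pixels)
# Rounding removes the tiny floating-point error left by the round trip.
restored = [[round(p) for p in row] for row in idct2(coeffs)]
print("round trip matches:", restored == pixels)
```

Real codecs use much faster factorized or integer versions of this transform. The quadruple loop is only here because it follows the textbook definition directly.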

Now, let’s try to understand a little bit of what each coefficient means. The first coefficient, in the top-left corner, is called the DC coefficient. For an 8x8 block of 8-bit pixels, its value ranges from 0 to 2040. All other coefficients are referred to as AC coefficients. Their values range from -1020 to 1020.

Let’s try leaving only the DC coefficient, and passing that through the IDCT:

Now let’s try leaving all AC coefficients, but removing the DC coefficient, and passing that through the IDCT:

Notice that when we left only the DC coefficient, we got the same value for all pixels in the output. And when we left all AC coefficients, but removed the DC coefficient, we could still see Mario’s face in the output, but the entire image was darker than the original: each pixel was lower by exactly the value we got from the DC coefficient alone.

For example: in the first pixel, the original value was orig0, whereas it was dc0 for the DC coefficient, and ac0 (ac0 = orig0 - dc0) for the AC coefficients.

The reason for this is that the DC coefficient represents the average value for all pixels. It does not give texture to the block, but it does give the background color. The AC coefficients, on the other hand, will each add or subtract different values from a different set of pixels, adding finer and finer details into the block (the texture).
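The DC/AC split is easy to check numerically. The sketch below assumes an 8x8 block and the orthonormal DCT, written here as matrix products to keep it short. The pixel values are made up for illustration, and all names are hypothetical.

```python
import math

N = 8  # block size assumed for this sketch

# Orthonormal DCT-II basis matrix: row k is the k-th cosine basis vector.
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos((2 * n + 1) * k * math.pi / (2 * N))
      for n in range(N)] for k in range(N)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct2(block):
    # coefficients = C * block * C^T
    return matmul(matmul(C, block), transpose(C))

def idct2(coeffs):
    # pixels = C^T * coefficients * C
    return matmul(matmul(transpose(C), coeffs), C)

# Made-up 8-bit test pattern (hypothetical data for illustration only).
pixels = [[(x * 31 + y * 17) % 256 for y in range(N)] for x in range(N)]
coeffs = dct2(pixels)

# Keep only the DC coefficient, then only the AC coefficients.
dc_only = [[coeffs[u][v] if (u, v) == (0, 0) else 0.0 for v in range(N)]
           for u in range(N)]
ac_only = [[0.0 if (u, v) == (0, 0) else coeffs[u][v] for v in range(N)]
           for u in range(N)]

dc_img = idct2(dc_only)  # flat block: every pixel equals the block average
ac_img = idct2(ac_only)  # the texture: original minus the block average

mean = sum(map(sum, pixels)) / (N * N)
print("block average:", mean)
print("first pixel: orig", pixels[0][0], "dc", dc_img[0][0], "ac", ac_img[0][0])
```

With this scaling the DC coefficient itself is 8 times the block average, and the DC-only image plus the AC-only image adds back up to the original block.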

Quantization

If we were to encode all the coefficients exactly as they were output from the DCT, there wouldn’t really be much benefit at all from all this mathemagic. The final output file would be almost as big as the original uncompressed file (or just as big, or even bigger), and the codec would have done a pretty poor job.

So why do codecs go through all this trouble of transforming to the frequency domain?

Well, it turns out you can do even more magic in the frequency domain: you can tweak the coefficient values quite a bit and still end up with a relatively good reconstruction of the block after running them through the IDCT.

And the way we do this is with quantization. Each coefficient is divided by a certain number, and is then rounded to the nearest integer.

For example, let’s divide all our coefficients by q, and we’ll get:

Wow! Those values are much smaller. There are also a bunch of zeros in there. Now let’s pretend we’re the decoder, and dequantize those values by multiplying them all by q again.

They are all pretty close to the original coefficients. Now let’s run that through the IDCT and look at the reconstructed block.

That still looks a lot like Mario’s face, doesn’t it? Let’s have a closer look at the difference between the original and the reconstructed image, side by side:

It’s hard to notice the difference, right? Let’s have a closer look at the difference in values:

(so small)
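Here is a minimal sketch of the quantize and dequantize steps with a single quantizer value. The coefficient values are made up for illustration (they are not Mario's), q = 16 is just an example value, and the function names are hypothetical.

```python
def quantize(coeffs, q):
    # Encoder side: divide by q and round to the nearest integer.
    # Python's round() sends exact .5 ties to the nearest even integer.
    return [[round(c / q) for c in row] for row in coeffs]

def dequantize(levels, q):
    # Decoder side: multiply back by q.
    return [[level * q for level in row] for row in levels]

# Hypothetical coefficients: big low-frequency values, small high-frequency ones.
coeffs = [[1032.5, -210.3, 45.0, 7.9],
          [-98.6, 30.2, -5.1, 2.4]]
q = 16  # example quantizer value

levels = quantize(coeffs, q)
restored = dequantize(levels, q)

# The error per coefficient can never exceed q / 2.
max_error = max(abs(c - r)
                for crow, rrow in zip(coeffs, restored)
                for c, r in zip(crow, rrow))

print("quantized:  ", levels)
print("dequantized:", restored)
print("largest error:", max_error)
```

The small quantized values (and the zeros) are what make the data cheap to encode. The price is an error of at most q / 2 per coefficient, which is why a bigger quantizer means a smaller file and a worse picture.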

The examples in this section use a quantizer value of 16.

Quantization tables

In reality, instead of using a single quantizer value for all coefficients like we did in the section above, codecs use a quantization table.

Since our eyes are better at perceiving low-frequency changes than high-frequency changes, the quantizer values nearest to the DC coefficient will be smaller, while the quantizer values nearest to the last AC coefficient will be larger (roughly speaking).

The MPEG-2 standard specifies default quantization tables for intra blocks and non-intra blocks (with no distinction between luma and chroma):

The MPEG-4 standard also specifies default quantization tables for intra blocks and non-intra blocks (also with no distinction between luma and chroma):

The JPEG standard gives examples of luma and chroma quantization tables that are “based on psychovisual thresholding and are derived empirically using luminance and chrominance and 2:1 horizontal subsampling” (that’s some fancy wording there…):

Now let’s take Mario’s face’s coefficients, and quantize them using the example luma JPEG quantization table from above:

Then we dequantize those quantized coefficients:

And finally, we pass the reconstructed coefficients through the IDCT, and we get:

(not bad, right? also, not very good either…)
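The same steps can be sketched in code, with one quantizer per coefficient position instead of a single value. The table below is the example luminance table from Annex K of the JPEG standard. The coefficients are made up (their magnitude simply shrinks toward the high frequencies), and the function names are hypothetical.

```python
# Example luminance quantization table from the JPEG standard (Annex K).
JPEG_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

# Hypothetical coefficients for illustration only: large near the DC
# position (top-left), small near the last AC position (bottom-right).
coeffs = [[(-1) ** (u + v) * 900.0 / (1 + u + v) ** 2 for v in range(8)]
          for u in range(8)]

def quantize(coeffs, table):
    # Each coefficient is divided by the table entry at the same position.
    return [[round(c / t) for c, t in zip(crow, trow)]
            for crow, trow in zip(coeffs, table)]

def dequantize(levels, table):
    # Each level is multiplied by the table entry at the same position.
    return [[level * t for level, t in zip(lrow, trow)]
            for lrow, trow in zip(levels, table)]

levels = quantize(coeffs, JPEG_LUMA)
restored = dequantize(levels, JPEG_LUMA)

zeros = sum(level == 0 for row in levels for level in row)
print(zeros, "of 64 quantized coefficients are zero")
for row in levels:
    print(row)
```

Because the table entries grow toward the bottom-right corner, the high-frequency coefficients are the first ones to collapse to zero, while the low-frequency ones keep more of their precision.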

Those quantization tables in the standards are mere suggestions. The encoder is free to create the quantization table it wants, allowing for greater or lower quality. In fact, for most encoders, when you change the quality settings, the encoder is just changing the quantization table behind the scenes.
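As one concrete illustration of a quality setting turning into a quantization table, the sketch below is modeled on the scaling scheme used by the IJG libjpeg library, as far as I understand it: the quality value (1 to 100) becomes a percentage, and every entry of a base table is scaled by that percentage and clamped. The function names are mine, not the library's, and other encoders are free to do something entirely different.

```python
def quality_to_scale(quality):
    # Map a 1..100 quality setting to a percentage scale factor.
    # Quality 50 leaves the base table unchanged (scale = 100).
    quality = max(1, min(100, quality))
    if quality < 50:
        return 5000 // quality
    return 200 - 2 * quality

def scale_table(base, quality, max_value=255):
    # Scale each base entry, round, and clamp to the range 1..max_value.
    scale = quality_to_scale(quality)
    return [max(1, min(max_value, (b * scale + 50) // 100)) for b in base]

# First row of the example luma table from the JPEG standard.
base_row = [16, 11, 10, 16, 24, 40, 51, 61]

for quality in (25, 50, 75, 100):
    print(quality, scale_table(base_row, quality))
```

Lower quality means larger quantizers, so more coefficients round to zero. At quality 100 every entry is clamped to 1, so quantization does little beyond rounding the coefficients to integers.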

There are 8 x 8 = 64 values in total in a quantization table, and they can each go from 1 to 255. That’s a whopping 255⁶⁴ possibilities, or about 1.044×10¹⁵⁴.

Google even went as far as creating a whole new JPEG encoder (called Guetzli) that basically tweaks the quantization table up to a point where your puny little eyes can barely notice that anything changed at all, while at the same time achieving very high levels of compression.