DCT and Quantization
Instead of encoding each pixel of the image like a
bitmap file would,
some codecs transform those pixels into the frequency domain using
the
Discrete Cosine Transform (DCT),
which outputs coefficients. They then
quantize
those coefficients and encode the quantized values.
On the decoder side, those quantized coefficients are
dequantized, and then transformed back from the
frequency domain into pixels using the IDCT (the Inverse DCT).
DCT
Note: You don’t really need to understand what the DCT is.
Just think of it like some kind of magical box that takes
pixels as input and creates coefficients as output.
But if you want at least a small introduction, keep reading this
section.
Remember that
pixels are just numbers?
Well, that’s pretty convenient, because the DCT works with numbers.
This is the result of running Mario’s face through the magical box
of DCT:
The numbers at the end are the DCT coefficients. The DCT coefficients are also not some magical entity that defies human comprehension. DCT coefficients are just… numbers.
It might be a little complicated to visualize the coefficients table
here. The table is colored using a false-color spectrum, where positive
numbers are red and negative numbers are blue
(no need to understand this).
If we were to run those coefficients through the IDCT (the
Inverse DCT), we would get the same pixel values back, as you can see
here:
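If you want to check the round trip yourself, here is a minimal sketch in plain Python. It assumes the orthonormal 8x8 DCT-II (the scaling that gives the coefficient ranges quoted in this article), and the `pixels` block is a made-up pattern, not Mario's actual face. The names `dct2` and `idct2` are just illustrative.

```python
import math

N = 8  # block size (8x8), as used throughout this article

def _alpha(k):
    # Orthonormal scaling factor of the DCT-II
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

# BASIS[k][n] = alpha(k) * cos((2n + 1) * k * pi / (2N))
BASIS = [[_alpha(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N))
          for n in range(N)] for k in range(N)]

def dct2(block):
    """Forward 2D DCT: pixels in, coefficients out."""
    return [[sum(BASIS[u][y] * BASIS[v][x] * block[y][x]
                 for y in range(N) for x in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coeffs):
    """Inverse 2D DCT: coefficients in, pixels out."""
    return [[sum(BASIS[u][y] * BASIS[v][x] * coeffs[u][v]
                 for u in range(N) for v in range(N))
             for x in range(N)] for y in range(N)]

# Hypothetical 8x8 block of 8-bit pixel values (an assumption for the demo).
pixels = [[(y * 31 + x * 17) % 256 for x in range(N)] for y in range(N)]

coeffs = dct2(pixels)
# Rounding removes the tiny floating-point error of the round trip.
restored = [[round(p) for p in row] for row in idct2(coeffs)]

print(restored == pixels)  # True
```

The direct quadruple loop is slow but easy to read; real codecs use fast factorized transforms instead.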
Now, let’s try to understand a little bit of what each coefficient
means.
The first coefficient on the top left corner is called the DC
coefficient. For an 8x8 block of 8-bit pixels, its value ranges from 0 to 2040.
All other coefficients are referred to as AC coefficients.
Their values range from -1020 to 1020.
Let’s try leaving only the DC coefficient, and passing that
through the IDCT:
Now let’s try leaving all AC coefficients, but removing the DC
coefficient, and passing that through the IDCT:
Notice that when we left only the DC coefficient, every pixel
in the output had the same value.
And when we kept all the AC coefficients but removed the DC
coefficient, we could still see Mario’s face in the output, but the
entire image was darker than the original: every pixel dropped by the
same amount, namely the flat value we got from the DC-only block.
For example, if the first pixel is orig0 in the original and dc0 in
the DC-only block, then it is ac0 = orig0 - dc0 in the
AC-only block.
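The reason for this is that the DC coefficient is proportional to the
average value of all pixels in the block (with the scaling used here it is
8 times the average, which is why it tops out at 2040 = 8 × 255). It does
not give texture to the block, but it does give the background color.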
The AC coefficients, on the other hand, will each add or subtract
different values from a different set of pixels, adding finer and finer
details into the block (the texture).
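The DC/AC split can be checked numerically. The sketch below again assumes the orthonormal 8x8 DCT-II and a made-up `pixels` block (not Mario's face). It rebuilds the block once from the DC coefficient alone and once from the AC coefficients alone, then compares both against the block's mean. Note that the AC-only values can be negative here, since nothing clamps them to the displayable range.

```python
import math

N = 8
BASIS = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
          * math.cos((2 * n + 1) * k * math.pi / (2 * N))
          for n in range(N)] for k in range(N)]

def dct2(block):
    return [[sum(BASIS[u][y] * BASIS[v][x] * block[y][x]
                 for y in range(N) for x in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coeffs):
    return [[sum(BASIS[u][y] * BASIS[v][x] * coeffs[u][v]
                 for u in range(N) for v in range(N))
             for x in range(N)] for y in range(N)]

# Hypothetical 8x8 block of 8-bit pixel values (an assumption for the demo).
pixels = [[(y * 31 + x * 17) % 256 for x in range(N)] for y in range(N)]
coeffs = dct2(pixels)
mean = sum(map(sum, pixels)) / (N * N)

# Keep only the DC coefficient, or only the AC coefficients.
dc_only = [[coeffs[u][v] if (u, v) == (0, 0) else 0.0
            for v in range(N)] for u in range(N)]
ac_only = [[0.0 if (u, v) == (0, 0) else coeffs[u][v]
            for v in range(N)] for u in range(N)]

dc_block = idct2(dc_only)
ac_block = idct2(ac_only)

# DC alone gives a flat block whose value is the mean of the pixels.
flat = all(abs(p - mean) < 1e-9 for row in dc_block for p in row)
# AC alone gives the original pixels minus that same mean.
shifted = all(abs(ac_block[y][x] - (pixels[y][x] - mean)) < 1e-9
              for y in range(N) for x in range(N))

print(flat, shifted)  # True True
```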
Quantization
If we were to encode all the coefficients exactly as they were
output from the DCT, there wouldn’t really be much benefit at all
from all this mathemagic.
The final output file would be almost as big as the original
uncompressed file (or just as big, or even bigger), and the codec would have
done a pretty poor job.
So why do codecs go through all this trouble of transforming to the frequency domain?
Well, it turns out you can do even more magic in the frequency
domain by tweaking the coefficient values quite a bit and still ending
up with a relatively good reconstruction of the block
after running them through the IDCT.
And the way we do this is with quantization. Each coefficient is divided by a certain number, and is then rounded to the nearest integer.
For example, let’s divide all our coefficients by q, and we’ll
get:
Wow! Those values are much smaller. There are also a bunch of zeros
in there. Now let’s pretend we’re the decoder, and dequantize
those values by multiplying them all by q again.
They are all pretty close to the original coefficients.
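Here is a small sketch of that quantize/dequantize round trip. The quantizer value `q = 16` and the coefficient values are assumptions picked for illustration, not the actual numbers from Mario's block. One caveat: Python's `round` rounds exact halves to the nearest even integer, which may differ from the rounding rule a given codec uses.

```python
def quantize(coeffs, q):
    """Divide each coefficient by q and round to the nearest integer."""
    return [round(c / q) for c in coeffs]

def dequantize(levels, q):
    """Multiply each quantized level by q to approximate the original."""
    return [level * q for level in levels]

q = 16  # assumed quantizer value
coeffs = [1020.0, -183.5, 42.25, -7.9, 3.1, 0.0]  # made-up coefficients

levels = quantize(coeffs, q)
restored = dequantize(levels, q)

print(levels)    # [64, -11, 3, 0, 0, 0]
print(restored)  # [1024, -176, 48, 0, 0, 0]

# Each reconstructed coefficient is at most q / 2 away from the original.
print(max(abs(c - r) for c, r in zip(coeffs, restored)) <= q / 2)  # True
```

The small coefficients collapsing to zero is where most of the savings come from: long runs of zeros are cheap to encode.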
Now let’s run that through the IDCT and look at the reconstructed
block.
That still looks a lot like Mario’s face, doesn’t it? Let’s put the original and the reconstructed image side by side:
It’s hard to notice the difference, right? Let’s have a closer look at the difference in values:
(so small)
Quantization tables
In reality, instead of using a single quantizer value for all
coefficients like we did in the section above, codecs use a
quantization table.
Since our eyes are better at perceiving low frequency changes than
high frequency changes, the quantizer values nearest to the DC
coefficient will be smaller, while the quantizer values nearest to the
last AC coefficient will be greater (roughly speaking).
The MPEG-2 standard specifies default quantization tables for intra
blocks and non-intra blocks (with no distinction between luma and
chroma):
The MPEG-4 standard also specifies default quantization tables for
intra blocks and non-intra blocks (also with no distinction between
luma and chroma):
The JPEG standard gives examples of luma and chroma quantization
tables that are “based on psychovisual thresholding and are derived
empirically using luminance and chrominance and 2:1 horizontal
subsampling” (that’s some fancy wording there…):
Now let’s take the coefficients of Mario’s face, and quantize them
using the example luma JPEG quantization table from above:
Then we dequantize those quantized coefficients:
And finally, we pass the reconstructed coefficients through the
IDCT, and we get:
(not bad, right? also, not very good either…)
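To see what a quantization table does in isolation, here is a hedged sketch using the example luminance table from Annex K of the JPEG specification. The coefficient block is made up: every entry is set to 25, so any difference in the output comes purely from the table. `JPEG_LUMA_Q`, `quantize` and `dequantize` are illustrative names.

```python
# Example luminance quantization table from Annex K of the JPEG specification.
JPEG_LUMA_Q = [
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
]

def quantize(coeffs, table):
    """Divide each coefficient by its own table entry, then round."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]

def dequantize(levels, table):
    """Multiply each level by its own table entry."""
    return [[lv * q for lv, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, table)]

# Hypothetical coefficients: the same value everywhere (an assumption).
coeffs = [[25.0] * 8 for _ in range(8)]

levels = quantize(coeffs, JPEG_LUMA_Q)
restored = dequantize(levels, JPEG_LUMA_Q)

# Low frequencies (top left) survive, high frequencies (bottom right) vanish.
for row in levels:
    print(row)
nonzero = sum(1 for row in levels for lv in row if lv != 0)
print(nonzero, "of 64 coefficients survive")
```

With a single quantizer value, all 64 of these identical coefficients would have been treated the same. With the table, the ones near the bottom right are the first to be rounded away.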
Those quantization tables in the standards are mere suggestions. The encoder is free to create the quantization table it wants, allowing for higher or lower quality. In fact, for most encoders, when you change the quality settings, the encoder is just changing the quantization table behind the scenes.
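As one illustration of how a quality setting can map to a table, the sketch below follows the scaling convention used by libjpeg as I understand it: quality 50 keeps the base table unchanged, lower qualities scale it up, and higher qualities scale it down. Other encoders may do this differently, and the function name `scale_table` is made up for this example.

```python
# Example luminance quantization table from Annex K of the JPEG specification.
BASE_TABLE = [
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
]

def scale_table(base, quality):
    """Scale a base table for a quality setting in 1..100 (libjpeg-style)."""
    quality = max(1, min(100, quality))
    # Percentage scale factor: 100 at quality 50, larger below, smaller above.
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Rounded integer scaling, clamped to the 1..255 range of 8-bit tables.
    return [[min(255, max(1, (q * scale + 50) // 100)) for q in row]
            for row in base]

print(scale_table(BASE_TABLE, 50) == BASE_TABLE)  # True
print(scale_table(BASE_TABLE, 10)[0])  # first row, much coarser
print(scale_table(BASE_TABLE, 90)[0])  # first row, much finer
```

At quality 100 every entry clamps to 1, so quantization only rounds the coefficients to integers.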
There are 8 x 8 = 64 values in total in a quantization table,
and they can each go from 1 to 255.
That’s a whopping 255^64 possibilities, or
roughly 1.04 × 10^154.
Google even went as far as creating a whole
new JPEG encoder (called Guetzli)
that basically tweaks the quantization table up to a point where
your puny little eyes can barely notice that anything changed at all,
while at the same time achieving very high levels of compression.