DCT and Quantization
Instead of encoding each pixel of the image like a bitmap file would, some codecs transform those pixels into the frequency domain using the Discrete Cosine Transform (DCT), which produces coefficients. They then quantize those coefficients and encode the quantized coefficients.
On the decoder side, those quantized coefficients are dequantized and then transformed back from the frequency domain into pixels using the IDCT (the Inverse DCT).
DCT
Note: You don’t really need to understand what the DCT
is.
Just think of it like some kind of magical box that takes
pixels as input and creates coefficients as output.
But if you want at least a small introduction, keep reading this
section.
Remember that
pixels are just numbers?
Well, that’s pretty convenient, because the DCT
works with numbers.
This is the result of running Mario’s face through the magical box of DCT:
The numbers at the end are the DCT coefficients. The DCT coefficients are also not some magical entity that defies human comprehension. DCT coefficients are just… numbers.
The coefficients table might be a little hard to visualize. It is colored using a false-color spectrum, where positive numbers are red and negative numbers are blue (no need to understand this).
If we were to run those coefficients through the IDCT (the Inverse DCT), we would get the same pixel values back, as you can see here:
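The round trip described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the block is 8 x 8, the DCT uses orthonormal scaling, and `pixels` is a made-up block standing in for Mario's face. The helper names (`scale`, `matmul`, `dct2`, `idct2`) are hypothetical and not taken from any codec.

```python
import math

N = 8  # assumed block size (8 x 8)

def scale(k):
    # Orthonormal scaling: sqrt(1/N) for the first basis vector, sqrt(2/N) otherwise
    return math.sqrt((1.0 if k == 0 else 2.0) / N)

# C[k][n] is the k-th cosine basis vector sampled at position n
C = [[scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
     for k in range(N)]

def matmul(a, b):
    # Plain N x N matrix product
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct2(block):
    # Forward 2-D DCT: C * block * C^T
    return matmul(matmul(C, block), transpose(C))

def idct2(coeffs):
    # Inverse 2-D DCT: C^T * coeffs * C
    return matmul(matmul(transpose(C), coeffs), C)

# Hypothetical 8 x 8 block of 8-bit pixel values (not the real Mario data)
pixels = [[40 + 20 * x + 8 * y + 5 * ((x * y) % 3) for y in range(N)]
          for x in range(N)]

coeffs = dct2(pixels)
# Rounding removes the tiny floating-point error left by the two transforms
restored = [[round(v) for v in row] for row in idct2(coeffs)]

print(restored == pixels)  # True: the IDCT gives the original pixels back
```

Writing the transform as two matrix products is a design choice made for brevity; real codecs typically use fast, often integer-based, approximations instead.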
Now, let’s try to understand a little bit of what each coefficient
means.
The first coefficient, in the top left corner, is called the DC coefficient. For an 8 x 8 block of 8-bit pixels, its value ranges from 0 to 2040.
All other coefficients are referred to as AC coefficients. Their values range from -1020 to 1020.
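Those ranges can be checked numerically. The sketch below assumes an 8 x 8 block, 8-bit pixels (0 to 255) and an orthonormal DCT; the helper names `basis` and `coefficient` are made up for this example. It builds the blocks that push the DC coefficient and one AC coefficient (the one at position 4, 4) to their extremes.

```python
import math

N = 8  # assumed block size (8 x 8)

def scale(k):
    # Orthonormal scaling factor for frequency index k
    return math.sqrt((1.0 if k == 0 else 2.0) / N)

def basis(u, v, x, y):
    # Value of the (u, v) DCT basis function at pixel (x, y)
    return (scale(u) * scale(v)
            * math.cos((2 * x + 1) * u * math.pi / (2 * N))
            * math.cos((2 * y + 1) * v * math.pi / (2 * N)))

def coefficient(block, u, v):
    # A single DCT coefficient: the block projected onto one basis function
    return sum(block[x][y] * basis(u, v, x, y)
               for x in range(N) for y in range(N))

white = [[255] * N for _ in range(N)]
black = [[0] * N for _ in range(N)]
dc_max = coefficient(white, 0, 0)  # 255 * 64 / 8
dc_min = coefficient(black, 0, 0)

# Block that is 255 wherever the (4, 4) basis function is positive, 0 elsewhere
extreme = [[255 if basis(4, 4, x, y) > 0 else 0 for y in range(N)]
           for x in range(N)]
inverted = [[255 - p for p in row] for row in extreme]
ac_max = coefficient(extreme, 4, 4)
ac_min = coefficient(inverted, 4, 4)

print(round(dc_min), round(dc_max))  # 0 2040
print(round(ac_min), round(ac_max))  # -1020 1020
```

With a different DCT scaling convention the numbers would change by a constant factor, so treat the exact limits as specific to this normalization.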
Let’s try leaving only the DC coefficient, and passing that through the IDCT:
Now let’s try leaving all AC coefficients, but removing the DC coefficient, and passing that through the IDCT:
Notice that when we left only the DC coefficient, we got the same value for all pixels in the output.
And when we left all AC coefficients but removed the DC coefficient, we could still see Mario’s face in the output, but the entire image was darker than the original: each pixel was reduced by the same value we got from the DC coefficient alone.
For example, if the first pixel had the value orig0 in the original and dc0 in the DC-only output, then its value in the AC-only output was ac0 = orig0 - dc0.
The reason for this is that the DC coefficient represents the average value of all pixels. It does not give texture to the block, but it does give the background color.
The AC coefficients, on the other hand, each add or subtract different values from a different set of pixels, adding finer and finer details to the block (the texture).
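The DC/AC split can be reproduced with a small experiment. This is a sketch assuming an 8 x 8 block and an orthonormal DCT; `pixels` is a made-up block rather than the real Mario data, and all helper names are hypothetical. It zeroes out either the AC or the DC coefficients and inspects what the IDCT returns.

```python
import math

N = 8  # assumed block size (8 x 8)

def scale(k):
    return math.sqrt((1.0 if k == 0 else 2.0) / N)

# C[k][n] is the k-th orthonormal cosine basis vector sampled at position n
C = [[scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
     for k in range(N)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct2(block):
    return matmul(matmul(C, block), transpose(C))

def idct2(coeffs):
    return matmul(matmul(transpose(C), coeffs), C)

# Hypothetical 8 x 8 block of 8-bit pixel values
pixels = [[40 + 20 * x + 8 * y + 5 * ((x * y) % 3) for y in range(N)]
          for x in range(N)]
coeffs = dct2(pixels)

# Keep only the DC coefficient, or only the AC coefficients
dc_only = [[coeffs[u][v] if (u, v) == (0, 0) else 0.0 for v in range(N)]
           for u in range(N)]
ac_only = [[0.0 if (u, v) == (0, 0) else coeffs[u][v] for v in range(N)]
           for u in range(N)]

flat = idct2(dc_only)      # every pixel equals the block average
textured = idct2(ac_only)  # every pixel equals original minus the average
mean = sum(map(sum, pixels)) / (N * N)

print(round(flat[0][0], 6), round(mean, 6))
print(round(textured[0][0], 6), round(pixels[0][0] - mean, 6))
```

Each printed pair should match: the DC-only block is flat at the average value, and the AC-only block is the original with that average subtracted from every pixel.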
Quantization
If we were to encode all the coefficients exactly as they were output from the DCT, there wouldn’t really be much benefit at all from all this mathemagic.
The final output file would be almost as big as (or just as big as, or even bigger than) the original uncompressed file, and the codec would have done a pretty poor job.
So why do codecs go through all this trouble of transforming to the frequency domain?
Well, it turns out you can do even more magic in the frequency domain by tweaking the coefficient values quite a bit and still ending up with a relatively good reconstruction of the block after running them through the IDCT.
And the way we do this is with quantization. Each coefficient is divided by a certain number, and is then rounded to the nearest integer.
For example, let’s divide all our coefficients by q, and we’ll get:
Wow! Those values are much smaller. There are also a bunch of zeros in there. Now let’s pretend we’re the decoder, and dequantize those values by multiplying them all by q again.
They are all pretty close to the original coefficients.
Now let’s run that through the IDCT and look at the reconstructed block.
That still looks a lot like Mario’s face, doesn’t it? Let’s have a closer look at the difference between the original and the reconstructed image, side by side:
It’s hard to notice the difference, right? Let’s have a closer look at the difference in values:
(so small)
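The whole quantize, dequantize and reconstruct walk-through can be sketched as follows. The assumptions are: an 8 x 8 block, an orthonormal DCT, a single quantizer q = 16 chosen as an example value, and a made-up `pixels` block instead of the real Mario data. The helper names are hypothetical. Note that Python's `round` sends exact ties to the nearest even integer, which is one of several possible rounding rules.

```python
import math

N = 8  # assumed block size (8 x 8)

def scale(k):
    return math.sqrt((1.0 if k == 0 else 2.0) / N)

# C[k][n] is the k-th orthonormal cosine basis vector sampled at position n
C = [[scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
     for k in range(N)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct2(block):
    return matmul(matmul(C, block), transpose(C))

def idct2(coeffs):
    return matmul(matmul(transpose(C), coeffs), C)

# Hypothetical 8 x 8 block of 8-bit pixel values
pixels = [[40 + 20 * x + 8 * y + 5 * ((x * y) % 3) for y in range(N)]
          for x in range(N)]
coeffs = dct2(pixels)

q = 16  # example quantizer value

# Encoder side: divide by q and round to the nearest integer
quantized = [[round(c / q) for c in row] for row in coeffs]
# Decoder side: multiply by q again
dequantized = [[level * q for level in row] for row in quantized]
# Back to pixels, rounded and clamped to the 8-bit range
reconstructed = [[min(255, max(0, round(p))) for p in row]
                 for row in idct2(dequantized)]

zeros = sum(level == 0 for row in quantized for level in row)
max_coeff_error = max(abs(coeffs[u][v] - dequantized[u][v])
                      for u in range(N) for v in range(N))
max_pixel_error = max(abs(pixels[x][y] - reconstructed[x][y])
                      for x in range(N) for y in range(N))

print("zero coefficients:", zeros, "of", N * N)
print("largest coefficient error:", round(max_coeff_error, 3))  # at most q / 2
print("largest pixel error:", max_pixel_error)
```

The coefficient error can never exceed q / 2, which is why a larger quantizer gives more zeros (better compression) but a rougher reconstruction.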
The examples in this section use a quantizer value of 16.
Quantization tables
In reality, instead of using a single quantizer value for all coefficients like we did in the section above, codecs use a quantization table.
Since our eyes are better at perceiving low-frequency changes than high-frequency changes, the quantizer values nearest to the DC coefficient will be smaller, while the quantizer values nearest to the last AC coefficient will be greater (roughly speaking).
The MPEG-2 standard specifies default quantization tables for intra blocks and non-intra blocks (with no distinction between luma and chroma):
The MPEG-4 standard also specifies default quantization tables for intra blocks and non-intra blocks (also with no distinction between luma and chroma):
The JPEG standard gives examples of luma and chroma quantization tables that are “based on psychovisual thresholding and are derived empirically using luminance and chrominance and 2:1 horizontal subsampling” (that’s some fancy wording there…):
Now let’s take the coefficients of Mario’s face, and quantize them using the example luma JPEG quantization table from above:
Then we dequantize those quantized coefficients:
And finally, we pass the reconstructed coefficients through the IDCT, and we get:
(not bad, right? also, not very good either…)
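Quantizing with a table instead of a single value only changes the divisor: each coefficient is divided by the entry at the same position. The sketch below assumes an 8 x 8 orthonormal DCT and a made-up `pixels` block. `JPEG_LUMA` is the widely reproduced example luminance table from Annex K of the JPEG standard, typed in here by hand, so check it against the specification before relying on it. The sketch also skips steps a real JPEG encoder performs, and the helper names are hypothetical.

```python
import math

N = 8  # block size (8 x 8)

# Example luminance quantization table (JPEG standard, Annex K)
JPEG_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def scale(k):
    return math.sqrt((1.0 if k == 0 else 2.0) / N)

# C[k][n] is the k-th orthonormal cosine basis vector sampled at position n
C = [[scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N)) for n in range(N)]
     for k in range(N)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct2(block):
    return matmul(matmul(C, block), transpose(C))

def idct2(coeffs):
    return matmul(matmul(transpose(C), coeffs), C)

def quantize(coeffs, table):
    # Divide each coefficient by its own table entry, then round
    return [[round(coeffs[u][v] / table[u][v]) for v in range(N)]
            for u in range(N)]

def dequantize(levels, table):
    # Multiply each quantized level by the same table entry
    return [[levels[u][v] * table[u][v] for v in range(N)] for u in range(N)]

# Hypothetical 8 x 8 block of 8-bit pixel values
pixels = [[40 + 20 * x + 8 * y + 5 * ((x * y) % 3) for y in range(N)]
          for x in range(N)]

coeffs = dct2(pixels)
levels = quantize(coeffs, JPEG_LUMA)
restored_coeffs = dequantize(levels, JPEG_LUMA)
reconstructed = [[min(255, max(0, round(p))) for p in row]
                 for row in idct2(restored_coeffs)]

for row in levels:
    print(row)
```

Because the table entries grow toward the bottom right corner, the high-frequency coefficients are the ones most likely to end up as zero.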
Those quantization tables in the standards are mere suggestions. The encoder is free to create the quantization table it wants, allowing for greater or lower quality. In fact, for most encoders, when you change the quality settings, the encoder is just changing the quantization table behind the scenes.
There are 8 x 8 = 64 values in total in a quantization table, and they can each go from 1 to 255. That’s a whopping 255^64 possibilities, or about 1.04 × 10^154.
Google even went as far as creating a whole
new JPEG
encoder (called Guetzli)
that basically tweaks the quantization table up to a point where
your puny little eyes can barely notice that anything changed at all,
while at the same time achieving very high levels of compression.