Zig-Zag and Run-Length Encoding
From the DCT and Quantization
overview, we learned that
some codecs will transform the pixels into the frequency domain using
the
Discrete Cosine Transform
and then
quantize
those coefficients.
But the quantized coefficients are two-dimensional
, in an 8x8
block, and any transmission or storage of the stream must be
one-dimensional
.
Therefore, the codec needs to flatten
the quantized coefficients
from a two-dimensional
block of 8x8
values into a one-dimensional
sequence.
The JPEG
and MPEG
codecs do this by using zig-zag
scan and
run-length encoding
, as we will see below.
Our sample 8x8
block for this page will be the
Mona Lisa.
First, let’s convert it to YUV
:
Now, let’s take the luma
plane and run it through the DCT
:
And finally, let’s quantize
the coefficients using the example
luma
JPEG
quantization table:
These are the quantized coefficients we’ll be working with in this page.
Zig-Zag
First we will flatten
the 8x8
block of quantized coefficients
into a one-dimensional
sequence of 64
values.
Instead of taking a naïve approach and just encoding each row one at
a time, the codec creators came up with a clever way of flattening
the block, called a zig-zag
sequence:
The image above shows an index table, where each element
in the
8x8
table is given an index
(from 0
to 63
).
The green line shows the order in which those element
s will be laid
out in the one-dimensional
sequence, starting at the top left corner,
and ending at the bottom right corner. This is the resulting sequence:
To understand why this makes sense, let’s take a closer look at our
quantized coefficients again, but now with the zeros
highlighted
in black:
If you take a closer look at how those zeros
are positioned in the
table, you will see that they mostly happen on the bottom right side of
the table (especially in parts of the image with less texture
), and
that they mostly appear across diagonal lines, so that when we
flatten
the block, we start with the bigger values, and get most of
the zeros
in long sequences, like this:
But why is it interesting to have the zeros
in long sequences?
To answer this question, let’s have a look at the next clever
technique, called Run-Length Encoding (RLE
for short).
Run-Length Encoding
Run-length encoding involves simplifying a sequence of values by
encoding a value and the amount of times that value occurs
consecutively in the sequence, also known as the run (note: this is
unrelated to having
“the runs
”).
For example, the sequence '00011112203333'
could be encoded as
'0'×3
, '1'×4
, '2'×2
, '0'×1
, '3'×4
.
The JPEG
and MPEG
codecs only encode runs of coefficients that
are zeros
, and not of any other values. Using the same example above,
the sequence would be encoded as '0'×3
, '1111'
, '22'
, '0'×1
,
'3333'
. Why aren’t other values encoded using RLE
?
Because the coefficients vary so much from one to the next that it
wouldn’t help compression if everything was encoded using RLE
.
But the sequences of zeros
occur frequently enough to justify using
RLE
on them.
Also, the JPEG
and MPEG
codecs always encode runs of zeros
,
even between two non-zero
coefficients.
The result is that each element
in the output sequence is encoded as
<count of preceding zero coefficients, non-zero coefficient>
.
Using the same example above, the sequence would be encoded as
<3, '1111'>
, <0, '22'>
, <1, '3333'>
.
Let’s have a look at what our Mona-Lisa run-length encoded sequence would look like with this technique:
Much shorter, right? We’re down from 64
elements to p
elements.
There are a few more tweaks to improve the Run-Length Encoding.
The last element contains a lot of zeros. But it doesn’t really matter
how many zeros are in the last run, since the block will end anyways.
So let’s just write an element that says end-of-block
(EOB
for short).
Note: if the last element is non-zero, we don’t need to encode the
end-of-block
, since the block is complete.
The DC
coefficient (remember: that’s the first coefficient) is a bit
different from the rest. First of all, it has a different range from
the AC
coefficients (DCT and Quantization
overview).
Also, it will always have a run
of 0
, since it is the first
element, so we don’t need to include the run
in the code.
So let’s encode it differently:
Now we have three different types of codes:
<DC, DC Coefficient>
<run of zeros, AC Coefficient>
end-of-block
You might be asking yourself but how will those codes be represented in the bitstream? Or not… You’re probably not asking yourself that. But let’s pretend you did ask that.
Well, that’s a very good question.
Go on to Huffman Coding
overview to see how those codes
will be finally encoded in the output.