Zig-Zag and Run-Length Encoding

From the DCT and Quantization overview, we learned that some codecs will transform the pixels into the frequency domain using the Discrete Cosine Transform and then quantize those coefficients.

But the quantized coefficients are two-dimensional, in an 8x8 block, and any transmission or storage of the stream must be one-dimensional. Therefore, the codec needs to flatten the quantized coefficients from a two-dimensional block of 8x8 values into a one-dimensional sequence.

The JPEG and MPEG codecs do this by using zig-zag scan and run-length encoding, as we will see below.

Our sample 8x8 block for this page will be the Mona Lisa. First, let’s convert it to YUV:

Now, let’s take the luma plane and run it through the DCT:

And finally, let’s quantize the coefficients using the example luma JPEG quantization table:

These are the quantized coefficients we’ll be working with in this page.

Zig-Zag

First we will flatten the 8x8 block of quantized coefficients into a one-dimensional sequence of 64 values.

Instead of taking a naïve approach and just encoding each row one at a time, the codec creators came up with a clever way of flattening the block, called a zig-zag sequence:

The image above shows an index table, where each element in the 8x8 table is given an index (from 0 to 63). The green line shows the order in which those elements will be laid out in the one-dimensional sequence, starting at the top left corner, and ending at the bottom right corner. This is the resulting sequence:

To understand why this makes sense, let’s take a closer look at our quantized coefficients again, but now with the zeros highlighted in black:

If you take a closer look at how those zeros are positioned in the table, you will see that they mostly happen on the bottom right side of the table (especially in parts of the image with less texture), and that they mostly appear across diagonal lines, so that when we flatten the block, we start with the bigger values, and get most of the zeros in long sequences, like this:

But why is it interesting to have the zeros in long sequences? To answer this question, let’s have a look at the next clever technique, called Run-Length Encoding (RLE for short).

Run-Length Encoding

Run-length encoding involves simplifying a sequence of values by encoding a value and the amount of times that value occurs consecutively in the sequence, also known as the run (note: this is unrelated to having “the runs”).

For example, the sequence '00011112203333' could be encoded as '0'×3, '1'×4, '2'×2, '0'×1, '3'×4.

The JPEG and MPEG codecs only encode runs of coefficients that are zeros, and not of any other values. Using the same example above, the sequence would be encoded as '0'×3, '1111', '22', '0'×1, '3333'. Why aren’t other values encoded using RLE? Because the coefficients vary so much from one to the next that it wouldn’t help compression if everything was encoded using RLE. But the sequences of zeros occur frequently enough to justify using RLE on them.

Also, the JPEG and MPEG codecs always encode runs of zeros, even between two non-zero coefficients. The result is that each element in the output sequence is encoded as <count of preceding zero coefficients, non-zero coefficient>. Using the same example above, the sequence would be encoded as <3, '1111'>, <0, '22'>, <1, '3333'>.

Let’s have a look at what our Mona-Lisa run-length encoded sequence would look like with this technique:

Much shorter, right? We’re down from 64 elements to p elements.

There are a few more tweaks to improve the Run-Length Encoding.

The last element contains a lot of zeros. But it doesn’t really matter how many zeros are in the last run, since the block will end anyways. So let’s just write an element that says end-of-block (EOB for short).

Note: if the last element is non-zero, we don’t need to encode the end-of-block, since the block is complete.

The DC coefficient (remember: that’s the first coefficient) is a bit different from the rest. First of all, it has a different range from the AC coefficients (DCT and Quantization overview). Also, it will always have a run of 0, since it is the first element, so we don’t need to include the run in the code. So let’s encode it differently:

Now we have three different types of codes:

  • <DC, DC Coefficient>
  • <run of zeros, AC Coefficient>
  • end-of-block

You might be asking yourself but how will those codes be represented in the bitstream? Or not… You’re probably not asking yourself that. But let’s pretend you did ask that.

Well, that’s a very good question. Go on to Huffman Coding overview to see how those codes will be finally encoded in the output.