RGB
You’ve heard of RGB, right?
It’s a pixel format that stands for Red, Green, and Blue and it’s
used to define a color using three values. Each pixel in a frame has an
RGB color associated to it, forming an image.
Well, most image and video codecs don’t use that. They use YUV
instead.
YUV
The YUV pixel formats split
the frame into three components, also called planes.
The first plane is the Y plane, also called the luma or
luminance plane. It represents roughly the greyscale component
of the frame (also known as black and white, but not quite, since it
is made up of shades of grey).
The second and third planes are the U and V planes, also called
the chroma or chrominance planes. They represent the colour
components of the frame.
There are many different YUV pixel formats depending on the chroma
subsampling, with the most common being: YUV444, YUV422, and
YUV420. We’ll see more about this later.
Pixels are just numerical values
Pixels are not some magical entity that defies human comprehension. Pixels are just… numbers.
Let’s take for example this small image of 8x8 pixels of a close-up
of Mario’s face.
The first row shows the three YUV components separately (in YUV444).
The second row shows the numbers that represent each YUV value of
each pixel of each component.
Value Range
In this example, I used the full range of one 8-bit byte (from 0 to
255) for all values, with the chroma values having an offset of
+128 (the chroma values would usually go from -128 to 127).
Note that, for the luma values, 0 means black, going up through
greyscale levels until it reaches 255, which means white.
This is the JPEG colour range.
It is also possible for YUV values to range from 16 to 235 for
luma values, and from -112 to 112 (or from 16 to 240) for
chroma values. This is the MPEG colour range.
Subsampling
The chroma planes may be subsampled, which means they will throw
away some colour information and encode fewer pixels than the luma
plane.
This loss of data from the colour planes is not a big problem since
our eyes are much more sensitive to greyscale than they are to
colour, so we barely notice it.
But subsampling doesn’t work too well with pixelated bitmaps with
sharp edges like Mario’s face, so in our next example we’ll be working
with Lena.
Now let’s subsample the chroma planes by half both horizontally
and vertically. The luma plane is of size 224x224, therefore
the chroma planes will be of size 112x112.
Now instead of transmitting 224 * 224 * 3 = 150528 pixels, we only
have to transmit 224 * 224 + 112 * 112 * 2 = 75264 pixels, which is
half the data.
Subsampling allows us to save a lot of data when transmitting images
and video.
But this does not mean that only a small chunk of the image will have
colour when displayed. Before the image is displayed again, the
chroma planes will go through the opposite of subsampling, which
is called upsampling (or interpolation).
And now we merge back the components to display them.
Pretty good, isn’t it?
Note that this process is not perfectly reversible: the colour
information that was thrown away while subsampling cannot be
recovered, not even through upsampling
YUV444
In YUV444, all three planes will have the same width and height
dimensions.
The following image shows the pixels encoded in the media file
for all three planes, starting at the top left corner from position
1,1, and ending at the bottom right corner at position H,W.
His the frameheightWis the framewidth
YUV422
In YUV422, the luma plane has the original width and height
dimensions, while the chroma planes are halved horizontally,
which means their dimensions will be (width/2) and height.
Each pixel in a chroma plane corresponds to two pixels
horizontally in the luma plane.
The following image shows the pixels encoded in the media file
for all three planes, starting at the top left corner from position
1,1, and ending at the bottom right corner at position H,W (for
the luma plane) and H,w (for the chroma planes).
His the frameheightWis the framewidthwiswidth/2(half the framewidth)
The following image shows the pixels for all three upsampled
planes, where they all have the same width and height (to form the
final frame).
Notice that the pixels in the chroma planes have been duplicated
horizontally.
YUV420
In YUV420, the luma plane has the original width and height
dimensions, while the chroma planes are halved both horizontally
and vertically,
which means their dimensions will be (width/2) and (height/2).
Each pixel in a chroma plane corresponds to two pixels
horizontally and two pixels vertically in the luma plane.
The following image shows the pixels encoded in the media file
for all three planes, starting at the top left corner from position
1,1, and ending at the bottom right corner at position H,W (for
the luma plane) and h,w (for the chroma planes).
His the frameheightWis the framewidthhisheight/2(half the frameheight)wiswidth/2(half the framewidth)
The following image shows the pixels for all three upsampled
planes, where they all have the same width and height (to form the
final frame).
Notice that the pixels in the chroma planes have been duplicated both
horizontally and vertically.