
Compression efficiency analysis of AV1, VVC, and HEVC for random access applications

Published online by Cambridge University Press: 13 July 2021

Tung Nguyen*
Affiliation: Department of Video Communication and Applications, Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany
Detlev Marpe
Affiliation: Department of Video Communication and Applications, Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany
*Corresponding author: Tung Nguyen, email: tung.nguyen@hhi.fraunhofer.de

Abstract

AOM Video 1 (AV1) and Versatile Video Coding (VVC) are the outcomes of two recent, independent video coding technology developments. While VVC is the successor of High Efficiency Video Coding (HEVC) in the lineage of international video coding standards jointly developed by ITU-T and ISO/IEC in an open and public standardization process, AV1 was developed by the industry consortium Alliance for Open Media (AOM) and has its technological roots in Google's proprietary VP9 codec. This paper presents a compression efficiency evaluation of the AV1, VVC, and HEVC video coding schemes for a typical video compression application that requires random access. Random access is an essential property: without it, key functionalities of digital video broadcasting and streaming could not be provided. For the evaluation, we used a controlled experimental environment that largely follows the guidelines specified in the Common Test Conditions (CTC) of the Joint Video Experts Team (JVET). As representatives of the respective video coding schemes, we selected their freely available reference software implementations. Using a test set of video sequences with different content and resolution characteristics, and depending on the application-specific frequency of random access points, the experimental results show averaged bit-rate savings of about 10–15% for AV1 and 36–37% for the VVC reference encoder implementation (VTM), both relative to the HEVC reference encoder implementation (HM). A direct comparison between VTM and AV1 shows averaged bit-rate savings of about 25–29% for VTM, while the averaged encoding and decoding run times of VTM relative to those of AV1 are around 300% and 270%, respectively.
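The run-time figures above are relative values (VTM time divided by AV1 time, averaged over the test set). The abstract does not state how per-sequence ratios are averaged; JVET-style reporting commonly uses the geometric mean, and the minimal sketch below assumes exactly that. The function names and the timing numbers are hypothetical and are not data from the paper.

```python
import math


def geometric_mean(values):
    """Geometric mean of a non-empty sequence of strictly positive numbers."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def average_runtime_ratio(test_seconds, anchor_seconds):
    """Geometric mean of per-sequence run-time ratios test/anchor (1.0 = equal)."""
    if len(test_seconds) != len(anchor_seconds):
        raise ValueError("need exactly one test and one anchor time per sequence")
    ratios = [t / a for t, a in zip(test_seconds, anchor_seconds)]
    return geometric_mean(ratios)


# Hypothetical encoding times in seconds for three sequences (not from the paper).
ratio = average_runtime_ratio([900.0, 250.0, 1400.0], [300.0, 100.0, 400.0])
print(f"relative encoding time: {ratio:.0%}")
```

The geometric mean treats a 2x slowdown on one sequence and a 2x speedup on another as cancelling out, which an arithmetic mean of the ratios would not.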

Type: Original Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Author(s), 2021. Published by Cambridge University Press in association with Asia Pacific Signal and Information Processing Association

Table 1. Distance in number of frames between two IRAP pictures, depending on the frame rate of the input sequence, following the JVET CTC with a GOP size of 32
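One way to derive distances like those in Table 1 is to take the number of frames in the target period (about one second in the JVET CTC) and round it up to a whole number of GOPs, because an IRAP picture has to coincide with a GOP keyframe. This rounding rule is an assumption for illustration and not necessarily the exact rule behind the table; the helper name is hypothetical.

```python
import math


def irap_distance(frame_rate, gop_size=32, target_seconds=1.0):
    """Hypothetical rule: smallest multiple of gop_size that covers target_seconds."""
    frames = frame_rate * target_seconds
    # An IRAP picture has to coincide with a GOP keyframe, so round up to whole GOPs
    # (and never go below a single GOP).
    return max(gop_size, math.ceil(frames / gop_size) * gop_size)


for fps in (24, 30, 50, 60):
    print(f"{fps} Hz -> {irap_distance(fps)} frames (GOP 32, ~1 s)")
```

Under this assumed rule, a 60 Hz sequence with a GOP size of 16 gets an IRAP distance of 64 frames, which matches the configuration shown in Fig. 2; whether the same rule reproduces every entry of Table 1 would have to be checked against the CTC document.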

Fig. 1. Group-of-Pictures (GOP) structure of size eight. Each box denotes a picture, and the pictures within the dotted outline form the GOP. The first number in the angle brackets denotes the display order, and the second number denotes the coding (transmission) order. An uppercase B denotes a reference picture, whereas a lowercase b denotes a non-reference picture. The arrows indicate the reference pictures used by each picture. The vertical arrangement of the boxes reflects the hierarchical temporal layering of the pictures.
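A hierarchical structure like the one in Fig. 1 can be generated by recursive bisection: the keyframe at the end of the GOP is coded first, then the middle picture of each remaining interval, depth first. The sketch below assumes a dyadic hierarchy (power-of-two GOP size); the function name is hypothetical, and marking only the top temporal layer as non-reference (b) is an assumption that mirrors the caption's convention rather than a rule taken from HM or VTM.

```python
def hierarchical_gop(gop_size):
    """Coding order and temporal layer of a dyadic hierarchical GOP.

    Display positions run from 1 to gop_size, counted from the previous
    keyframe (position 0), which belongs to the preceding GOP.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    order = [gop_size]  # the keyframe at the end of the GOP is coded first
    layer = {gop_size: 0}

    def bisect(lo, hi, depth):
        # Code the middle picture of (lo, hi), then refine the left and right halves.
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        order.append(mid)
        layer[mid] = depth
        bisect(lo, mid, depth + 1)
        bisect(mid, hi, depth + 1)

    bisect(0, gop_size, 1)
    return order, layer


order, layer = hierarchical_gop(8)
top = max(layer.values())
for pos in range(1, 9):
    # Assumption: only the highest temporal layer is non-reference ("b").
    kind = "b" if layer[pos] == top and top > 0 else "B"
    print(f"<{pos},{order.index(pos) + 1}> {kind} (temporal layer {layer[pos]})")
```

For a GOP size of 8 this yields the coding order 8, 4, 2, 1, 3, 6, 5, 7 (display positions relative to the preceding keyframe), with the coding index numbered within the GOP.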

Fig. 2. GOP sequence for a 60 Hz video with an IRAP period of 64 and a GOP size of 16. When the keyframe of a GOP (marked as a box) is an IRAP picture, that GOP provides a random access point; in this example, this is the fifth GOP. At the beginning of the transmission, the first GOP has a size of one and consists of the keyframe only.
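The GOP layout in Fig. 2 can be reproduced with a short sketch: after a one-picture first GOP, display positions are grouped into GOPs of the configured size, and a GOP provides a random access point when its keyframe falls on a multiple of the IRAP period. The function name, the tuple layout, and the sequence length of 129 pictures are hypothetical; the sketch also assumes that the IRAP period is a multiple of the GOP size.

```python
def gop_sequence(num_frames, gop_size, irap_period):
    """Split display positions 0..num_frames-1 into random-access GOPs.

    Returns (first, keyframe, is_irap) per GOP. The first GOP holds only
    position 0; every later GOP ends at its keyframe, the last position in
    display order. Assumes irap_period is a multiple of gop_size.
    """
    gops = [(0, 0, True)]  # the very first picture is always an IRAP picture
    first = 1
    while first < num_frames:
        keyframe = min(first + gop_size - 1, num_frames - 1)
        gops.append((first, keyframe, keyframe % irap_period == 0))
        first = keyframe + 1
    return gops


# 60 Hz example: IRAP period 64, GOP size 16, 129 pictures (length is hypothetical).
for index, (first, key, irap) in enumerate(gop_sequence(129, 16, 64), start=1):
    marker = "  <- random access point" if irap else ""
    print(f"GOP {index}: pictures {first}-{key}{marker}")
```

With these settings, GOPs 1, 5, and 9 (counted from 1) carry IRAP keyframes, consistent with the fifth GOP being the first random access point after the start of the transmission.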

Fig. 3. Simplified golden-frame group (GFG) structure of size eight. Each box denotes a picture, and the number in the brackets denotes the time stamp of the corresponding picture. In this simplified illustration, the prediction structure is the same as the GOP structure of HM/VTM. However, instead of relying on a picture reordering process, the AV1 encoder uses a mechanism supported by the AV1 syntax: it transmits pictures before their designated display time and marks them as non-displayable. Below the GFG structure, the list of pictures gives the transmission order for the same GFG. Cross-shaded boxes denote pictures that are not displayed immediately, white boxes denote pictures that are reconstructed and displayed immediately, and gray boxes denote pictures that use the previously transmitted non-displayable pictures.
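The transmission scheme described for Fig. 3 can be illustrated with a small simulation: frames coded ahead of their display time are sent hidden (show_frame = 0 in the AV1 frame header) and are later displayed with show_existing_frame = 1 once all earlier time stamps have been shown. This is a simplified model under stated assumptions: it uses the same dyadic hierarchy as the caption, and it assumes every hidden frame is later shown via show_existing_frame. A real AV1 encoder such as libaom may instead code a separate overlay frame for some hidden frames, so the sketch illustrates the principle rather than libaom's exact decisions. The function name is hypothetical.

```python
def av1_transmission(coding_order):
    """Simplified AV1-style transmission of one GFG without picture reordering.

    coding_order lists display time stamps (1..N) in coding order; time stamp 0
    is the already displayed keyframe of the previous group. A frame coded
    ahead of its display time is sent hidden (show_frame = 0) and later shown
    with show_existing_frame = 1 once every earlier time stamp has been shown.
    """
    next_display = 1
    hidden = set()
    events = []
    for t in coding_order:
        if t != next_display:
            hidden.add(t)
            events.append((t, "hidden"))
            continue
        events.append((t, "shown"))
        next_display += 1
        # Reveal hidden frames whose display time has now been reached.
        while next_display in hidden:
            hidden.remove(next_display)
            events.append((next_display, "show_existing"))
            next_display += 1
    if hidden:
        raise ValueError(f"frames never displayed: {sorted(hidden)}")
    return events


# Coding order of a dyadic hierarchy of size 8, as assumed for the simplified GFG.
for timestamp, action in av1_transmission([8, 4, 2, 1, 3, 6, 5, 7]):
    print(f"[{timestamp}] {action}")
```

In this model, frames 8, 4, 2, and 6 are transmitted hidden and later displayed without being coded again, which corresponds to the gray boxes that reuse previously transmitted non-displayable pictures.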

Table 2. Test sequences of the CTC test set that were used for the experiments in this paper

Table 3. Bit-rate savings of AV1 version 85a9314 relative to HM 16.21
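The bit-rate savings in Tables 3 to 5 are averages over rate-distortion (RD) curves. Evaluations following the JVET CTC conventionally express such savings as the Bjøntegaard delta rate (BD-rate); the sketch below assumes that metric and implements its classic variant: fit the logarithm of the rate as a cubic function of PSNR, then integrate the difference over the overlapping PSNR range. With exactly four RD points per curve, the cubic fit passes through all points, so Lagrange interpolation combined with Simpson's rule (exact for polynomials up to degree three) suffices. The JVET reporting template may use a different interpolation, so results can differ slightly from published numbers. All names and RD points are hypothetical; negative values mean the test codec needs fewer bits.

```python
import math


def _lagrange(xs, ys, x):
    """Value at x of the polynomial interpolating the points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _integral(xs, ys, a, b):
    """Integral over [a, b] of the cubic through four points (Simpson is exact here)."""
    def f(x):
        return _lagrange(xs, ys, x)
    return (b - a) / 6.0 * (f(a) + 4.0 * f(0.5 * (a + b)) + f(b))


def bd_rate(ref_rate, ref_psnr, test_rate, test_psnr):
    """Bjontegaard delta rate in percent; negative values are bit-rate savings."""
    for curve in (ref_rate, ref_psnr, test_rate, test_psnr):
        if len(curve) != 4:
            raise ValueError("this sketch expects exactly four RD points per curve")
    lo = max(min(ref_psnr), min(test_psnr))
    hi = min(max(ref_psnr), max(test_psnr))
    if hi <= lo:
        raise ValueError("the PSNR ranges of the two curves do not overlap")
    log_ref = [math.log(r) for r in ref_rate]
    log_test = [math.log(r) for r in test_rate]
    avg_log_diff = (_integral(test_psnr, log_test, lo, hi)
                    - _integral(ref_psnr, log_ref, lo, hi)) / (hi - lo)
    return (math.exp(avg_log_diff) - 1.0) * 100.0


# Hypothetical RD points (kbit/s, dB): the test codec needs 10% less rate everywhere.
anchor_rate, anchor_psnr = [1000, 2000, 4000, 8000], [32.0, 35.0, 38.0, 41.0]
test_rate = [0.9 * r for r in anchor_rate]
print(f"BD-rate: {bd_rate(anchor_rate, anchor_psnr, test_rate, anchor_psnr):.2f}%")
```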

Table 4. Bit-rate savings of VTM 8.0 relative to HM 16.21

Table 5. Bit-rate savings of VTM 8.0 relative to AV1
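Savings that two codecs achieve against a common anchor can be combined into a rough estimate of the saving between them: if codec B saves s_B and codec A saves s_A relative to the same anchor, B needs about (1 - s_B)/(1 - s_A) of A's rate. Because BD-rates are computed over per-sequence overlapping quality ranges and then averaged, this chaining is only approximate and does not replace a direct comparison such as Table 5. The sketch pairs endpoints of the ranges quoted in the abstract arbitrarily, purely for illustration; the function name is hypothetical.

```python
def implied_saving(saving_b, saving_a):
    """Approximate saving of codec B relative to codec A.

    saving_b and saving_a are fractional bit-rate savings (e.g. 0.36 for 36%)
    of B and A, each measured against the same anchor codec.
    """
    if not (saving_a < 1.0 and saving_b < 1.0):
        raise ValueError("savings must be below 100%")
    # B needs (1 - saving_b) of the anchor rate, A needs (1 - saving_a).
    return 1.0 - (1.0 - saving_b) / (1.0 - saving_a)


# Endpoints of the ranges quoted in the abstract, paired arbitrarily for illustration.
for s_vtm, s_av1 in ((0.36, 0.10), (0.37, 0.15)):
    est = implied_saving(s_vtm, s_av1)
    print(f"VTM {s_vtm:.0%}, AV1 {s_av1:.0%} vs. HM -> VTM vs. AV1 ~ {est:.1%}")
```

The two estimates, about 28.9% and 25.9%, happen to fall close to the directly measured 25–29%, but given the arbitrary pairing and the non-transitivity of BD-rates, this is a plausibility check only.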

Table 6. Summarized results for an IRAP period of approximately 2 s