
Machine learning for media compression: challenges and opportunities

Published online by Cambridge University Press:  11 September 2018

Amir Said*
Affiliation:
Qualcomm Technologies Inc., San Diego, CA, USA
Corresponding author: A. Said. Email: said@ieee.org

Abstract

Machine learning (ML) has been producing major advances in several technological fields and can have a significant impact on media coding. However, fast progress can only happen if the ML techniques are adapted to match the true needs of compression. In this paper, we analyze why some straightforward applications of ML tools to compression do not really address its fundamental problems, which explains why they have been yielding disappointing results. From an analysis of why compression can be quite different from other ML applications, we present some new problems that are technically challenging, but that can produce more significant advances. Throughout the paper, we present examples of successful applications to video coding, discuss practical difficulties that are specific to media compression, and describe related open research problems.

Information

Type
Industrial Technology Advances
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
Copyright © The Authors, 2018

Fig. 1. A general prediction system, including the elements used for training.


Fig. 2. A prediction-based system for lossy media compression.


Fig. 3. Comparison of how (a) a general prediction system and (b) a prediction-based encoder can use several prediction methods to improve performance. In the encoder, the residual is the difference between the prediction and original signals.


Fig. 4. Inclusion of multiple transform options in the encoder of Fig. 3(b).


Fig. 5. Parametric multi-pass transform of dimension N: the computational complexity (memory and operations) in each pass is O(N).
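The O(N) per-pass cost cited in the caption of Fig. 5 can be illustrated with a minimal sketch: if each pass applies disjoint 2×2 Givens rotations to element pairs of an N-dimensional vector, both memory and operation counts grow linearly in N per pass. This is only an assumed parameterization for illustration; the paper's actual pass structure and pairing scheme may differ.

```python
import numpy as np

def givens_pass(x, angles, pairs):
    """One pass of a parametric multi-pass transform (illustrative sketch).

    Applies disjoint 2x2 Givens rotations, parameterized by `angles`,
    to the element pairs listed in `pairs`. Since each element of x is
    touched at most once, a pass costs O(N) operations and O(N) memory.
    """
    y = x.copy()
    for theta, (i, j) in zip(angles, pairs):
        c, s = np.cos(theta), np.sin(theta)
        # Pairs are disjoint, so reading from the unmodified x is safe.
        y[i] = c * x[i] - s * x[j]
        y[j] = s * x[i] + c * x[j]
    return y

def multipass_transform(x, passes):
    """Compose several O(N) passes; total cost O(N * number_of_passes)."""
    for angles, pairs in passes:
        x = givens_pass(x, angles, pairs)
    return x
```

Because every pass is orthogonal, the composed transform preserves signal energy, which is why such structures can approximate learned orthogonal transforms (e.g. a KLT) at far lower cost than a dense N×N matrix multiply.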


Fig. 6. Similar optimization (learning) structures of neural networks and parametric multi-pass transform. Green arrows represent processed data, and red arrows represent partial derivatives.


Table 1. Comparison of features of Neural Networks (NN) and Parametric Multi-Pass Transforms (PMPT)


Fig. 7. Examples of how a visualization tool for prediction can lead to the design of better predictors: (a) represents the original "human-based" design; (b) the ML design, without complexity taken into account; and (c) the "mixed" low-complexity design, with a model inspired by the predictors visually observed in (b), and ML used for parameter optimization.


Fig. 8. Examples of how repeated video segments can produce very severe overfitting of predictors.


Fig. 9. Using training media for codec parameter optimization: (a) direct optimization, and (b) offline optimization needed when encoder complexity and data volumes are too large.