
Deep-learning-based macro-pixel synthesis and lossless coding of light field images

Published online by Cambridge University Press:  17 July 2019

Ionut Schiopu*
Affiliation:
Department of Electronics and Informatics (ETRO), Vrije Universiteit Brussel (VUB), Brussels, Belgium
Adrian Munteanu
Affiliation:
Department of Electronics and Informatics (ETRO), Vrije Universiteit Brussel (VUB), Brussels, Belgium
*
Corresponding author: Ionut Schiopu, E-mail: ischiopu@etrovub.be

Abstract

This paper proposes a novel approach for lossless coding of light field (LF) images based on a macro-pixel (MP) synthesis technique which synthesizes the entire LF image in one step. The reference views used in the synthesis process are selected according to four different view configurations and define the reference LF image. This image is stored as an array of reference MPs, each collecting one pixel from every reference view, and is losslessly encoded as a base layer. The first contribution is a novel network design for view synthesis which synthesizes the entire LF image as an array of synthesized MPs. The second contribution is a network model for coding which computes the MP prediction used for lossless encoding of the remaining views as an enhancement layer. Synthesis results show an average distortion of 29.82 dB when using four reference views and up to 36.19 dB when using 25 reference views. Compression results show an average improvement of 29.9% over traditional lossless image codecs and of 9.1% over the state of the art.
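The macro-pixel representation described in the abstract interleaves the views of the light field so that each MP collects the co-located pixel from every view. As an illustration only (the exact array layout is an assumption, not taken from the paper), a minimal NumPy sketch of this rearrangement:

```python
import numpy as np

def views_to_macro_pixels(views):
    """Rearrange a light field of V1 x V2 views (each H x W) into a
    macro-pixel image of size (H*V1) x (W*V2), where each V1 x V2
    macro-pixel collects the co-located pixel from every view.

    views: array of shape (V1, V2, H, W).
    """
    V1, V2, H, W = views.shape
    # Reorder axes to (H, V1, W, V2), then merge the view axes into
    # the spatial axes: pixel (h*V1+v1, w*V2+v2) comes from view (v1, v2).
    return views.transpose(2, 0, 3, 1).reshape(H * V1, W * V2)
```

With this layout, the MP at spatial position $(h, w)$ occupies the block of rows $hV_1 \ldots hV_1+V_1-1$ and columns $wV_2 \ldots wV_2+V_2-1$.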

Information

Type
Original Paper
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Authors, 2019

Fig. 1. The proposed method.


Fig. 2. (B1) The four view configurations used to form the reference macro-pixel (RMP): the $k\times k$ configuration, $k\in \{2,\,3,\,4,\,5\}$, selects RMPs of size $k\times k$. (B2) The structure of the $30\times 30$ patch for synthesis. For the $2\times 2$ configuration, the patch selects an array of $15\times 15$ RMPs around the current MP position. (B3) The structure of the $30\times 60$ patch for coding, shown for the $2\times 2$ configuration. The patch collects six MPs in the causal neighborhood and two synthesized macro-pixels (SMPs) in the non-causal neighborhood of the current MP position, marked with a red square. Blue denotes the positions of the reference-view pixels in the MP. White denotes the positions of the synthesized-view pixels in the MP. Gray denotes the already encoded pixel positions.
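The synthesis patch in Fig. 2 (B2) is an array of $15\times 15$ RMPs centred on the current MP, which for the $2\times 2$ configuration gives a $30\times 30$ pixel patch. A hedged sketch of such an extraction, assuming a single-channel RMP image and zero padding at the borders (the paper's border handling is not specified here):

```python
import numpy as np

def extract_synthesis_patch(rmp_image, mp_row, mp_col, k=2, n_rmp=15):
    """Extract an (n_rmp*k) x (n_rmp*k) pixel patch covering the
    n_rmp x n_rmp array of k x k RMPs centred on MP (mp_row, mp_col).
    Out-of-bounds positions are zero-padded (an assumption)."""
    half = n_rmp // 2
    H, W = rmp_image.shape
    patch = np.zeros((n_rmp * k, n_rmp * k), dtype=rmp_image.dtype)
    # Top-left pixel coordinate of the patch in the RMP image.
    r0 = (mp_row - half) * k
    c0 = (mp_col - half) * k
    for i in range(n_rmp * k):
        for j in range(n_rmp * k):
            r, c = r0 + i, c0 + j
            if 0 <= r < H and 0 <= c < W:
                patch[i, j] = rmp_image[r, c]
    return patch
```

The centre MP of the returned patch starts at pixel index `(half*k, half*k)`, i.e. `(14, 14)` for the $2\times 2$ configuration.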


Fig. 3. The proposed neighborhood for generating the binary mode context in the basic reference codec [6]. Blue denotes the position of the reference view pixels in the MP. White denotes the position of the synthesized view pixels in the MP. Gray denotes the already encoded pixel positions. The red square denotes the current pixel position. The purple squares denote the CALIC binary mode context. The orange squares denote the proposed binary mode context.


Fig. 4. (a) The proposed network design. When switch K is set to the (B2) Synthesis branch, the MP Synthesis based on Convolutional Neural Network (MPS-CNN) model is obtained. When switch K is set to the (B3) Coding branch, the Prediction using SMPs based on Convolutional Neural Network (PSMP-CNN) model is obtained. (b) The layer structure of the Convolution Block (CB). (c) The layer structure of the Deconvolution Block (DB). (d) The layer structure of the Convolution-based Processing Block (CPB), built based on the Residual Learning paradigm.


Fig. 5. The workflow of the proposed method.


Fig. 6. Pseudo-colored images of the mean absolute error computed over the color channels for one view in the LF images. Mean absolute errors requiring more than a 4-bit representation (i.e., larger than 15) are replaced by the escape symbol 16. The rows, from top to bottom, show the synthesized view $(7,\,7)$ for each of the four view configurations: $2\times 2$, $3\times 3$, $4\times 4$, and $5\times 5$, respectively.
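The clipping rule used for the pseudo-colored error maps in Fig. 6 is simple to state in code: any mean absolute error that does not fit in 4 bits is mapped to the escape symbol 16. A minimal sketch (function name is illustrative, not from the paper):

```python
import numpy as np

def quantize_mae(mae, n_bits=4):
    """Clip mean-absolute-error values for pseudo-coloring: errors
    exceeding the n_bits range (> 2**n_bits - 1, i.e. > 15 for 4 bits)
    are replaced by the escape symbol 2**n_bits (16 for 4 bits)."""
    limit = 2 ** n_bits - 1   # 15 for 4-bit
    escape = 2 ** n_bits      # 16
    mae = np.asarray(mae)
    return np.where(mae > limit, escape, mae)
```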


Fig. 7. Evaluation of the one-step synthesis results of the macro-pixel synthesis. (a) Synthesis results for each image in the Test Set. (b) Rate-distortion results for the Test Set. (c) Rate-distortion results for the Red and white building image. (d) Rate-distortion results for the Sophie and Vincent 2 image.


Table 1. Lossless compression results for the RMPs and one-step synthesis results for the SMPs


Fig. 8. Lossless compression results.


Table 2. Lossless compression results for the Test Set (108 images)