
A comprehensive study of the rate-distortion performance in MPEG point cloud compression

Published online by Cambridge University Press:  12 November 2019

Evangelos Alexiou*
Affiliation:
Multimedia Signal Processing Group, École Polytechnique Fédérale de Lausanne, Switzerland
Irene Viola
Affiliation:
Multimedia Signal Processing Group, École Polytechnique Fédérale de Lausanne, Switzerland
Tomás M. Borges
Affiliation:
Electrical Engineering Department, Universidade de Brasília, Brazil
Tiago A. Fonseca
Affiliation:
Gama Engineering College, Universidade de Brasília, Brazil
Ricardo L. de Queiroz
Affiliation:
Computer Science Department, Universidade de Brasília, Brazil
Touradj Ebrahimi
Affiliation:
Multimedia Signal Processing Group, École Polytechnique Fédérale de Lausanne, Switzerland
*
Corresponding author: Evangelos Alexiou Email: evangelos.alexiou@epfl.ch

Abstract

Recent trends in multimedia technologies indicate the need for richer imaging modalities to increase user engagement with the content. Among other alternatives, point clouds denote a viable solution that offers an immersive content representation, as witnessed by current activities in the JPEG and MPEG standardization committees. As a result of such efforts, MPEG is at the final stages of drafting an emerging standard for point cloud compression, which we consider the state of the art. In this study, the entire set of encoders that have been developed in the MPEG committee is assessed through an extensive and rigorous analysis of quality. We initially focus on the assessment of encoding configurations that have been defined by experts in MPEG for their core experiments. Then, two additional experiments are designed and carried out to address some of the identified limitations of the current approach. As part of the study, state-of-the-art objective quality metrics are benchmarked to assess their capability to predict the visual quality of point clouds under a wide range of radically different compression artifacts. To conduct the subjective evaluation experiments, a web-based renderer is developed and described. The subjective and objective quality scores, along with the rendering software, are made publicly available to facilitate and promote research in the field.

Information

Type
Original Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
Copyright © The Authors, 2019

Table 1. Experimental set-ups. Single and double denote the number of stimuli displayed to rate a model; sim. and seq. denote simultaneous and sequential assessment, respectively; incl. zoom indicates varying the camera distance to acquire views of the model.


Fig. 1. Reference point cloud models. The set of objects is presented in the first row, whilst the set of human figures is illustrated in the second row. (a) amphoriskos, (b) biplane, (c) head, (d) romanoillamp, (e) longdress, (f) loot, (g) redandblack, (h) soldier, (i) the20smaria.


Table 2. Summary of content retrieval information, processing, and point specifications.


Fig. 2. V-PCC compression process. In (a), the original point cloud is decomposed into geometry video, texture video, and metadata. Both video contents are smoothed by Padding in (b) to allow for the best HEVC [58] performance. The compressed bitstreams (metadata, geometry video, and texture video) are packed into a single bitstream: the compressed point cloud.


Fig. 3. Overview of the G-PCC geometry encoder. After voxelization, the geometry is encoded either by the Octree or by the TriSoup module, the latter of which builds upon the Octree structure.


Fig. 4. Overview of G-PCC color attribute encoder. In the scope of this work, either RAHT or Lifting is used to encode contents under test.


Fig. 5. Illustration of artifacts arising after encoding the content amphoriskos with the codecs under evaluation. To obtain comparable visual quality, different degradation levels are selected for V-PCC and G-PCC variants. (a) Reference. (b) V-PCC, Degradation level = R1. (c) Octree-Lifting, Degradation level = R3. (d) Octree-RAHT, Degradation level = R3. (e) TriSoup-Lifting, Degradation level = R3. (f) TriSoup-RAHT, Degradation level = R3.


Fig. 6. Illustration of the evaluation platform. Reference and distorted models are presented side-by-side and clearly labeled. Users' judgments can be submitted through the rating panel. The green bar at the bottom indicates the progress in the current batch.


Fig. 7. Scatter plots indicating correlation between subjective scores from the participating laboratories. (a) EPFL scores as ground truth. (b) UNB scores as ground truth.


Table 3. Performance indexes depicting the correlation between subjective scores from the participating laboratories.


Fig. 8. MOS versus SOS fitting for scores obtained in EPFL and UNB, with relative SOS coefficient a. The shaded plot indicates the $95\%$ confidence bounds for both fittings.
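The SOS coefficient a in Fig. 8 refers to the SOS hypothesis, which models the variance of opinion scores as a parabola of the MOS that vanishes at both ends of the rating scale. A minimal sketch of fitting a by least squares follows; the function name and data layout are illustrative assumptions, not the paper's code.

```python
import numpy as np

def fit_sos_coefficient(mos, sos, scale=(1, 5)):
    """Least-squares fit of the SOS-hypothesis coefficient a.

    The SOS hypothesis models the standard deviation of opinion scores as
        SOS(x)^2 = a * (-x^2 + (lo + hi) * x - lo * hi),
    a parabola in the MOS x that vanishes at both ends of the rating
    scale [lo, hi]; the single coefficient a summarizes rater diversity.
    """
    lo, hi = scale
    mos = np.asarray(mos, dtype=float)
    sos = np.asarray(sos, dtype=float)
    basis = -mos**2 + (lo + hi) * mos - lo * hi
    # Closed-form least squares for the one-parameter model basis * a = sos^2
    return float(basis @ (sos**2) / (basis @ basis))
```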


Fig. 9. MOS against degradation levels defined for each codec, grouped per content under evaluation. In the first row, the results for point clouds representing objects are provided, whereas in the second row, curves for the human figure contents are illustrated. (a) amphoriskos, (b) biplane, (c) head, (d) romanoillamp, (e) loot, (f) longdress, (g) soldier, (h) the20smaria.


Fig. 10. Soldier encoded with V-PCC. Although the R4 degraded version is blurrier than R5, missing points in the latter model were rated as more annoying (examples are highlighted in the figures). (a) Degradation level = R4. (b) Degradation level = R5.


Fig. 11. Biplane encoded with V-PCC. The color smoothing resulting from the low-pass filtering in texture leads to less annoying artifacts for R2 than for R3. (a) Degradation level = R2. (b) Degradation level = R3.


Table 4. Results of Welch's t-test performed on the scores associated with the color encoding modules Lifting and RAHT, for geometry encoders Octree and TriSoup and for every degradation level. Each number indicates the ratio of contents for which the color encoding module of the row is significantly better than the module of the column.
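Welch's t-test compares two score samples without assuming equal variances. A hedged sketch of one such per-content decision follows; the function name, alpha level, and aggregation into per-row/column ratios are illustrative assumptions.

```python
from scipy import stats

def compare_modules(scores_a, scores_b, alpha=0.05):
    """Welch's t-test (unequal variances) between two sets of scores.

    Returns +1 if scores_a are significantly higher than scores_b at
    level alpha, -1 if significantly lower, and 0 otherwise. Decisions
    of this kind, one per content, can then be aggregated into ratios
    such as those reported in Table 4.
    """
    t, p = stats.ttest_ind(scores_a, scores_b, equal_var=False)
    if p >= alpha:
        return 0
    return 1 if t > 0 else -1
```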


Table 5. Performance indexes computed on the entire dataset. The best index across a metric is indicated with bold text, for each regression model.
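Performance indexes of this kind are typically obtained by first mapping objective scores to the subjective scale with a regression model and then computing correlation and error measures on the mapped scores. A minimal sketch follows, using a cubic polynomial as one example regression model; the function name and model choice are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np
from scipy import stats

def benchmark_metric(objective, mos):
    """Compute standard performance indexes for an objective quality metric.

    Objective scores are mapped to the subjective scale with a regression
    model (a cubic polynomial here, as one example), after which PLCC and
    RMSE are computed on the mapped scores. SROCC is rank-based and needs
    no mapping.
    """
    objective = np.asarray(objective, dtype=float)
    mos = np.asarray(mos, dtype=float)
    coeffs = np.polyfit(objective, mos, deg=3)   # regression model
    predicted = np.polyval(coeffs, objective)
    plcc = stats.pearsonr(predicted, mos)[0]     # linear correlation
    srocc = stats.spearmanr(objective, mos)[0]   # rank-order correlation
    rmse = float(np.sqrt(np.mean((predicted - mos) ** 2)))
    return plcc, srocc, rmse
```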


Fig. 12. Scatter plots of subjective against objective quality scores for the best-performing objective metric, among all regression models. (a) Performance across the entire range of objective scores. (b) Performance in a region of lower degradation levels.


Fig. 13. Scatter plots of subjective against objective quality scores for the best-performing objective metric, for the majority of regression models. (a) Performance of the best point-based quality metric. (b) Performance of the best projection-based quality metric.


Table 6. Selected encoding parameters of G-PCC for experiment 2, for high and low target bit rates. The depth parameter indicates the resolution of the Octree structure, whereas the level parameter indicates the TriSoup approximation.


Fig. 14. Preference and tie probabilities for each pair of configurations under test in experiment 2, for the high bit rate case. The color blue (yellow) of the bar indicates the probability of the configuration on the left (right) side being preferred over the one on the right (left) side. The orange bar indicates the tie probability. (a) amphoriskos. (b) biplane. (c) longdress. (d) loot. (e) the20smaria.


Fig. 15. Preference and tie probabilities for each pair of configurations under test in experiment 2, for the low bit rate case. The color blue (yellow) of the bar indicates the probability of the configuration on the left (right) side being preferred over the one on the right (left) side. The orange bar indicates the tie probability. (a) amphoriskos. (b) biplane. (c) longdress. (d) loot. (e) the20smaria.


Fig. 16. Normalized MOS and relative CIs obtained from the winning frequencies gathered in experiment 2, for each configuration, averaged across the contents, separately for high and low target bit rates.
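Scores derived from winning frequencies can be obtained by counting, per configuration, wins plus half a point per tie, normalized by the number of comparisons the configuration took part in. The sketch below is one plausible reading of such an aggregation; the function name and judgment encoding are illustrative assumptions, not the paper's pipeline.

```python
from collections import defaultdict

def scores_from_pair_comparisons(judgments):
    """Aggregate pair-comparison judgments into per-configuration scores.

    judgments: list of (config_a, config_b, outcome) tuples with outcome
    in {'a', 'b', 'tie'}. Each configuration earns one point per win and
    half a point per tie; scores are normalized by the number of
    comparisons in which the configuration appeared.
    """
    wins = defaultdict(float)
    appearances = defaultdict(int)
    for a, b, outcome in judgments:
        appearances[a] += 1
        appearances[b] += 1
        if outcome == 'a':
            wins[a] += 1.0
        elif outcome == 'b':
            wins[b] += 1.0
        else:  # tie: split the point between the two configurations
            wins[a] += 0.5
            wins[b] += 0.5
    return {c: wins[c] / appearances[c] for c in appearances}
```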


Table 7. Selected encoding parameters of G-PCC for experiment 3, for high and low target bit rates. The depth parameter indicates the resolution of the Octree structure, whereas the QP parameter indicates the quantization parameter for the Lifting encoding module.


Fig. 17. Preference and tie probabilities for each pair of configurations under test in experiment 3, for the high bit rate case. The color blue (yellow) of the bar indicates the probability of the configuration on the left (right) side being preferred over the one on the right (left) side. The orange bar indicates the tie probability. (a) amphoriskos. (b) biplane. (c) longdress. (d) loot. (e) the20smaria.


Fig. 18. Preference and tie probabilities for each pair of configurations under test in experiment 3, for the low bit rate case. The color blue (yellow) of the bar indicates the probability of the configuration on the left (right) side being preferred over the one on the right (left) side. The orange bar indicates the tie probability. (a) amphoriskos. (b) biplane. (c) longdress. (d) loot. (e) the20smaria.


Fig. 19. Normalized MOS and relative CIs obtained from the winning frequencies gathered in experiment 3, for each configuration, averaged across the contents, separately for high and low target bit rates.