
Video coding of dynamic 3D point cloud data

Published online by Cambridge University Press:  20 December 2019

Sebastian Schwarz*
Affiliation:
Nokia Technologies, Hatanpään Valtatie 30, 33100 Tampere, Finland
Nahid Sheikhipour
Affiliation:
Nokia Technologies, Hatanpään Valtatie 30, 33100 Tampere, Finland; Tampere University of Technology, Korkeakoulunkatu 6, 33720 Tampere, Finland
Vida Fakour Sevom
Affiliation:
Tampere University of Technology, Korkeakoulunkatu 6, 33720 Tampere, Finland
Miska M. Hannuksela
Affiliation:
Nokia Technologies, Hatanpään Valtatie 30, 33100Tampere, Finland
* Corresponding author: Sebastian Schwarz. E-mail: sebastian.schwarz@nokia.com

Abstract

Due to the increased popularity of augmented reality (AR) and virtual reality (VR) experiences, the interest in representing the real world in an immersive fashion has never been higher. Distributing such representations enables users all over the world to freely navigate in never before seen media experiences. Unfortunately, such representations require a large amount of data, which is not feasible for transmission on today's networks. Thus, efficient compression technologies are in high demand. This paper proposes an approach to compress 3D video data utilizing 2D video coding technology. The proposed solution was developed to address the needs of "tele-immersive" applications, such as VR, AR, or mixed reality with "Six Degrees of Freedom" capabilities. Volumetric video data are projected onto 2D image planes and compressed using standard 2D video coding solutions. A key benefit of this approach is its compatibility with readily available 2D video coding infrastructure. Furthermore, objective and subjective evaluation shows significant improvement in coding efficiency over the reference technology. The proposed solution was contributed to and evaluated in international standardization. Although it was not selected as the winning proposal, a very similar solution has been developed since then.

Information

Type
Original Paper
Creative Commons
Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Authors, 2019
Fig. 1. Example of tele-immersive experience.

Fig. 2. Examples of 3D-to-2D projection for sequence Longdress provided by 8i [17]: (a) projection onto a cylinder and (b) projection onto four rectangular planes.

Fig. 3. Projection-based volumetric video coding workflow.

Fig. 4. Projected (a) texture and (b) geometry images for (c) original point cloud (left), as well as decoded point clouds at 13 MBit/s for the proposed solution (middle) and reference technology [2] (right).

Fig. 5. Examples of projection-related artefacts: occlusions in the projected images lead to holes in the reconstructed point cloud (left); video coding distortion might lead to invalid points (right).

Fig. 6. Texture (top) and geometry (bottom) images for the first frame of Longdress, covered by eight rotations of 90$^{\circ }$ with a 45$^{\circ }$ offset and sequential decimation after four rotations.

Fig. 7. Improved occlusion handling by sequential decimation (left) versus without (right).

Fig. 8. Texture (left) and occupancy (right) images for the first frame of Longdress using V-PCC.

Table 1. Bjontegaard-delta bit rate (BD-BR) results (Random Access).

Table 2. Adapted BD-BR results and BD-PSNR results [25].

Fig. 9. Subjective MOS scores for sequences RedAndBlack (left), Soldier (middle), and Longdress (right) [26], where “reference” denotes [2] and “uncompressed” the original point cloud data.

Fig. 10. Example views rendered from all bit streams submitted for subjective evaluation. Top to bottom: RedAndBlack, Soldier, and Longdress. Left to right: rates R01, R02, R03, and R04.

Fig. 11. Objective rate-distortion curves for test sequences (a) Queen, (b) RedAndBlack, (c) Loot, (d) Soldier, and (e) Longdress.

Table 3. Objective quality without occlusion handling at low bit rates.

Table 4. Average coding run times in relation to anchor.

Table 5. Average coding run times in relation to anchor (all intra).

Fig. 12. Progress of V-PCC coding performance since CfP evaluation.