
Free-viewpoint image synthesis using superpixel segmentation

Published online by Cambridge University Press:  13 June 2017

Mehrdad Panahpour Tehrani*
Affiliation:
Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
Tomoyuki Tezuka
Affiliation:
KDDI Cooperation, Chiyoda, Tokyo, Japan
Kazuyoshi Suzuki
Affiliation:
Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
Keita Takahashi
Affiliation:
Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
Toshiaki Fujii
Affiliation:
Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
*
Corresponding author: M. Panahpour Tehrani Email: panahpour@nuee.nagoya-u.ac.jp

Abstract

A free-viewpoint image can be synthesized from the color and depth maps of reference viewpoints via depth-image-based rendering (DIBR). In this process, three-dimensional (3D) warping is generally used. A 3D-warped image contains disocclusion holes, i.e., missing pixels that correspond to regions occluded in the reference images, and non-disocclusion holes caused by the limited sampling density of the reference images. The non-disocclusion holes appear among the scattered pixels of a single region or object. Both types of holes grow as the physical distance between the reference viewpoints and the free viewpoint increases, so filling them has a crucial impact on the quality of the free-viewpoint image. In this paper, we focus on free-viewpoint image synthesis that can precisely fill the non-disocclusion holes caused by limited sampling density, using superpixel segmentation. In this approach, we propose two criteria for segmenting the depth and color data of each reference viewpoint. Using these criteria, we can detect which neighboring pixels should be connected and which should be kept isolated in each reference image before it is warped. Polygons enclosed by the connected pixels, i.e., superpixels, are inpainted by k-means interpolation. Our superpixel approach is highly accurate because it uses both color and depth data to detect superpixels at the location of the reference viewpoint. Therefore, once a reference image composed of superpixels is 3D-warped to a virtual viewpoint, the non-disocclusion holes are significantly reduced. Experimental results verify the advantage of our approach and demonstrate the high quality of the synthesized images even when the virtual viewpoint is physically far from the reference viewpoints.
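For orientation, the crack-type (non-disocclusion) holes described above can be reproduced with a few lines of naïve forward warping. The sketch below is purely illustrative and is not the authors' implementation: it assumes rectified cameras with a purely horizontal shift, a pinhole disparity model d = baseline · focal / depth, and a synthetic slanted-depth scene. Enlarging the baseline spreads neighboring source pixels over more target pixels, leaving more target locations unfilled.

```python
import numpy as np

def forward_warp(color, depth, baseline, focal):
    """Naive forward 3D warping of a reference view to a virtual view.

    Each pixel is shifted horizontally by its disparity
    d = baseline * focal / depth; collisions are resolved with a
    z-buffer (nearest depth wins).  Target pixels that receive no
    source pixel remain holes (NaN).
    """
    h, w = depth.shape
    warped = np.full((h, w), np.nan)
    zbuf = np.full((h, w), np.inf)
    for y in range(h):
        for x in range(w):
            xt = int(round(x + baseline * focal / depth[y, x]))
            if 0 <= xt < w and depth[y, x] < zbuf[y, xt]:
                zbuf[y, xt] = depth[y, x]
                warped[y, xt] = color[y, x]
    return warped

# Toy scene: a slanted surface whose depth increases across the image.
h, w = 8, 32
depth = np.tile(np.linspace(4.0, 8.0, w), (h, 1))
color = np.tile(np.arange(w, dtype=float), (h, 1))

near = forward_warp(color, depth, baseline=0.5, focal=10.0)
far = forward_warp(color, depth, baseline=4.0, focal=10.0)
print("holes (small baseline):", int(np.isnan(near).sum()))
print("holes (large baseline):", int(np.isnan(far).sum()))
```

The superpixel approach avoids these cracks by warping connected polygons rather than isolated samples, so the holes the sketch produces never open up between pixels of the same surface.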

Information

Type
Original Paper
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Authors, 2017

Fig. 1. Examples of synthesized virtual viewpoints. From left to right: input image + depth data (“Breakdancers”, top: camera 4, bottom: camera 5); naïve 3D warping [10], where the area highlighted with a yellow circle shows holes due to limited sampling density and the area highlighted with a red circle shows a disocclusion hole (right: close-up); MPEG view synthesis reference software, VSRS 4.0 [9] (right: close-up); and the proposed method (right: close-up).


Fig. 2. Overview of our proposed free-viewpoint image synthesis method.


Fig. 3. 3D warping using our proposed method in comparison with other methods.


Fig. 4. Examples of input view-plus-depth-data set used through experiments.


Table 1. Specification of test sequences used in the experiments [36, 37].


Fig. 5. PSNR with different baselines and Z = 0 (left: “Bee”, right: “Shark”). The camera configuration for this experiment is shown next to the graphs.


Fig. 6. Results of free-viewpoint images generated at an edge area (top: baseline 7 mm, bottom: baseline 379 mm). From left to right: ground truth, naïve 3D warping, MPEG VSRS, and the proposed method. The camera configuration is similar to the configuration depicted in Fig. 7.


Fig. 7. Ground truth, the reference viewpoints (cameras), and the camera configuration used in the experiments of Fig. 8.


Fig. 8. Subjective results of free views synthesized with each method. Demo videos for these sequences are in Supplementary Videos 1 and 2. The reference viewpoints used in this experiment are shown in Fig. 4. The unit is mm.


Fig. 9. Camera configuration and input reference view + depth data (top, from left to right: “Kendo”, “Balloons”, “Champagne Tower”, and “Breakdancers”), and examples of free-viewpoint images synthesized with each method (bottom, from left to right: naïve 3D warping, MPEG VSRS 4.0, and the proposed method). A demo video for “Breakdancers” is in Supplementary Video 3.


Fig. 10. PSNR using different numbers of cameras (left: “Bee”, right: “Shark”). The values in the legend show the baseline distance between the reference viewpoint images.


Fig. 11. Synthesized results using different numbers of cameras (top: “Bee”, baseline 45 mm; bottom: “Shark”, baseline 84 mm).


Fig. 12. Synthesized results using different numbers of cameras (“Bee”, baseline 19 mm).

Supplementary Materials

Tehrani supplementary materials S1 (Video, 27.4 MB)
Tehrani supplementary materials S2 (Video, 26.9 MB)
Tehrani supplementary materials S3 (Video, 31 MB)