Mesh-based piecewise planar motion compensation and optical flow clustering for ROI coding

Published online by Cambridge University Press: 02 October 2015

Holger Meuel*
Affiliation:
Institut für Informationsverarbeitung (TNT), Gottfried Wilhelm Leibniz Universität Hannover, Hannover, Germany. Phone: +49 511 762-19585
Marco Munderloh
Affiliation:
Institut für Informationsverarbeitung (TNT), Gottfried Wilhelm Leibniz Universität Hannover, Hannover, Germany. Phone: +49 511 762-19585
Matthias Reso
Affiliation:
Institut für Informationsverarbeitung (TNT), Gottfried Wilhelm Leibniz Universität Hannover, Hannover, Germany. Phone: +49 511 762-19585
Jörn Ostermann
Affiliation:
Institut für Informationsverarbeitung (TNT), Gottfried Wilhelm Leibniz Universität Hannover, Hannover, Germany. Phone: +49 511 762-19585
*Corresponding author: H. Meuel. Email: meuel@tnt.uni-hannover.de

Abstract

For the transmission of aerial surveillance videos taken from unmanned aerial vehicles (UAVs), region of interest (ROI)-based coding systems are of growing interest as a way to cope with the limited channel capacity available. We present a fully automatic detection and coding system that is capable of transmitting high-resolution aerial surveillance videos at very low bit rates. Our coding system transmits ROI areas only, and we distinguish two kinds of ROI. First, to limit the transmission bit rate while retaining a high-quality view of the ground, we transmit only the newly emerging areas (ROI-NA) of each frame instead of the entire frame. At the decoder, the surface of the earth is reconstructed from the transmitted ROI-NA by means of global motion compensation (GMC). Second, to retain the movement of objects that do not conform to the motion of the ground (such as moving cars and the ground they previously occluded), we additionally treat regions containing such objects as regions of interest (ROI-MO). Both ROI types are then used as input to an externally controlled video encoder. While we use GMC to reconstruct the ground from ROI-NA, we use mesh-based motion compensation to detect the ROI-MO: we compute the pel-wise difference in the luminance channel (difference image) between the mesh-based motion-compensated image and the current input image. High-energy spots within this difference image serve as seeds for selecting the corresponding superpixels from an independent, temporally consistent superpixel segmentation of the input image, which yields accurate shape information for the ROI-MO. At a false positive detection rate (regions falsely classified as containing local motion) of less than 2%, we detect more than 97% true positives (correctly detected ROI-MOs) in challenging scenarios. Furthermore, we propose to use a modified high-efficiency video coding (HEVC) encoder. Retaining full HDTV resolution at 30 fps and subjectively high quality, we achieve bit rates of about 0.6–0.9 Mbit/s, a bit rate saving of about 90% compared to an unmodified HEVC encoder.
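
The ROI-MO detection step described in the abstract can be illustrated with a minimal sketch. It assumes the motion-compensated frame and the superpixel label map are already available; the function name `select_roi_mo`, the plain list-of-lists image layout and the threshold value are hypothetical choices for illustration and are not taken from the paper.

```python
def select_roi_mo(current, compensated, labels, threshold=30):
    """Return a binary ROI-MO mask made of whole superpixels.

    current, compensated: 2-D lists of luminance values of equal size.
    labels: 2-D list with one superpixel id per pel of the current frame.
    threshold: assumed energy threshold on the absolute difference.
    """
    height, width = len(current), len(current[0])
    # A superpixel becomes a candidate if it holds at least one
    # high-energy pel of the difference image (the "seed").
    seeded = set()
    for y in range(height):
        for x in range(width):
            if abs(current[y][x] - compensated[y][x]) > threshold:
                seeded.add(labels[y][x])
    # Mark every pel of a seeded superpixel, not only the seed pels,
    # so that the mask follows the superpixel shape.
    return [[1 if labels[y][x] in seeded else 0 for x in range(width)]
            for y in range(height)]


# Toy example: one pel differs strongly, and its whole superpixel is selected.
current = [[10, 10, 200, 200],
           [10, 10, 200, 90]]
compensated = [[10, 10, 200, 90],
               [10, 10, 200, 90]]
labels = [[0, 0, 1, 1],
          [0, 0, 1, 1]]
mask = select_roi_mo(current, compensated, labels)
```

In this toy case a single seed pel is enough to select all four pels of superpixel 1, which is how a partial detection can still yield a complete object shape.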

Information

Type
Original Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
Copyright © The Authors, 2015

Fig. 1. Block diagram of ROI detection and coding system: Bold framed block: proposed cluster filter to eliminate false positive (FP) detections; white: optical flow; yellow: mesh-based motion estimation/compensation incl. ROI detector; magenta: superpixel segmentation and selection; green: global motion estimation and new area detector; brown: block generation, video coder and muxing (based on [34]).

Fig. 2. Original outtake (a) and reconstructed image after ROI encoding and decoding (c) with inaccurate MO detection due to homogeneous, unstructured regions on the car roof. Missing detections (b) of the rear part of the red car as ROI lead to reconstruction errors since the front part of the car (ROI) does not match the reconstructed background [15].

Fig. 3. Coding mask generation for new area (top row) and MOs. The MO activation mask from the difference image calculation is overlaid with an independent superpixel segmentation to obtain accurate shape information of the MOs. The resulting coding mask is adapted to the coding block pattern (MO block coding mask) and combined with the NA block coding mask to form the final block coding mask. Cyan and green blocks in the latter are encoded as ROI.
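
The adaptation of a pel-accurate coding mask to a block pattern, and the combination of the NA and MO block masks, might look like the following sketch. The rule that a block counts as ROI as soon as any of its pels is set, the logical OR used for the combination, and the names `to_block_mask` and `combine_block_masks` are assumptions made for illustration; the caption does not specify them.

```python
def to_block_mask(pel_mask, block=16):
    """Map a pel-wise binary mask to a block-wise binary mask.

    Assumption: a block is marked as ROI if at least one pel in it is set.
    """
    height, width = len(pel_mask), len(pel_mask[0])
    rows = (height + block - 1) // block  # ceil division covers partial blocks
    cols = (width + block - 1) // block
    blocks = [[0] * cols for _ in range(rows)]
    for y in range(height):
        for x in range(width):
            if pel_mask[y][x]:
                blocks[y // block][x // block] = 1
    return blocks


def combine_block_masks(na_blocks, mo_blocks):
    """Final block coding mask: a block is coded if NA or MO marks it."""
    return [[a | b for a, b in zip(na_row, mo_row)]
            for na_row, mo_row in zip(na_blocks, mo_blocks)]


# Toy example with 2x2 blocks on a 4x4 frame.
mo_mask = [[0, 0, 0, 0],
           [0, 0, 0, 1],
           [0, 0, 0, 0],
           [0, 0, 0, 0]]
na_mask = [[1, 0, 0, 0],
           [1, 0, 0, 0],
           [1, 0, 0, 0],
           [1, 0, 0, 0]]
final = combine_block_masks(to_block_mask(na_mask, block=2),
                            to_block_mask(mo_mask, block=2))
```

Here the new area occupies the left block column and the moving object touches the top-right block, so three of the four blocks end up in the final mask.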

Fig. 4. Temporally consistent superpixels (TCSs) are used to bridge false negative detections of the ROI-MO: if no MO (white car) is detected by the MO detector (cyan), the MO in frame k−1 would not be selected for coding. Due to the temporal consistency of the superpixels the position of the car can also be predicted in frame k−1 and thus correct processing and transmission of the car in all frames can be guaranteed.

Fig. 5. Triangulated mesh (green triangles) between detected features (brown dots: background features, blue crosses: motion candidates including outlier, purple and white dots: detected MOs after cluster filtering) and trajectories (yellow lines) in the motion compensated destination frame [34]. Best viewed in color.

Fig. 6. The mesh is created by a Delaunay triangulation of the feature point cloud, performed in frame k (right). The displacement vectors point to frame k−1 (left) and thereby define the mesh in frame k−1 (based on [36]).
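
To make the idea of a piecewise planar mesh concrete, the sketch below maps a pel inside one triangle of frame k to its position in frame k−1. It assumes that motion inside a triangle is interpolated affinely from the three vertex displacements, expressed here through barycentric coordinates; the function names are hypothetical, and the triangulation itself is taken as given.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of point p with respect to triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w_a, w_b, 1.0 - w_a - w_b


def map_to_previous(p, triangle_k, displacements):
    """Map point p from frame k into frame k-1.

    triangle_k: three (x, y) vertices of the triangle in frame k.
    displacements: one (dx, dy) vector per vertex, pointing to frame k-1.
    """
    weights = barycentric(p, *triangle_k)
    # The displaced vertices define the same triangle in frame k-1.
    previous = [(v[0] + d[0], v[1] + d[1])
                for v, d in zip(triangle_k, displacements)]
    x = sum(w * v[0] for w, v in zip(weights, previous))
    y = sum(w * v[1] for w, v in zip(weights, previous))
    return x, y


# Toy example: all three vertices move by (1, 2), i.e. a pure translation.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
shifted = map_to_previous((1.0, 1.0), triangle, [(1.0, 2.0)] * 3)
```

Applying this mapping triangle by triangle gives one affine motion model per mesh patch, which is what allows the ground to be approximated by several planes rather than a single global one.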

Fig. 7. Example frames of the test set used to evaluate the MO detection and ROI coding framework (Test Set 1). (a) MOs (black and red car with shadows) in the 750 m sequence, HDTV resolution, ground resolution: 21 pel/m [34,39]. (b) MO (white car in the middle) in the 350 m sequence, HDTV resolution, ground resolution: 43 pel/m [34,39]. (c) MOs (white and red car in the middle) in the VIRAT test data set, original resolution: 720×480, interlaced [37,38].

Fig. 8. Test sequences (self-recorded) for coding (Test Set 2) [39]. (a) Frame of the 350 m sequence, HDTV resolution, ground resolution: 43 pel/m. (b) Frame of the 500 m sequence, HDTV resolution, ground resolution: 30 pel/m. (c) Frame of the 1000 m sequence, HDTV resolution, ground resolution: 15 pel/m. (d) Frame of the 1500 m sequence, HDTV resolution, ground resolution: 10 pel/m.

Fig. 9. MO detections (d, e) and coding masks including superpixel (SP) enhancement (f, g) for the GMC-based (d, f) and the CF-based (e, g) MO detector. Panel (a) shows the original frame and (b,c) the decoded result [34]. (a) Original frame (cropped). (b) Decoded (CF+Mesh+SP, cropped). (c) Decoded (CF+Mesh+SP, whole frame). (d) GMC activation mask. (e) CF activation mask. (f) GMC+SP coding mask. (g) CF+SP coding mask.

Fig. 10. Receiver Operating Characteristics (ROCs) for (a) 750 m sequence, (b) 350 m sequence, (c) VIRAT test scene (TP rate calculated pel-wise, SP=Superpixel, CF=Cluster Filter, SWW=Sliding Window Width, N=No. of superpixels used for image segmentation, Difference Image only is 0×Dilation (not in the figures): for 750 m sequence: TP=7%, FP=0%; for 350 m sequence: TP=36.4%, FP=0%; for VIRAT: TP=6%, FP=0%).
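
As an illustration of how one point of such a ROC curve can be obtained, the following sketch computes pel-wise true positive and false positive rates from a detection mask and a ground-truth mask. The caption only states that the TP rate is calculated pel-wise; treating the FP rate the same way, as well as the function name `pelwise_rates`, is an assumption of this sketch.

```python
def pelwise_rates(detected, truth):
    """Return (TP rate, FP rate) of a binary detection mask.

    detected, truth: 2-D lists of 0/1 values with identical size.
    """
    tp = fp = positives = negatives = 0
    for det_row, gt_row in zip(detected, truth):
        for det, gt in zip(det_row, gt_row):
            if gt:
                positives += 1
                tp += 1 if det else 0
            else:
                negatives += 1
                fp += 1 if det else 0
    # Guard against masks without any positive or negative pels.
    tp_rate = tp / positives if positives else 0.0
    fp_rate = fp / negatives if negatives else 0.0
    return tp_rate, fp_rate


# Toy example: 3 of 4 object pels found, 1 of 4 background pels flagged.
truth = [[1, 1, 0, 0],
         [1, 1, 0, 0]]
detected = [[1, 1, 1, 0],
            [1, 0, 0, 0]]
rates = pelwise_rates(detected, truth)
```

Sweeping a detector parameter such as the dilation size and recording one `(TP rate, FP rate)` pair per setting would trace out a curve of the kind shown in the figure.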

Fig. 11. Subjective image quality comparison for different video codecs and different very low bit rates (350 m sequence, 150–500 kbit/s). (f) ROI HEVC is proposed. Best viewed in pdf. (a) Original frame (whole frame). (b) ROI HEVC en- and decoded (whole frame, 300 kbit/s). (c) Original. (d) AVC 500 kbit/s. (e) HEVC 150 kbit/s. (f) ROI HEVC 150 kbit/s.

Table 1. Configuration settings of the HEVC encoder for Low Delay (LD), Low Delay-P (LD-P) and Random Access (RA), based on the common HM configuration files encoder_lowdelay_main.cfg, encoder_lowdelay_P_main.cfg and encoder_randomaccess_main.cfg.

Fig. 12. RD diagrams for two test sequences from Test Set 1 (stars: AVC/AVC-skip, squares: HEVC/HEVC-skip; black: common, unmodified encoders, green: RA-alike, blue: LD-alike, maximum block sizes: 16×16 each, minimum block size: 4×4 each), red arrows emphasize bit rate saving of HEVC-skip system compared to unmodified HEVC coding. (a) 750 m sequence. (b) 350 m sequence.

Table 2. Bjøntegaard delta (BD, BD rate, cubic, QP range: 24–35 and BD-PSNR) [56,57] for Test Set 1, negative BD-rate numbers represent coding gains of the proposed HEVC-skip coding system over the AVC-skip ROI coding system. “All” represents total gains over the entire sequence, whereas “Inter only” represents BD gains only for inter predicted frames, based on 16×16 (CTU16) and 64×64 (CTU64) largest coding block size for HEVC.
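
For readers unfamiliar with the metric, the sketch below shows the commonly used form of the Bjøntegaard delta rate with a cubic fit: the logarithm of the bit rate is modelled as a cubic polynomial of PSNR through four rate-distortion points per codec, and the average gap between both curves over the shared PSNR range is converted to a percentage. This is an illustrative reimplementation under those assumptions, not the tool used for the table, and all rate and PSNR values in the example are made up.

```python
import math


def cubic_through(xs, ys):
    """Coefficients c0..c3 of the cubic through four (x, y) points."""
    # Augmented Vandermonde system, solved by Gaussian elimination
    # with partial pivoting. The four x values must be distinct.
    rows = [[1.0, x, x * x, x ** 3, y] for x, y in zip(xs, ys)]
    n = 4
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for k in range(col, n + 1):
                rows[r][k] -= factor * rows[col][k]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = rows[r][n] - sum(rows[r][k] * coeffs[k] for k in range(r + 1, n))
        coeffs[r] = acc / rows[r][r]
    return coeffs


def integral(coeffs, lo, hi):
    """Definite integral of the polynomial with the given coefficients."""
    def antiderivative(x):
        return sum(c * x ** (i + 1) / (i + 1) for i, c in enumerate(coeffs))
    return antiderivative(hi) - antiderivative(lo)


def bd_rate(ref_rates, ref_psnr, test_rates, test_psnr):
    """Average bit rate difference in percent (negative means a saving)."""
    lo = max(min(ref_psnr), min(test_psnr))
    hi = min(max(ref_psnr), max(test_psnr))
    # Shift PSNR by lo so the fit works with small numbers.
    ref = cubic_through([p - lo for p in ref_psnr],
                        [math.log10(r) for r in ref_rates])
    test = cubic_through([p - lo for p in test_psnr],
                         [math.log10(r) for r in test_rates])
    span = hi - lo
    avg_diff = (integral(test, 0.0, span) - integral(ref, 0.0, span)) / span
    return (10 ** avg_diff - 1) * 100


# Made-up example: the test codec needs half the rate at every PSNR.
psnr = [32.0, 34.0, 36.0, 38.0]
saving = bd_rate([1000, 2000, 4000, 8000], psnr, [500, 1000, 2000, 4000], psnr)
```

With the test codec needing exactly half the rate at each quality level, the result is a BD rate of about −50%, which matches the sign convention of the table: negative numbers are coding gains.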

Table 3. Coding gains (negative numbers) for Test Set 2 of the proposed HEVC-based ROI coding system over the AVC-based one, each relative to the reference (Ref.) marked column by column in the table. AVC and HEVC bit rates without ROI coding are given in addition (based on the LD configurations, with block sizes modified according to the table; minimum MB/CU size = 4×4).

Table 4. Run-times per frame, non-optimized components written in C/C++, single thread execution on CPU (no GPU or other hardware acceleration), PC with an Intel Core i7-3770K CPU, clock rate of 3.5 GHz.