
Moving object detection in the H.264/AVC compressed domain

Published online by Cambridge University Press:  21 November 2016

Marcus Laumer*
Affiliation:
Multimedia Communications and Signal Processing, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
Peter Amon
Affiliation:
Sensing and Industrial Imaging, Siemens Corporate Technology, Munich, Germany
Andreas Hutter
Affiliation:
Sensing and Industrial Imaging, Siemens Corporate Technology, Munich, Germany
André Kaup
Affiliation:
Multimedia Communications and Signal Processing, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Erlangen, Germany
*
Corresponding author: Marcus Laumer. Email: marcus.laumer@fau.de

Abstract

This paper presents a moving object detection algorithm for H.264/AVC video streams that operates in the compressed domain. The method extracts and analyzes several syntax elements from any H.264/AVC-compliant bit stream; which syntax elements are analyzed depends on the mode in which the method operates. The algorithm performs either a spatiotemporal analysis in a single step or a two-step analysis that starts with a spatial analysis of each frame, followed by a temporal analysis of several subsequent frames. In each mode, either only (sub-)macroblock types and partition modes or, additionally, quantization parameters are analyzed. Evaluating these syntax elements enables the algorithm to assign a "weight" to each 4×4 block of pixels that indicates the level of motion within this block. A final segmentation based on these weights divides each frame into foreground and background and thereby indicates the positions and sizes of all moving objects. Our experiments show that the algorithm efficiently detects moving objects in the compressed domain and that it can be configured to process a large number of parallel bit streams in real time.
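The final segmentation step described above can be pictured with a toy sketch. This is not the authors' implementation; it merely assumes that a per-4×4-block weight grid has already been computed (the values and the threshold below are hypothetical) and shows how thresholding such weights yields a block-level foreground/background mask.

```python
def segment_blocks(weights, threshold):
    """Return a binary mask: 1 marks foreground (moving) blocks, 0 background."""
    return [[1 if w >= threshold else 0 for w in row] for row in weights]

# Hypothetical weight grid for one frame (one value per 4x4 pixel block);
# higher values indicate more motion within the block.
weights = [
    [0.0, 0.1, 0.2, 0.1, 0.0, 0.0],
    [0.1, 2.5, 3.0, 2.8, 0.2, 0.0],
    [0.0, 2.7, 3.5, 3.1, 0.1, 0.0],
    [0.0, 0.2, 0.3, 0.1, 0.0, 0.0],
]

mask = segment_blocks(weights, threshold=1.0)
# The connected cluster of 1s marks the position and extent of a moving object.
```

In the paper's pipeline the weights themselves come from compressed-domain syntax elements ((sub-)macroblock types, partition modes, and optionally quantization parameters) rather than from decoded pixels.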

Information

Type
Original Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
Copyright © The Authors, 2016

Fig. 1. Two video analysis approaches for different domains. Working in the compressed domain enables analysis algorithms to replace the video decoder and the frame buffer by a simple syntax parser. (a) Pixel domain video analysis. (b) Compressed domain video analysis.


Fig. 2. Structures defined by the H.264/AVC standard. A frame consists of disjoint macroblocks that have a size of 16×16 pixels. Several macroblocks can be grouped to disjoint slices. A macroblock can be divided into four sub-macroblocks with a size of 8×8 pixels. Macroblocks and sub-macroblocks can be partitioned into rectangles or squares for motion compensation.


Table 1. Categories and initial weights w_init defined according to block types and partition modes of H.264/AVC. All weights depend on a base weight w_base.


Fig. 3. Illustration of the weighting process. The current block is marked in dark gray, and all blocks considered during the calculation are marked in light gray. Numbers within blocks indicate initial weights w_init. Constant values: n_p = n_s = 1, a = 3.5, b = 2.5, and w_base = 1.


Fig. 4. ROC curves of CVLAB test sequences. (BP: Baseline profile; HP: High profile; QP: quantization parameter). (a) CVLAB, single-step mode, QP 30. (b) CVLAB, two-step mode, QP 30. (c) campus7-c1, single-step mode. (d) campus7-c1, two-step mode.


Table 2. Adjustable parameters and their possible values. The finally selected values are bold-faced. Source: [21].


Fig. 5. Precision, recall, and F2 score against QP for CVLAB test sequences. (BP: Baseline profile; HP: High profile; QP: quantization parameter). (a) Precision, single-step mode. (b) Precision, two-step mode. (c) Recall, single-step mode. (d) Recall, two-step mode. (e) F2 score, single-step mode. (f) F2 score, two-step mode.


Fig. 6. Precision, recall, and F2 score against QP for CDNET test sequences. (BP: Baseline profile; HP: High profile; QP: quantization parameter). (a) Precision, single-step mode. (b) Precision, two-step mode. (c) Recall, single-step mode. (d) Recall, two-step mode. (e) F2 score, single-step mode. (f) F2 score, two-step mode.


Table 3. Summary of detection results for CVLAB test sequences encoded with QP=30. Shown are the results of our algorithm in single- and two-step modes, respectively, and the results of the OpenCV [30] algorithms MOG and MOG2. The first column also indicates the frame size and the number of frames of the respective sequence.


Table 4. Summary of detection results for CDNET test sequences encoded with QP=30. Shown are the results of our algorithm in single- and two-step modes, respectively, and the results of the OpenCV [30] algorithms MOG and MOG2. The first column also indicates the frame size and the number of frames of the respective sequence.


Fig. 7. Sample segmentation of frame 135 of sequence campus7-c1, encoded with HP and QP=30. (Black pixels: TN, green pixels: TP, white pixels: FN, red pixels: FP). (a) Original. (b) Single-step mode. (c) Two-step mode. (d) MOG [31]. (e) MOG2 [32].


Fig. 8. Sample segmentation of frame 1858 of sequence backdoor, encoded with HP and QP=30. (Black pixels: TN, green pixels: TP, white pixels: FN, red pixels: FP). (a) Original. (b) Single-step mode. (c) Two-step mode. (d) MOG [31]. (e) MOG2 [32].


Fig. 9. Sample segmentation of some sequences with challenging conditions, encoded with HP and QP=30. The upper row shows the original frames, the lower row the results from our algorithm in two-step mode. (Black pixels: TN, green pixels: TP, white pixels: FN, red pixels: FP). (a) peopleInShade, frame 296. (b) pedestrians, frame 474. (c) skating, frame 902.


Fig. 10. Sample segmentations of sequence PETS2006, encoded with HP and QP=30. The upper row shows the original frames, the lower row the results from our algorithm in two-step mode. (Black pixels: TN, green pixels: TP, white pixels: FN, red pixels: FP). (a) Frame 411. (b) Frame 522. (c) Frame 902. (d) Frame 1050. (e) Frame 1126.


Fig. 11. Sample segmentation of frame 935 of sequence laboratory4p-c0, encoded with HP and QP=30. (Black pixels: TN, green pixels: TP, white pixels: FN, red pixels: FP). (a) Original. (b) Single-step mode. (c) Two-step mode. (d) MOG [31]. (e) MOG2 [32].


Fig. 12. Sample segmentation of frame 615 of sequence terrace1-c0, encoded with HP and QP=30. (Black pixels: TN, green pixels: TP, white pixels: FN, red pixels: FP). (a) Original. (b) Single-step mode. (c) Two-step mode. (d) MOG [31]. (e) MOG2 [32].


Fig. 13. Sample segmentation of frame 2463 of sequence cubicle, encoded with HP and QP=30. (Black pixels: TN, green pixels: TP, white pixels: FN, red pixels: FP). (a) Original. (b) Single-step mode. (c) Two-step mode. (d) MOG [31]. (e) MOG2 [32].


Fig. 14. Sample segmentation of frame 1051 of sequence busStation, encoded with HP and QP=30. (Black pixels: TN, green pixels: TP, white pixels: FN, red pixels: FP). (a) Original. (b) Single-step mode. (c) Two-step mode. (d) MOG [31]. (e) MOG2 [32].