A survey on compact features for visual content analysis

Published online by Cambridge University Press:  20 June 2016

Luca Baroffio
Affiliation:
Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
Alessandro E. C. Redondi*
Affiliation:
Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
Marco Tagliasacchi
Affiliation:
Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
Stefano Tubaro
Affiliation:
Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
* Corresponding author: A.E.C. Redondi, alessandroenrico.redondi@polimi.it

Abstract

Visual features constitute compact yet effective representations of visual content, and are being exploited in a large number of heterogeneous applications, including augmented reality, image registration, content-based retrieval, and classification. Several visual content analysis applications are distributed over a network and require the transmission of visual data, either in the pixel domain or in the feature domain, to a central unit that performs the task at hand. Furthermore, large-scale applications need to store a database composed of up to billions of features and perform matching with low latency. In this context, several different implementations of feature extraction algorithms have been proposed over the last few years, with the aim of reducing computational complexity and memory footprint, while maintaining an adequate level of accuracy. Besides extraction, a large body of research has addressed the design of ad hoc feature encoding methods, and a number of networking and transmission protocols enabling distributed visual content analysis have been proposed. In this survey, we present an overview of state-of-the-art methods for the extraction, encoding, and transmission of compact features for visual content analysis, thoroughly addressing each step of the pipeline and highlighting the peculiarities of the proposed methods.

Information

Type
Overview Paper
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Authors, 2016
Figures and tables

Fig. 1. Pipelines for the “Analyze-Then-Compress” and “Compress-Then-Analyze” paradigms.
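
To make the two paradigms concrete, here is a minimal sketch (not taken from the paper) of what a camera node might transmit under each one. It uses OpenCV's ORB detector purely as an example feature extractor; the file name and the way feature bytes are counted are illustrative assumptions, not the survey's reference pipeline.

```python
# Illustrative sketch of "Compress-Then-Analyze" (CTA) vs. "Analyze-Then-Compress" (ATC)
# for a single camera frame. ORB is used only as an example local feature.
import cv2
import numpy as np

def cta_payload(image_bgr, jpeg_quality=80):
    """CTA: the node compresses pixels; analysis runs at the central unit."""
    ok, jpeg = cv2.imencode(".jpg", image_bgr,
                            [cv2.IMWRITE_JPEG_QUALITY, jpeg_quality])
    assert ok
    return jpeg.tobytes()  # bitstream sent over the network

def atc_payload(image_bgr, max_features=500):
    """ATC: the node extracts and encodes features; only those are sent."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=max_features)
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    coords = np.float32([kp.pt for kp in keypoints])  # keypoint locations
    # A real system would entropy-code locations and descriptors; here the raw
    # bytes are concatenated only to get an order-of-magnitude payload size.
    return coords.tobytes() + (descriptors.tobytes() if descriptors is not None else b"")

if __name__ == "__main__":
    frame = cv2.imread("frame.jpg")  # hypothetical test image
    print("CTA bytes:", len(cta_payload(frame)))
    print("ATC bytes:", len(atc_payload(frame)))
```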

Table 1. Summary of the methods presented in this survey.

Table 2. Overview of the most common local feature detectors.

Fig. 2. The key idea behind SUSAN. In flat regions (a), almost all the pixels have an intensity similar to that of the nucleus (white cross). In edge regions (c), approximately half of the pixels have an intensity similar to that of the nucleus. In corner regions (b), less than half of the pixels have an intensity similar to that of the nucleus.
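
The SUSAN principle in the caption can be sketched in a few lines: count the pixels in a small neighborhood whose intensity is similar to that of the nucleus, and classify the location from that fraction. The square patch and the threshold values below are illustrative assumptions; the actual detector uses a circular mask and a smooth similarity function.

```python
import numpy as np

def susan_response(patch, intensity_threshold=25):
    """Toy illustration of the SUSAN principle on a square grayscale patch."""
    nucleus = patch[patch.shape[0] // 2, patch.shape[1] // 2]
    similar = np.abs(patch.astype(np.int32) - int(nucleus)) < intensity_threshold
    fraction = similar.mean()  # USAN area relative to the mask
    if fraction > 0.75:        # almost all pixels similar -> flat region
        return "flat"
    elif fraction > 0.45:      # roughly half similar -> edge
        return "edge"
    else:                      # less than half similar -> corner
        return "corner"

# Synthetic corner: only the top-left quadrant shares the nucleus intensity.
patch = np.full((7, 7), 200, dtype=np.uint8)
patch[:4, :4] = 50
print(susan_response(patch))   # -> "corner"
```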

Table 3. Overview of the most common local feature descriptors.

Fig. 3. SIFT descriptor building process. (Left) Local gradients are computed and pooled on a 16×16 grid around the keypoint (shown as 8×8 here for simplicity). (Right) For each cell of the overlying 4×4 grid, an 8-D weighted histogram of gradients is computed.
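
A minimal sketch of the pooling step described in the caption, assuming the 16×16 patch has already been rotated to the keypoint's dominant orientation; the Gaussian weighting and trilinear interpolation of the full SIFT descriptor are omitted for clarity.

```python
import numpy as np

def sift_like_descriptor(patch16):
    """Simplified SIFT-style pooling: 4x4 cells x 8 orientation bins = 128-D."""
    patch = patch16.astype(np.float32)
    gy, gx = np.gradient(patch)                 # local gradients
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)            # in (-pi, pi]
    bins = ((orientation + np.pi) / (2 * np.pi) * 8).astype(int) % 8

    descriptor = np.zeros((4, 4, 8), dtype=np.float32)
    for row in range(16):
        for col in range(16):
            cell_r, cell_c = row // 4, col // 4     # which 4x4 cell the pixel falls in
            descriptor[cell_r, cell_c, bins[row, col]] += magnitude[row, col]

    descriptor = descriptor.ravel()
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor

patch = np.random.default_rng(0).integers(0, 256, (16, 16))
print(sift_like_descriptor(patch).shape)        # (128,)
```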

Fig. 4. (a) BRISK and (b) FREAK patterns of pixel locations (in red) used to perform pairwise intensity comparisons. Blue circles correspond to the Gaussian kernels used to smooth local pixel intensities.
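
The pairwise-comparison idea behind BRISK and FREAK can be sketched as follows: smooth the patch, then compare intensities at a fixed set of location pairs to obtain one bit per test. The random pairs and the single smoothing scale below are simplifications of the concentric sampling patterns shown in the figure.

```python
import cv2
import numpy as np

def binary_descriptor(patch, pairs, sigma=2.0):
    """Toy BRISK/FREAK-style descriptor: one bit per smoothed intensity test.

    `pairs` has shape (N, 2, 2): N pairs of (row, col) sampling locations.
    """
    smoothed = cv2.GaussianBlur(patch.astype(np.float32), (0, 0), sigma)
    bits = [int(smoothed[r1, c1] < smoothed[r2, c2])
            for (r1, c1), (r2, c2) in pairs]
    return np.packbits(bits)               # compact binary string

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, (31, 31))
pairs = rng.integers(0, 31, (256, 2, 2))    # 256 tests -> 32 bytes
desc = binary_descriptor(patch, pairs)
print(desc.shape, desc.dtype)               # (32,) uint8
```

Matching two such descriptors then reduces to a Hamming distance, e.g. `np.unpackbits(d1 ^ d2).sum()`, which is what makes binary features attractive for low-latency matching.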

Fig. 5. The best 16 pairwise smoothed intensity comparisons learned by BAMBOO, exploiting a dictionary of box- and Haar-like filters.

Table 4. MAP on the Oxford, Turin, and Zurich Building datasets.

Table 5. Homography estimation precision on the Visual Tracking Dataset [106].

Table 6. Average amount of time required to compute 500 local descriptors.

Table 7. Overview of visual feature coding methods.

Fig. 6. (a) “Bag-of-Features” assigns the input feature (green triangle) to its nearest visual word. (b) Sparse Coding approximates the input feature as a combination of a few visual words. (c) Locality-constrained Linear Coding constrains the visual words in the sparse combination to lie near the input feature in the descriptor space.
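
The contrast among these encoders can be sketched in a few lines of NumPy: Bag-of-Features hard-assigns each descriptor to its nearest visual word, whereas a locality-constrained code reconstructs the descriptor from its k nearest words only. The plain least-squares weights below stand in for the actual LLC optimization, and the random codebook is purely illustrative.

```python
import numpy as np

def bag_of_features(descriptors, codebook):
    """Hard assignment: normalized histogram of nearest visual words (BoF)."""
    # Squared Euclidean distances between all descriptors and all words.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(codebook)).astype(np.float32)
    return hist / hist.sum()

def locality_constrained_code(descriptor, codebook, k=5):
    """LLC-style encoding (simplified): least-squares weights over the k
    nearest visual words; all other coefficients are zero."""
    d2 = ((codebook - descriptor) ** 2).sum(axis=1)
    nearest = np.argsort(d2)[:k]
    basis = codebook[nearest]                         # (k, D)
    weights, *_ = np.linalg.lstsq(basis.T, descriptor, rcond=None)
    code = np.zeros(len(codebook), dtype=np.float32)
    code[nearest] = weights
    return code

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(100, 64)).astype(np.float32)  # e.g. 100 local features
codebook = rng.normal(size=(256, 64)).astype(np.float32)     # 256 visual words
print(bag_of_features(descriptors, codebook).shape)          # (256,)
print(np.count_nonzero(locality_constrained_code(descriptors[0], codebook)))  # 5
```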

Table 8. Image classification accuracy achieved by global feature encoding algorithms as reported in [15].