
Dimensionality reduction of visual features for efficient retrieval and classification

Published online by Cambridge University Press:  12 July 2016

Petros T. Boufounos
Affiliation:
MERL – Mitsubishi Electric Research Laboratories, Cambridge, MA 02139, USA
Hassan Mansour
Affiliation:
MERL – Mitsubishi Electric Research Laboratories, Cambridge, MA 02139, USA
Shantanu Rane
Affiliation:
Palo Alto Research Center (PARC), Palo Alto, CA 94304, USA
Anthony Vetro*
Affiliation:
MERL – Mitsubishi Electric Research Laboratories, Cambridge, MA 02139, USA
* Corresponding author: A. Vetro. Email: avetro@merl.com

Abstract

Visual retrieval and classification are of growing importance for a number of applications, including surveillance, automotive systems, and web and mobile search. To facilitate these processes, features are often computed from images to capture discriminative aspects of the scene, such as structure, texture, or color. Ideally, these features are robust to changes in perspective, illumination, and other transformations. This paper examines two approaches that employ dimensionality reduction for fast and accurate matching of visual features while also being bandwidth-efficient, scalable, and parallelizable. We focus on two classes of techniques and illustrate the benefits of dimensionality reduction in the context of various industrial applications. The first method, referred to as quantized embeddings, generates a distance-preserving feature vector at a low rate. The second method applies a low-rank matrix factorization to a sequence of visual features, exploiting the temporal redundancy among the feature vectors associated with each frame of a video. Both methods are universal in that they require no prior assumptions about the statistical properties of the signals in the database or the query. Furthermore, they enable the system designer to navigate a rate-versus-performance trade-off similar to the rate-distortion trade-off in conventional compression.

Information

Type
Industrial Technology Advances
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Authors, 2016

Fig. 1. The JL lemma guarantees the existence of an embedding that preserves pairwise Euclidean distances.
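The Johnson–Lindenstrauss (JL) property of Fig. 1 can be illustrated with a minimal NumPy sketch. All dimensions here, and the use of a scaled Gaussian matrix, are illustrative choices rather than the paper's exact construction: a random matrix with i.i.d. Gaussian entries, scaled by 1/√m, approximately preserves pairwise Euclidean distances with high probability.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 1000, 200, 50          # original dim, reduced dim, number of points (illustrative)

# Random Gaussian matrix A scaled so that E[||Ax||^2] = ||x||^2
A = rng.standard_normal((m, d)) / np.sqrt(m)
X = rng.standard_normal((n, d))  # toy "feature vectors"
Y = X @ A.T                      # embedded points

# Compare one pairwise distance before and after embedding
orig = np.linalg.norm(X[0] - X[1])
emb = np.linalg.norm(Y[0] - Y[1])
ratio = emb / orig
print(round(ratio, 2))           # concentrates near 1 as m grows
```

The larger m is, the tighter the concentration of the ratio around 1, which is the distance-preservation guarantee the lemma formalizes.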


Fig. 2. (a) A quantized embedding is derived by first obtaining a JL embedding by multiplying the vectors in the canonical feature space by a random matrix, followed by scalar quantization of each element in the vector of randomized measurements.
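The pipeline of Fig. 2 (random projection followed by scalar quantization of each measurement) can be sketched as follows; the step size, bit depth, and dimensions are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 128, 32        # feature dimension, number of random projections (illustrative)
bits = 3              # bits per quantized measurement
delta = 0.25          # quantization step size (illustrative)

A = rng.standard_normal((m, d)) / np.sqrt(m)   # JL-style random matrix
x = rng.standard_normal(d)                     # a canonical feature vector

y = A @ x                                      # randomized measurements
levels = 2 ** bits
# Uniform scalar quantizer with saturation applied elementwise
q = np.clip(np.round(y / delta), -(levels // 2), levels // 2 - 1)
print(q[:5])
```

The quantized vector q is what would be stored or transmitted; distances between such vectors approximate distances in the canonical feature space up to quantization error.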


Fig. 3. Reducing the bits per dimension increases the quantization error (red curve), but allows more dimensions, thereby reducing the embedding error (blue curve). The total error plot (black curve) suggests an optimal tradeoff between the number of dimensions and the number of bits allocated per dimension.
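The trade-off in Fig. 3 can be illustrated with a deliberately simplified toy model: assume a fixed total bit budget B = m·b, a quantization error that decays as 2^(−b), and an embedding error that decays as 1/√m. These error forms are stand-ins for the actual bounds, chosen only to make the tension between the two terms visible.

```python
import numpy as np

B = 64                      # total bit budget per descriptor (illustrative)
bits = np.arange(1, 9)      # candidate bits per dimension
m = B // bits               # dimensions affordable at each choice

quant_err = 2.0 ** (-bits.astype(float))   # toy model of quantization error
embed_err = 1.0 / np.sqrt(m)               # toy model of JL embedding error
total = quant_err + embed_err

best_bits = bits[np.argmin(total)]
print(best_bits)
```

Under this toy model the minimum of the total error falls at an intermediate bit depth, mirroring the black curve in Fig. 3: neither 1 bit over many dimensions nor many bits over few dimensions is optimal.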


Fig. 4. “Unary” expansion of an integer vector to preserve ℓ1 distances. Each element of the original vector is expanded to V bits, such that if the coefficient value is ui, the first ui bits are set to 1 and the next V − ui bits are set to zero. The squared ℓ2 distance between expanded vectors equals the ℓ1 distance between the original vectors. This requires that ui is bounded by V. Thus, a d-dimensional vector is expanded to dV dimensions.
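The identity behind Fig. 4 is easy to verify numerically: for binary vectors, the squared Euclidean distance counts the positions where the bits differ, which for unary codes is exactly the ℓ1 distance between the original integers. The following sketch (helper name and values are illustrative) demonstrates this.

```python
import numpy as np

def unary_expand(u, V):
    """Expand each integer u_i (0 <= u_i <= V) into V bits:
    the first u_i bits are 1, the remaining V - u_i bits are 0."""
    u = np.asarray(u)
    return (np.arange(V) < u[:, None]).astype(int).ravel()

u = np.array([3, 0, 5])
v = np.array([1, 2, 5])
V = 6
eu, ev = unary_expand(u, V), unary_expand(v, V)

l1 = int(np.abs(u - v).sum())          # ℓ1 distance between originals
l2_sq = int(((eu - ev) ** 2).sum())    # squared ℓ2 between expansions
print(l1, l2_sq)                       # prints 4 4
```

Both distances equal 4 here, confirming the equivalence; the cost is the d → dV blow-up in dimension noted in the caption.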


Fig. 5. (a) Conventional 3-bit (eight levels) scalar quantizer with saturation level S=4Δ. (b) Universal scalar quantizer. (c) The embedding map g(d) for JL-based embeddings (blue) and for universal embeddings (red).
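A common construction for the universal scalar quantizer of Fig. 5(b) keeps only the least-significant bit of the quantization level, after adding a random dither. The sketch below follows that construction; the dimensions, step size Δ, and function name are illustrative assumptions. Because the quantizer is non-monotonic, far-apart inputs can share a bit value, while nearby inputs still flip few bits.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 128, 256          # feature dim, number of measurements (illustrative)
delta = 1.0              # step size Δ (illustrative)

A = rng.standard_normal((m, d))
w = rng.uniform(0, delta, m)     # dither, uniform in [0, Δ)

def universal_embed(x):
    # Non-monotonic 1-bit quantizer: keep only the least-significant
    # bit of the quantization level of each dithered measurement.
    return (np.floor((A @ x + w) / delta) % 2).astype(int)

x = rng.standard_normal(d)
x_near = x + 0.001 * rng.standard_normal(d)    # a small perturbation of x
h, h_near = universal_embed(x), universal_embed(x_near)
frac_flipped = np.mean(h != h_near)
print(frac_flipped)
```

Nearby points produce embeddings with small Hamming distance, while the step size Δ controls the range of distances that the embedding preserves, which is the tuning knob studied in Fig. 7(a).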


Fig. 6. (a) Multi-bit quantization with fewer random projections outperforms LSH-based schemes [28,29], which employ 1-bit quantization with a large number of random projections. (b) When the bit budget allocated to each descriptor (vector) is fixed, the best retrieval performance is achieved with 3- and 4-bit quantization.


Fig. 7. (a) Universal embedding performance at different bit rates, as a function of Δ. (b) Performance of properly tuned universal embeddings (UE) as a function of the rate, compared with the conventionally quantized JL embeddings (QJL) also shown in Fig. 6(b).


Fig. 8. Classification accuracy as a function of the bit-rate achieved using: (a) quantized JL (QJL) embeddings; (b) the universal embeddings; and (c) classification accuracy as a function of the quantization step size Δ used in computing the universal embeddings.


Fig. 9. Example of extracting SIFT features from a video scene and computing the compact descriptor L along with the binary selection matrix R.
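The factorization in Fig. 9 approximates the matrix M of per-frame features by a compact descriptor L and a binary selection matrix R, i.e. M ≈ L R. Fig. 10 compares ONMF, sparse NMF, and k-means variants of this idea; the sketch below shows only the k-means flavor, with random data standing in for SIFT descriptors and all sizes chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, r = 128, 500, 30      # descriptor dim, features per GOP, rank (illustrative)

M = rng.random((d, n))      # nonnegative feature matrix (columns stand in for SIFT descriptors)

# k-means flavor of M ≈ L R: the r centroids form L, and the one-hot
# cluster assignments form the binary selection matrix R.
L = M[:, rng.choice(n, r, replace=False)]              # init centroids from data columns
for _ in range(10):
    dists = ((M[:, None, :] - L[:, :, None]) ** 2).sum(axis=0)   # (r, n) squared distances
    assign = np.argmin(dists, axis=0)                            # nearest centroid per column
    for k in range(r):
        if np.any(assign == k):
            L[:, k] = M[:, assign == k].mean(axis=1)             # update centroid

R = np.zeros((r, n))
R[assign, np.arange(n)] = 1   # binary selection matrix: one 1 per column
print(L.shape, R.shape)
```

The compression comes from storing the small real-valued L (d×r) plus the binary R (r×n) instead of the full real-valued M (d×n), which is the ratio tabulated in Tables 1 and 2 for varying rank and GOP size.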


Table 1. Compression ratio of a rank r=30 compact descriptor.


Fig. 10. (a) Video scene classification accuracy using ONMF, sparse NMF, and k-means clustering for varying rank and number of clusters. (b) Classification accuracy after removing from the video database the GOPs that are temporally adjacent to the query GOP.


Fig. 11. Video scene classification accuracy using ONMF for varying rank and GOP sizes.


Table 2. Compression ratio versus GOP size and rank.