Robust and efficient content-based music retrieval system

Published online by Cambridge University Press: 28 March 2016

Yuan-Shan Lee
Affiliation:
Department of Computer Science and Information Engineering, National Central University, Jhongli, Taiwan
Yen-Lin Chiang
Affiliation:
Department of Computer Science and Information Engineering, National Central University, Jhongli, Taiwan
Pei-Rung Lin
Affiliation:
Department of Computer Science and Information Engineering, National Central University, Jhongli, Taiwan
Chang-Hung Lin
Affiliation:
Department of Computer Science and Information Engineering, National Central University, Jhongli, Taiwan
Tzu-Chiang Tai*
Affiliation:
Department of Computer Science and Information Engineering, Providence University, Taichung, Taiwan
* Corresponding author: Tzu-Chiang Tai. Email: tctai717@gmail.com

Abstract

This work proposes a query-by-singing (QBS) content-based music retrieval (CBMR) system that uses the approximate Karhunen–Loève transform (AKLT) for noise reduction. The proposed QBS-CBMR system uses a music clip as a search key. First, a 51-dimensional feature matrix comprising 39 Mel-frequency cepstral coefficient (MFCC) features and 12 chroma features is extracted from the input music clip. Next, adapted symbolic aggregate approximation (adapted SAX) transforms each feature dimension into a symbolic sequence. The symbolic sequence corresponding to each dimension of the MFCCs is then converted into a structure called an advanced fast pattern index (AFPI) tree. The similarity between the query music clip and each song in the database is evaluated by calculating a partial score for each AFPI tree. The final score is the weighted sum of all partial scores, where the weight of each partial score is determined by its entropy. Experimental results show that the proposed music retrieval system performs robustly and accurately with the entropy weighting mechanism.
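
The weighted-sum step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the entropy formula or say exactly which quantity's entropy is used, so the sketch assumes, purely for illustration, that each weight is the Shannon entropy of that dimension's symbolic sequence, normalized so the weights sum to 1. The function names (`shannon_entropy`, `entropy_weights`, `final_score`) and the example values are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy (in bits) of the symbol distribution of a sequence."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_weights(symbol_sequences):
    """ASSUMPTION: weight of a dimension = its entropy / sum of all entropies."""
    entropies = [shannon_entropy(seq) for seq in symbol_sequences]
    total = sum(entropies)
    if total == 0:
        # Every sequence is constant: fall back to uniform weights.
        return [1.0 / len(entropies)] * len(entropies)
    return [h / total for h in entropies]


def final_score(partial_scores, weights):
    """Weighted sum of the per-dimension partial scores."""
    return sum(w * s for w, s in zip(weights, partial_scores))


# Hypothetical example: three feature dimensions with made-up partial scores.
sequences = ["aaaa", "abab", "abcd"]   # entropies: 0, 1, and 2 bits
partials = [0.9, 0.6, 0.3]             # illustrative values only
weights = entropy_weights(sequences)
score = final_score(partials, weights)
```

Under this assumed scheme, a dimension whose symbolic sequence never changes carries no information and contributes nothing to the final score, while more varied dimensions are weighted more heavily.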

Information

Type
Original Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Authors, 2016

Fig. 1. The main structure of the proposed music retrieval system.

Fig. 2. A diagram of the music retrieval process in our system.

Fig. 3. Shepard helix of pitch perception. The vertical dimension is tone height, and the angular dimension is chroma.

Fig. 4. A 128-dimensional time series vector is reduced to an 8-dimensional vector of PAA representation [11].

Fig. 5. An illustration of a symbolic sequence. The PAA representation shown in Fig. 4 is converted into a symbolic sequence of three distinct symbols: a, b, c, via the original SAX method.
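
The two steps named in the captions of Figs. 4 and 5 can be sketched as follows. This is a minimal illustration of PAA followed by the original SAX discretization (equiprobable breakpoints under a standard Gaussian), not the adapted SAX proposed in the paper. The helper names and the ramp-shaped input are assumptions chosen only for the example.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def z_normalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments):
    """Piecewise aggregate approximation: the mean of each equal-length segment."""
    if len(series) % n_segments != 0:
        raise ValueError("this sketch assumes the length divides evenly")
    width = len(series) // n_segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(n_segments)]


def gaussian_breakpoints(alphabet_size):
    """Cut points that split the standard Gaussian into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax(series, n_segments, alphabet="abc"):
    """Original SAX: z-normalize, reduce with PAA, then map values to symbols."""
    reduced = paa(z_normalize(series), n_segments)
    cuts = gaussian_breakpoints(len(alphabet))
    return "".join(alphabet[bisect_right(cuts, v)] for v in reduced)


# Hypothetical input: a 128-point ramp reduced to 8 segments and 3 symbols.
ramp = [float(t) for t in range(128)]
reduced = paa(ramp, 8)      # 8 segment means, starting at 7.5
word = sax(ramp, 8)         # "aaabbccc"
```

With three symbols the Gaussian breakpoints fall at roughly -0.43 and 0.43, so the rising ramp maps to low, middle, and high symbols in order.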

Fig. 6. The probability density function (PDF) curves for the standard Cauchy and the standard Gaussian distributions. The curve outlining the solid-colored area is the Cauchy distribution curve.

Fig. 7. An example of a cumulative distribution function (CDF) P of the time series variable x. When n=5, the breakpoints are the values of x at which P(x) equals 1/5, 2/5, 3/5, and 4/5, respectively. Finding x from P(x) requires the inverse function of the CDF.
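
The breakpoint construction in this caption can be sketched numerically: each breakpoint is the inverse CDF evaluated at k/n. The sketch assumes, for illustration only, that the CDF is that of the standard Cauchy or the standard Gaussian distribution compared in Fig. 6; the caption itself does not fix the distribution. The Cauchy quantile function used here is the standard closed form, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def cauchy_inv_cdf(p, x0=0.0, gamma=1.0):
    """Quantile function of the Cauchy distribution (location x0, scale gamma)."""
    return x0 + gamma * math.tan(math.pi * (p - 0.5))


def equiprobable_breakpoints(inv_cdf, n):
    """The n - 1 points x_k with P(x_k) = k / n, for k = 1, ..., n - 1."""
    return [inv_cdf(k / n) for k in range(1, n)]


# n = 5 gives four breakpoints, at cumulative probabilities 1/5, 2/5, 3/5, 4/5.
cauchy_cuts = equiprobable_breakpoints(cauchy_inv_cdf, 5)
gauss_cuts = equiprobable_breakpoints(NormalDist().inv_cdf, 5)

print([round(c, 4) for c in cauchy_cuts])
print([round(c, 4) for c in gauss_cuts])
```

Because the Cauchy distribution has heavier tails than the Gaussian, its outer breakpoints (about ±1.38) lie farther from zero than the Gaussian ones (about ±0.84), while the inner breakpoints differ much less.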

Fig. 8. Example of AFPI tree structure, where n=6 and K=3. This figure is redrafted from [2].

Table 1. Two examples of pattern relations.

Table 2. The common bits.

Fig. 9. The flowchart of AKLT.

Table 3. An example of how to calculate accuracy.

Fig. 10. Comparison of the proposed system and the baseline system.

Fig. 11. Comparison between different dimensions of features.

Fig. 12. Comparison between noisy music clips and enhanced music clips.