Progress in LPC-based frequency-domain audio coding

Published online by Cambridge University Press:  31 May 2016

Takehiro Moriya*
Affiliation:
Communication Science Laboratories, NTT, Atsugi, Japan
Ryosuke Sugiura
Affiliation:
Communication Science Laboratories, NTT, Atsugi, Japan
Yutaka Kamamoto
Affiliation:
Communication Science Laboratories, NTT, Atsugi, Japan
Hirokazu Kameoka
Affiliation:
Communication Science Laboratories, NTT, Atsugi, Japan
Noboru Harada
Affiliation:
Communication Science Laboratories, NTT, Atsugi, Japan
*Corresponding author: T. Moriya, t.moriya@m.ieice.org

Abstract

This paper describes the progress in frequency-domain linear prediction coding (LPC)-based audio coding schemes. Although LPC was originally used only for time-domain speech coders, it has been applied to frequency-domain coders since the late 1980s. With the progress in associated technologies, the frequency-domain LPC-based audio coding scheme has become more promising, and since 2010 it has been used in speech/audio coding standards such as MPEG-D unified speech and audio coding and 3GPP enhanced voice services. This paper presents three of the latest investigations into the representation of LPC envelopes in frequency-domain coders: the harmonic model, frequency-resolution warping, and the Powered All-Pole Spectral Envelope (PAPSE). All three aim to further enhance coding efficiency.

Information

Type
Overview Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
Copyright © The Authors, 2016

Fig. 1. Historical view of the progress in speech and audio coding schemes. Vertical axis roughly corresponds to bit rates. Arrows represent rough influence of fundamental technologies.

Table 1. Summary of new enhancement tools for LPC envelope.

Fig. 2. LPC synthesis for three types of coding schemes: lossless coder, high-compression waveform coder, and vocoder.

Fig. 3. Classification of coding schemes: (a) conventional audio encoder [upper branch]/decoder [lower branch], (b) LPC-based audio encoder/decoder, (c) LPC-synthesis audio encoder/decoder (original transform coded excitation: TCX), and (d) time-domain LPC encoder/decoder.

Fig. 4. Baseline encoder for LPC-based MDCT coding.

Fig. 5. Baseline decoder for LPC-based MDCT coding.

Fig. 6. Example of an LPC envelope and its smoothed version. The envelope of the frequency-domain residue to be quantized is estimated as the LPC envelope H_k divided by its smoothed version. γ = 0.92 is used for smoothing in this figure.
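
As a rough illustration of the smoothing mentioned in the Fig. 6 caption, the sketch below evaluates an all-pole envelope and a bandwidth-expanded (smoothed) version of it, then takes their ratio. The coefficient values, the sign convention A(z) = 1 + Σ a_n z^(-n), the half-bin frequency grid, and the name `lpc_envelope` are assumptions made for illustration and are not taken from the paper; only γ = 0.92 comes from the caption.

```python
import cmath
import math


def lpc_envelope(coeffs, num_bins, gamma=1.0):
    """Return 1/|A(e^{jw})| sampled at num_bins frequencies in (0, pi).

    coeffs holds a_1..a_p of A(z) = 1 + sum_n a_n z^-n (assumed convention).
    gamma < 1 replaces a_n by a_n * gamma**n, which pulls the poles toward
    the origin and therefore flattens (smooths) the envelope.
    """
    envelope = []
    for k in range(num_bins):
        omega = math.pi * (k + 0.5) / num_bins  # assumed half-bin grid
        a = 1.0 + sum(c * gamma ** n * cmath.exp(-1j * omega * n)
                      for n, c in enumerate(coeffs, start=1))
        envelope.append(1.0 / abs(a))
    return envelope


# Hypothetical stable second-order predictor with a single resonance.
coeffs = [-1.6, 0.9]
H = lpc_envelope(coeffs, 64)                      # LPC envelope
H_smooth = lpc_envelope(coeffs, 64, gamma=0.92)   # smoothed envelope
ratio = [h / hs for h, hs in zip(H, H_smooth)]    # residue envelope estimate
```

With these example coefficients the smoothed envelope has a lower peak and a smaller dynamic range than the unsmoothed one, which is the qualitative behavior the figure illustrates.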

Fig. 7. Relationship between the LPC envelope and the variance of the MDCT spectra for quantization. The LPC envelope information assists efficient entropy coding of the scalar-quantized (SQ) MDCT spectra. Where the LPC envelope has a large value, the MDCT spectra are expected to have a large variance and are efficiently coded by Rice coding with a large Rice parameter, which consumes many bits (say, 5). In contrast, where the LPC envelope is small, the MDCT spectra are efficiently coded with fewer bits.
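
The dependence of Rice code length on the Rice parameter described in the Fig. 7 caption can be sketched as follows. This is a minimal sketch of a generic Golomb-Rice code length; the zigzag sign mapping, the helper names, and the sample values are assumptions for illustration and do not reproduce the codec's actual bitstream format.

```python
def zigzag(x):
    """Map a signed integer to a non-negative one: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * x if x >= 0 else -2 * x - 1


def rice_code_length(x, r):
    """Bits used by a Golomb-Rice code with parameter r for signed integer x.

    Unary-coded quotient (v >> r), one stop bit, and r remainder bits.
    """
    v = zigzag(x)
    return (v >> r) + 1 + r


def total_bits(values, r):
    """Total code length of a sequence coded with a single Rice parameter."""
    return sum(rice_code_length(x, r) for x in values)


# Hypothetical quantized MDCT values where the envelope is large and small.
large = [13, -9, 11, -14, 8, -12]
small = [0, 1, 0, -1, 0, 0]

bits_large_r4 = total_bits(large, 4)  # large parameter suits large values
bits_large_r0 = total_bits(large, 0)
bits_small_r0 = total_bits(small, 0)  # small parameter suits small values
bits_small_r4 = total_bits(small, 4)
```

For these sample values, the large-variance block is far cheaper with r = 4 than with r = 0, while the small-variance block is cheaper with r = 0, which is why an envelope-driven choice of Rice parameter helps.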

Fig. 8. Convergence process of code length and gain in a rate loop. If the code length is longer than the target code length, the gain is reduced (equivalently, the step size is increased). Otherwise, the gain is increased. The proposed new tools can enhance the efficiency of entropy coding and can reduce the distortion at the same bit rate.
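
The rate loop in the Fig. 8 caption can be sketched as a bisection on the gain. This is a minimal sketch under stated assumptions: the code-length model (a Rice code with a fixed parameter), the doubling search for an upper bound, the iteration count, and the names `code_length` and `rate_loop` are all hypothetical and are not the paper's procedure.

```python
def code_length(spectrum, gain, r=2):
    """Bits needed to Rice-code the spectrum after scaling by gain and rounding."""
    total = 0
    for x in spectrum:
        q = round(x * gain)                  # scalar quantization
        v = 2 * q if q >= 0 else -2 * q - 1  # signed -> non-negative mapping
        total += (v >> r) + 1 + r            # unary quotient + stop bit + remainder
    return total


def rate_loop(spectrum, target_bits, iterations=20):
    """Find the largest gain tried whose code length fits in target_bits."""
    lo, hi = 0.0, 1.0
    # Grow the upper bound until the code no longer fits the budget.
    while code_length(spectrum, hi) <= target_bits and hi < 1e6:
        hi *= 2.0
    best = lo
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if code_length(spectrum, mid) > target_bits:
            hi = mid    # too many bits: reduce the gain
        else:
            lo = mid    # fits: try a larger gain
            best = mid
    return best


# Hypothetical MDCT coefficients and bit budget.
spectrum = [3.2, -1.5, 0.7, -4.1, 2.2, 0.1, -0.4, 1.9]
gain = rate_loop(spectrum, target_bits=48)
```

A larger gain means finer quantization and lower distortion, so any tool that shortens the entropy code for a given gain lets the loop settle on a higher gain within the same budget.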

Fig. 9. Harmonic model combined with the baseline encoder.

Fig. 10. Harmonic model combined with the baseline decoder.

Fig. 11. Example of harmonic model used in combination with the LPC envelope.

Fig. 12. Differential scores (item-by-item average and 95% confidence intervals) of MUSHRA with and without the harmonic model. Asterisks indicate the existence of significant difference at 5% in a paired t-test.

Fig. 13. Differential scores (item-by-item average and 95% confidence intervals) of AB test with and without resolution-warped LPC. Asterisks indicate the existence of significant difference at 5% in a t-test.

Fig. 14. Resolution-warped LPC combined with the baseline encoder/decoder.

Fig. 15. PAPSE combined with the baseline encoder/decoder.

Fig. 16. SNR of the quantized spectra as a function of the shape parameter α for arithmetic coding with each method. A total of 24 720 frames of MDCT coefficients were tested at 16 kbps.

Fig. 17. Differential scores (item-by-item average and 95% confidence intervals) of MUSHRA with and without PAPSE. Asterisks indicate the existence of significant difference at 5% in a paired t-test.