
References

Published online by Cambridge University Press:  05 June 2016

Ian Vince McLoughlin
Affiliation: University of Kent

Type: Chapter
Book: Speech and Audio Processing: A MATLAB-based Approach, pp. 370–378
Publisher: Cambridge University Press
Print publication year: 2016
Chapter DOI: https://doi.org/10.1017/CBO9781316084205.014


References

[1] M. Frigo and S. G. Johnson. The design and implementation of FFTW3. Proc. IEEE, 93(2):216–231, February 2005. doi: 10.1109/JPROC.2004.840301.
[2] S. W. Smith. Digital Signal Processing: A Practical Guide for Engineers and Scientists. Newnes, 2000. www.dspguide.com.
[3] J. W. Gibbs. Fourier series. Nature, 59:606, 1899.
[4] R. W. Schafer and L. R. Rabiner. Digital representation of speech signals. Proc. IEEE, 63(4):662–677, 1975.
[5] B. P. Bogert, M. J. R. Healy, and J. W. Tukey. The quefrency analysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum and saphe cracking. In M. Rosenblatt, editor, Proc. Symposium on Time-Series Analysis, pages 209–243. Wiley, 1963.
[6] D. G. Childers, D. P. Skinner, and R. C. Kemerait. The cepstrum: a guide to processing. Proc. IEEE, 65(10):1428–1443, October 1977.
[7] F. Zheng, G. Zhang, and Z. Song. Comparison of different implementations of MFCC. J. Computer Science and Technology, 16(6):582–589, September 2001.
[8] J. Barker and M. Cooke. Is the sine-wave speech cocktail party worth attending? Speech Communication, 27(3–4):159–174, April 1999.
[9] M. R. Schroeder, B. S. Atal, and J. L. Hall. Optimizing digital speech coders by exploiting masking properties of the human ear. J. Acoustical Society of America, 66(6):1647–1652, 1979.
[10] I. Witten. Principles of Computer Speech. Academic Press, 1982.
[11] H. R. Sharifzadeh, I. V. McLoughlin, and M. J. Russell. A comprehensive vowel space for whispered speech. Journal of Voice, 26(2):e49–e56, 2012.
[12] B. C. J. Moore. An Introduction to the Psychology of Hearing. Academic Press, 1992.
[13] I. B. Thomas. The influence of first and second formants on the intelligibility of clipped speech. J. Acoustical Society of America, 16(2):182–185, 1968.
[14] J. Pickett. The Sounds of Speech Communication. Allyn and Bacon, 1980.
[15] Z. Li, E. C. Tan, I. McLoughlin, and T. T. Teo. Proposal of standards for intelligibility tests of Chinese speech. IEE Proc. Vision, Image and Signal Processing, 147(3):254–260, June 2000.
[16] F. L. Chong, I. McLoughlin, and K. Pawlikowski. A methodology for improving PESQ accuracy for Chinese speech. In Proc. IEEE TENCON, Melbourne, November 2005.
[17] K. Kryter. The Handbook of Hearing and the Effects of Noise. Academic Press, 1994.
[18] L. L. Beranek. The design of speech communications systems. Proc. IRE, 35(9):880–890, September 1947.
[19] W. Tempest, editor. The Noise Handbook. Academic Press, 1985.
[20] M. Mourjopoulos, J. Tsoukalas, and D. Paraskevas. Speech enhancement using psychoacoustic criteria. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 359–362, 1991.
[21] F. White. Our Acoustic Environment. John Wiley & Sons, 1976.
[22] P. J. Blamey, R. C. Dowell, and G. M. Clark. Acoustic parameters measured by a formant estimating speech processor for a multiple-channel cochlear implant. J. Acoustical Society of America, 82(1):38–47, 1987.
[23] I. V. McLoughlin, Y. Xu, and Y. Song. Tone confusion in spoken and whispered Mandarin Chinese. In Chinese Spoken Language Processing (ISCSLP), 2014 9th Int. Symp. on, pages 313–316. IEEE, 2014.
[24] I. B. Thomas. Perceived pitch of whispered vowels. J. Acoustical Society of America, 46:468–470, 1969.
[25] Y. Swerdlin, J. Smith, and J. Wolfe. The effect of whisper and creak vocal mechanisms on vocal tract resonances. J. Acoustical Society of America, 127(4):2590–2598, 2010.
[26] P. C. Loizou. Speech Enhancement: Theory and Practice. CRC Press, 2013.
[27] N. Kitawaki, H. Nagabuchi, and K. Itoh. Objective quality evaluation for low-bit-rate speech coding systems. IEEE J. Selected Areas in Communications, 6(2):242–248, 1988.
[28] Y. Hu and P. C. Loizou. Evaluation of objective quality measures for speech enhancement. IEEE Trans. Audio, Speech, and Language Processing, 16(1):229–238, 2008.
[29] A. D. Sharpley. Dynastat webpages, 1996 to 2006. www.dynastat.com/SpeechIntelligibility.htm.
[30] S. F. Boll. Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoustics, Speech and Signal Processing, 27(2):113–120, 1979.
[31] R. E. P. Dowling and L. F. Turner. Modelling the detectability of changes in auditory signals. Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 1, pages 133–136, 1993.
[32] J. I. Alcantera, G. J. Dooley, P. J. Blamey, and P. M. Seligman. Preliminary evaluation of a formant enhancement algorithm on the perception of speech in noise for normally hearing listeners. J. Audiology, 33(1):15–24, 1994.
[33] J. G. van Velden and G. F. Smoorenburg. Vowel recognition in noise for male, female and child voices. Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, pages 796–799, 1995.
[34] G. A. Miller, G. A. Heise, and W. Lichten. The intelligibility of speech as a function of the context of the test materials. J. Experimental Psychology, 41:329–335, 1951.
[35] W. G. Sears. Anatomy and Physiology for Nurses and Students of Human Biology. Arnold, 4th edition, 1967.
[36] J. Simner, C. Cuskley, and S. Kirby. What sound does that taste? Cross-modal mappings across gustation and audition. Perception, 39(4):553, 2010.
[37] R. Duncan-Luce. Sound and Hearing: A Conceptual Introduction. Lawrence Erlbaum and Associates, 1993.
[38] W. F. Ganong. Review of Medical Physiology. Lange Medical Publications, 9th edition, 1979.
[39] H. Fletcher and W. A. Munson. Loudness, its definition, measurement and calculation. Bell System Technical Journal, 12(4):377–430, 1933.
[40] K. Kryter. The Effects of Noise on Man. Academic Press, 2nd edition, 1985.
[41] R. Plomp. Detectability threshold for combination tones. J. Acoustical Society of America, 37(6):1110–1123, 1965.
[42] K. Ashihara. Combination tone: absent but audible component. Acoustical Science and Technology, 27(6):332, 2006.
[43] J. C. R. Licklider. Auditory Feature Analysis. Academic Press, 1956.
[44] Y. M. Cheng and D. O'Shaughnessy. Speech enhancement based conceptually on auditory evidence. Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, pages 961–963, 1991.
[45] N. Virag. Speech enhancement based on masking properties of the auditory system. Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, pages 796–799, 1995.
[46] Y. Gao, T. Huang, and J. P. Haton. Central auditory model for spectral processing. Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 704–707, 1993.
[47] I. V. McLoughlin and Z.-P. Xie. Speech playback geometry for smart homes. In Consumer Electronics (ISCE 2014), 18th IEEE Int. Symp. on, pages 1–2. IEEE, 2014.
[48] C. R. Darwin and R. B. Gardner. Mistuning a harmonic of a vowel: grouping and phase effects on vowel quality. J. Acoustical Society of America, 79:838–845, 1986.
[49] D. Sen, D. H. Irving, and W. H. Holmes. Use of an auditory model to improve speech coders. Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 2, pages 411–415, 1993.
[50] H. Hermansky. Perceptual linear predictive (PLP) analysis of speech. J. Acoustical Society of America, 87(4):1738–1752, April 1990.
[51] S. S. Stevens, J. Volkmann, and E. B. Newman. A scale for the measurement of the psychological magnitude pitch. J. Acoustical Society of America, 8(3):185–190, 1937. doi: 10.1121/1.1915893.
[52] D. O'Shaughnessy. Speech Communication: Human and Machine. Addison-Wesley, 1987.
[53] G. Fant. Analysis and synthesis of speech processes. In B. Malmberg, editor, Manual of Phonetics, pages 173–177. North-Holland, 1968.
[54] ISO/MPEG–Audio Standard layers. Editorial pages. Sound Studio Magazine, pages 40–41, July 1992.
[55] A. Azirani, R. Jeannes, and G. Faucon. Optimizing speech enhancement by exploiting masking properties of the human ear. Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, pages 800–803, 1995.
[56] A. S. Bregman. Auditory Scene Analysis. MIT Press, 1990.
[57] H. Purwins, B. Blankertz, and K. Obermayer. Computing auditory perception. Organised Sound, 5(3):159–171, 2000.
[58] C. M. M. Tio, I. V. McLoughlin, and R. W. Adi. Perceptual audio data concealment and watermarking scheme using direct frequency domain substitution. IEE Proc. Vision, Image & Signal Processing, 149(6):335–340, 2002.
[59] I. V. McLoughlin and R. J. Chance. Method and apparatus for speech enhancement in a speech communications system. PCT international patent (PCT/GB98/01936), July 1998.
[60] Y. M. Cheng and D. O'Shaughnessy. Speech enhancement based conceptually on auditory evidence. IEEE Trans. Signal Processing, 39(9):1943–1954, 1991.
[61] N. Jayant, J. Johnston, and R. Safranek. Signal compression based on models of human perception. Proc. IEEE, 81(10):1383–1421, 1993.
[62] D. Sen and W. H. Holmes. Perceptual enhancement of CELP speech coders. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 105–108, 1993.
[63] J. Markel and A. Gray. Linear Prediction of Speech. Springer-Verlag, 1976.
[64] J. Makhoul. Linear prediction: a tutorial review. Proc. IEEE, 63(4):561–580, April 1975.
[65] S. Saito and K. Nakata. Fundamentals of Speech Signal Processing. Academic Press, 1985.
[66] J. L. Kelly and C. C. Lochbaum. Speech synthesis. Proc. Fourth Int. Congress on Acoustics, pages 1–4, September 1962.
[67] B. H. Story, I. R. Titze, and E. A. Hoffman. Vocal tract area functions from magnetic resonance imaging. J. Acoustical Society of America, 100(1):537–554, 1996.
[68] N. Sugamura and N. Farvardin. Quantizer design in LSP speech analysis–synthesis. IEEE J. Selected Areas in Communications, 6(2):432–440, February 1988.
[69] S. Saoudi, J. Boucher, and A. Guyader. A new efficient algorithm to compute the LSP parameters for speech coding. Signal Processing, 28(2):201–212, 1995.
[70] TI and MIT. TIMIT database. A CD-ROM database of phonetically classified recordings of sentences spoken by a number of different male and female speakers, disc 1-1.1, 1990.
[71] N. Sugamura and F. Itakura. Speech analysis and synthesis methods developed at ECL in NTT – from LPC to LSP. Speech Communication, pages 213–229, 1986.
[72] J. S. Collura and T. E. Tremain. Vector quantizer design for the coding of LSF parameters. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 29–32, 1993.
[73] I. V. McLoughlin. LSP parameter interpretation for speech classification. In Proc. 2nd IEEE Int. Conf. on Information, Communications and Signal Processing, December 1999.
[74] I. V. McLoughlin and F. Hui. Adaptive bit allocation for LSP parameter quantization. In Proc. IEEE Asia–Pacific Conf. on Circuits and Systems, paper number 231, December 2000.
[75] Q. Zhao and J. Suzuki. Efficient quantization of LSF by utilising dynamic interpolation. In IEEE Int. Symp. on Circuits and Systems, pages 2629–2632, June 1997.
[76] European Telecommunications Standards Institute. Trans-European trunked radio system (TETRA) standard, 1994.
[77] K. K. Paliwal and B. S. Atal. Efficient vector quantization of LPC parameters at 24 bits per frame. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 661–664, 1991.
[78] D.-I. Chang, S. Ann, and C. W. Lee. A classified split vector quantization of LSF parameters. Signal Processing, 59(3):267–273, June 1997.
[79] R. Laroia, N. Phamdo, and N. Farvardin. Robust and efficient quantization of speech LSP parameters using structured vector quantizers. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 641–644, 1991.
[80] H. Zarrinkoub and P. Mermelstein. Switched prediction and quantization of LSP frequencies. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 757–760, 1996.
[81] C. S. Xydeas and K. K. M. So. A long history quantization approach to scalar and vector quantization of LSP coefficients. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 1–4, 1993.
[82] J.-H. Chen, R. V. Cox, Y.-C. Lin, N. Jayant, and M. J. Melchner. A low-delay CELP coder for the CCITT 16 kb/s speech coding standard. IEEE J. Selected Areas in Communications, 10(5):830–849, June 1992.
[83] B. S. Atal. Predictive coding of speech at low bit rates. IEEE Trans. Communications, 30(4):600–614, 1982.
[84] M. R. Schroeder and B. S. Atal. Code-excited linear prediction (CELP): high-quality speech at very low bit rates. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 937–940, 1985.
[85] L. M. Supplee, R. P. Cohn, J. S. Collura, and A. V. McCree. MELP: the new Federal standard at 2400 bps. In Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, vol. 2, pages 1591–1594, April 1997.
[86] I. A. Gerson and M. A. Jasiuk. Vector sum excited linear prediction (VSELP) speech coding at 8 kbps. In Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, vol. 1, pages 461–464, April 1990.
[87] L. R. Rabiner and R. W. Schafer. Digital Processing of Speech Signals. Prentice-Hall, 1978.
[88] I. V. McLoughlin. LSP parameter interpretation for speech classification. In Proc. 6th IEEE Int. Conf. on Electronics, Circuits and Systems, paper number 113, September 1999.
[89] K. K. Paliwal. A study of LSF representation for speaker-dependent and speaker-independent HMM-based speech recognition systems. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, vol. 2, pages 801–804, 1990.
[90] J. Parry, I. Burnett, and J. Chicharo. Linguistic mapping in LSF space for low-bit-rate coding. Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 2, pages 653–656, March 1999.
[91] L. R. Rabiner, M. Cheng, A. Rosenberg, and C. McGonegal. A comparative performance study of several pitch detection algorithms. IEEE Trans. Acoustics, Speech and Signal Processing, 24(5):399–418, October 1976.
[92] L. Cohen. Time–Frequency Analysis. Prentice-Hall, 1995.
[93] Z. Q. Ding, I. V. McLoughlin, and E. C. Tan. How to track pitch pulse in LP residual – joint time–frequency distribution approach. In Proc. IEEE Pacific Rim Conf. on Communications, Computers and Signal Processing, August 2001.
[94] A. G. Krishna and T. V. Sreenivas. Musical instrument recognition: from isolated notes to solo phrases. In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, vol. 4, pages 265–268, 2004.
[95] The British Broadcasting Corporation (BBC). BBC Radio 4: Brett Westwood's guide to garden birdsong, May 2007. www.bbc.co.uk/radio4/science/birdsong.shtml.
[96] A. Harma and P. Somervuo. Classification of the harmonic structure in bird vocalization. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 5, pages 701–704, 2004.
[97] I. McLoughlin, M.-M. Zhang, Z.-P. Xie, Y. Song, and W. Xiao. Robust sound event classification using deep neural networks. IEEE Trans. Audio, Speech, and Language Processing, PP(99), 2015. doi: 10.1109/TASLP.2015.2389618.
[98] R. F. Lyon. Machine hearing: an emerging field. IEEE Signal Processing Magazine, 42:1414–1416, 2010.
[99] T. C. Walters. Auditory-based processing of communication sounds. PhD thesis, University of Cambridge, 2011.
[100] A. Kanagasundaram, R. Vogt, D. B. Dean, S. Sridharan, and M. W. Mason. i-vector based speaker recognition on short utterances. In Interspeech 2011, pages 2341–2344, Firenze Fiera, Florence, August 2011. International Speech Communication Association (ISCA). http://eprints.qut.edu.au/46313/.
[101] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
[102] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3), article 27. Software available at www.csie.ntu.edu.tw/~cjlin/libsvm.
[103] S. Balakrishnama and A. Ganapathiraju. Linear discriminant analysis – a brief tutorial. Institute for Signal and Information Processing, 1998. www.isip.piconepress.com/publications/reports/1998/isip/lda.
[104] A. Hyvärinen and E. Oja. Independent component analysis: algorithms and applications. Neural Networks, 13(4):411–430, 2000.
[105] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[106] G. E. Hinton, S. Osindero, and Y.-W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
[107] R. B. Palm. Prediction as a candidate for learning deep hierarchical models of data. Master's thesis, Technical University of Denmark, 2012.
[108] Y. LeCun and Y. Bengio. Convolutional networks for images, speech, and time series. In The Handbook of Brain Theory and Neural Networks, page 3361. MIT Press, 1995.
[109] J. Bouvrie. Notes on convolutional neural networks. 2006. http://cogprints.org/5869/.
[110] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proc. IEEE, 86(11):2278–2324, 1998.
[111] O. Abdel-Hamid, A.-R. Mohamed, H. Jiang, and G. Penn. Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE Int. Conf. on, pages 4277–4280. IEEE, 2012.
[112] T. N. Sainath, A.-R. Mohamed, B. Kingsbury, and B. Ramabhadran. Deep convolutional neural networks for LVCSR. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE Int. Conf. on, pages 8614–8618. IEEE, 2013.
[113] H.-M. Zhang, I. McLoughlin, and Y. Song. Robust sound event recognition using convolutional neural networks. In Proc. ICASSP, paper number 2635. IEEE, 2015.
[114] R. Cole, J. Mariani, H. Uszkoreit, G. B. Varile, A. Zaenen, A. Zampolli, and V. Zue, editors. Survey of the State of the Art in Human Language Technology. Cambridge University Press, 2007.
[115] C. A. Kamm, K. M. Yang, C. R. Shamieh, and S. Singhal. Speech recognition issues for directory assistance applications. In Proc. 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications (IVTTA94), pages 15–19, Kyoto, September 1994.
[116] J. G. Fiscus, J. Ajot, and J. S. Garofolo. The rich transcription 2007 meeting recognition evaluation. In Multimodal Technologies for Perception of Humans, pages 373–389. Springer, 2008.
[117] H. B. Yu and M.-W. Mak. Comparison of voice activity detectors for interview speech in NIST speaker recognition evaluation. In Interspeech 2011, pages 2353–2356, August 2011.
[118] F. Beritelli, S. Casale, and A. Cavallaro. A robust voice activity detector for wireless communications using soft computing. IEEE J. Selected Areas in Communications, 16(9):1818–1829, 1998.
[119] M.-Y. Hwang and X. Huang. Subphonetic modeling with Markov states – Senone. In Acoustics, Speech, and Signal Processing, 1992. ICASSP-92, 1992 IEEE Int. Conf. on, vol. 1, pages 33–36. IEEE, 1992.
[120] Y. Song, B. Jiang, Y. Bao, S. Wei, and L.-R. Dai. i-vector representation based on bottleneck features for language identification. Electronics Letters, 49(24):1569–1570, November 2013. doi: 10.1049/el.2013.1721.
[121] B. Jiang, Y. Song, S. Wei, J.-H. Liu, I. V. McLoughlin, and L.-R. Dai. Deep bottleneck features for spoken language identification. PLoS ONE, 9(7):e100795, July 2014. doi: 10.1371/journal.pone.0100795.
[122] B. Jiang, Y. Song, S. Wei, M.-G. Wang, I. McLoughlin, and L.-R. Dai. Performance evaluation of deep bottleneck features for spoken language identification. In Chinese Spoken Language Processing (ISCSLP), 2014 9th Int. Symp. on, pages 143–147, September 2014. doi: 10.1109/ISCSLP.2014.6936580.
[123] S. Xue, O. Abdel-Hamid, H. Jiang, and L. Dai. Direct adaptation of hybrid DNN/HMM model for fast speaker adaptation in LVCSR based on speaker code. In Proc. ICASSP, pages 6339–6343, 2014.
[124] C. Kong, S. Xue, J. Gao, W. Guo, L. Dai, and H. Jiang. Speaker adaptive bottleneck features extraction for LVCSR based on discriminative learning of speaker codes. In Chinese Spoken Language Processing (ISCSLP), 2014 9th Int. Symp. on, pages 83–87. IEEE, 2014.
[125] S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, X. Liu, G. Moore, J. Odell, D. Ollason, D. Povey et al. The HTK Book, vol. 2. Entropic Cambridge Research Laboratory, Cambridge, 1997.
[126] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 77(2):257–286, 1989.
[127] M. Gales and S. Young. The application of hidden Markov models in speech recognition. Foundations and Trends in Signal Processing, 1(3):195–304, 2008.
[128] L. F. Uebel and P. C. Woodland. An investigation into vocal tract length normalisation. In Eurospeech, 1999.
[129] D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz et al. The Kaldi speech recognition toolkit. In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society, December 2011. IEEE Catalog No.: CFP11SRW-USB.
[130] W. Walker, P. Lamere, P. Kwok, B. Raj, R. Singh, E. Gouvea, P. Wolf, and J. Woelfel. Sphinx-4: a flexible open source framework for speech recognition, 2004. cmusphinx.sourceforge.net/sphinx4/doc/Sphinx4Whitepaper.pdf.
[131] A. P. A. Broeders. Forensic speech and audio analysis, forensic linguistics 1998 to 2001 – a review. In Proc. 13th INTERPOL Forensic Science Symposium, pages 51–84, Lyon, October 2001.
[132] A. P. A. Broeders. Forensic speech and audio analysis, forensic linguistics 2001 to 2004 – a review. In Proc. 14th INTERPOL Forensic Science Symposium, pages 171–188, Lyon, 2004.
[133] R. Togneri and D. Pullella. An overview of speaker identification: accuracy and robustness issues. IEEE Circuits and Systems Magazine, 11(2):23–61, 2011. doi: 10.1109/MCAS.2011.941079.
[134] S. Furui. Recent advances in speaker recognition. Pattern Recognition Letters, 18:859–872, 1997.
[135] S. Furui. Speaker-dependent-feature extraction, recognition and processing techniques. Speech Communication, 10:505–520, 1991.
[136] G. Doddington, W. Liggett, A. Martin, M. Przybocki, and D. A. Reynolds. Sheep, goats, lambs and wolves: a statistical analysis of speaker performance in the NIST 1998 speaker recognition evaluation. In Proc. 5th Int. Conf. on Spoken Language Processing, vol. 0608, November 1998.
[137] M. A. Zissman and K. M. Berkling. Automatic language identification. Speech Communication, 35:115–124, 2001.
[138] K. Wu, Y. Song, W. Guo, and L. Dai. Intra-conversation intra-speaker variability compensation for speaker clustering. In Chinese Spoken Language Processing (ISCSLP), 2012 8th Int. Symp. on, pages 330–334. IEEE, 2012.
[139] S. Meignier and T. Merlin. LIUM SpkDiarization: an open source toolkit for diarization. In CMU SPUD Workshop, 2010.
[140] D.-C. Lyu, T. P. Tan, E. Chang, and H. Li. SEAME: a Mandarin-English code-switching speech corpus in South-East Asia. In INTERSPEECH, volume 10, pages 1986–1989, 2010.
[141] M. Edgington. Investigating the limitations of concatenative synthesis. In EUROSPEECH-1997, pages 593–596, Rhodes, September 1997.
[142] C. K. Ogden. Basic English: A General Introduction with Rules and Grammar. Number 29. K. Paul, Trench, Trubner, 1944.
[143] T. Dutoit. High quality text-to-speech synthesis: an overview. J. Electrical & Electronics Engineering, Australia: Special Issue on Speech Recognition and Synthesis, 17(1):25–36, March 1997.
[144] T. B. Amin, P. Marziliano, and J. S. German. Glottal and vocal tract characteristics of voice impersonators. IEEE Trans. on Multimedia, 16(3):668–678, 2014.
[145] I. V. McLoughlin. The art of public speaking for engineers. IEEE Potentials, 25(3):18–21, 2006.
[146] The University of Edinburgh, The Centre for Speech Technology Research. The Festival speech synthesis system, 2004. www.cstr.ed.ac.uk/projects/festival/.
[147] P. Taylor, A. Black, and R. Caley. The architecture of the Festival speech synthesis system. In Third International Workshop on Speech Synthesis, Sydney, November 1998.
[148] Voice Browser Working Group. Speech synthesis markup language (SSML) version 1.0. W3C Recommendation, September 2004.
[149] K. K. Paliwal. On the use of line spectral frequency parameters for speech recognition. Digital Signal Processing, 2:80–87, 1992.
[150] I. V. McLoughlin and R. J. Chance. LSP-based speech modification for intelligibility enhancement. In 13th Int. Conf. on DSP, Santorini, July 1997.
[151] I. V. McLoughlin and R. J. Chance. LSP analysis and processing for speech coders. IEE Electronics Letters, 33(9):743–744, 1997.
[152] A. Schaub and P. Straub. Spectral sharpening for speech enhancement/noise reduction. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 993–996, 1991.
[153] H. Valbret, E. Moulines, and J. P. Tubach. Voice transformation using PSOLA technique. In IEEE Int. Conf. Acoustics, Speech and Signal Proc., pages 145–148, San Francisco, CA, March 1992.
[154] R. W. Morris and M. A. Clements. Reconstruction of speech from whispers. Medical Engineering & Physics, 24(7):515–520, 2002.
[155] H. R. Sharifzadeh, I. V. McLoughlin, and F. Ahmadi. Reconstruction of normal sounding speech for laryngectomy patients through a modified CELP codec. IEEE Trans. Biomedical Engineering, 57:2448–2458, October 2010.
[156] J. Li, I. V. McLoughlin, and Y. Song. Reconstruction of pitch for whisper-to-speech conversion of Chinese. In Chinese Spoken Language Processing (ISCSLP), 2014 9th Int. Symp. on, pages 206–210. IEEE, 2014.
[157] F. Ahmadi and I. V. McLoughlin. Measuring resonances of the vocal tract using frequency sweeps at the lips. In 2012 5th Int. Symp. on Communications Control and Signal Processing (ISCCSP), 2012.
[158] F. Ahmadi and I. McLoughlin. The use of low-frequency ultrasonics in speech processing. In Signal Processing, S. Miron (ed.). InTech, 2010, pp. 503–528.
[159] I. V. McLoughlin. Super-audible voice activity detection. IEEE/ACM Trans. on Audio, Speech, and Language Processing, 22(9):1424–1433, 2014.
[160] F. Ahmadi, I. V. McLoughlin, and H. R. Sharifzadeh. Autoregressive modelling for linear prediction of ultrasonic speech. In INTERSPEECH, pages 1616–1619, 2010.
[161] I. V. McLoughlin and Y. Song. Mouth state detection from low-frequency ultrasonic reflection. Circuits, Systems, and Signal Processing, 34(4):1279–1304, 2015.
[162] R. W. Schafer. What is a Savitzky–Golay filter? [Lecture notes]. IEEE Signal Processing Magazine, 28(4):111–117, 2011. doi: 10.1109/MSP.2011.941097.
[163] F. Ahmadi, M. Ahmadi, and I. V. McLoughlin. Human mouth state detection using low frequency ultrasound. In INTERSPEECH, pages 1806–1810, 2013.
