Semi-fragile speech watermarking based on singular-spectrum analysis with CNN-based parameter estimation for tampering detection

  • Kasorn Galajit (a1) (a2) (a3), Jessada Karnjana (a3), Masashi Unoki (a1) and Pakinee Aimmanee (a2)

Abstract

This paper proposes a semi-fragile watermarking scheme for detecting tampering in speech signals. By embedding hidden information into an original signal, the scheme makes it possible to determine whether that signal has since been tampered with. It is based on singular-spectrum analysis: watermark bits are embedded by modifying a part of the singular spectrum of the host signal. Convolutional neural network (CNN)-based parameter estimation is used to quickly select the part of the singular spectrum to modify so that the inaudibility and robustness requirements are met. Evaluation results show that CNN-based parameter estimation reduces the computational time of the scheme and also makes it blind, i.e., only the watermarked signal is required to extract the hidden watermark. In addition, the scheme achieves the semi-fragility property that allows tampering in speech signals to be detected. Moreover, owing to the time efficiency of the CNN-based parameter estimation, the proposed scheme can be used in practice in real-time applications.
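
As a rough illustration of what "modifying a part of the singular spectrum" can look like, the following Python sketch (using NumPy) decomposes a signal frame with basic singular-spectrum analysis, scales a chosen range of singular values, and reconstructs the frame by diagonal averaging. The window length `L`, the index range `lo:hi`, the scaling factor `scale`, and all function names are assumptions made for illustration only. The abstract does not state the actual embedding rule, nor how the CNN selects the range, so this should not be read as the authors' implementation.

```python
import numpy as np


def trajectory_matrix(x, L):
    """Stack lagged windows of x into an L x K Hankel (trajectory) matrix."""
    K = len(x) - L + 1
    return np.column_stack([x[i:i + L] for i in range(K)])


def hankelize(X):
    """Average each anti-diagonal of X to map the matrix back to a 1-D signal."""
    L, K = X.shape
    out = np.zeros(L + K - 1)
    counts = np.zeros(L + K - 1)
    for i in range(L):
        # Row i of the trajectory matrix covers samples i .. i+K-1.
        out[i:i + K] += X[i, :]
        counts[i:i + K] += 1
    return out / counts


def modify_singular_spectrum(frame, L, lo, hi, scale):
    """Scale singular values with indices lo..hi-1 and rebuild the frame.

    The multiplicative rule is a placeholder (assumption), not the
    embedding rule used in the paper.
    """
    X = trajectory_matrix(frame, L)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_mod = s.copy()
    s_mod[lo:hi] *= scale
    # Recombine U * diag(s_mod) * Vt, then average anti-diagonals.
    return hankelize((U * s_mod) @ Vt), s, s_mod


# Example: perturb singular values 4..7 of a 64-sample frame by 5 %.
rng = np.random.default_rng(0)
frame = rng.standard_normal(64)
marked, s, s_mod = modify_singular_spectrum(frame, L=16, lo=4, hi=8, scale=1.05)
```

In a sketch like this, the choice of `lo` and `hi` controls the trade-off the abstract describes: changing larger singular values tends to be more robust but more audible, while changing smaller ones is less audible but more fragile. That choice is what the paper delegates to the CNN-based parameter estimation.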

Copyright

This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.

Corresponding author

Corresponding author: Kasorn Galajit, Email: kasorn.galajit@nectec.or.th
