Most cited

Cited by 444

An overview of ongoing point cloud compression standardization activities: video-based (V-PCC) and geometry-based (G-PCC)
D. Graziosi, O. Nakagami, S. Kuma, A. Zaghetto, T. Suzuki, A. Tabatabai
Published online by Cambridge University Press: 03 April 2020, e13
- Article
- - You have access
  - Open access
This article presents an overview of the recent standardization activities for point cloud compression (PCC). A point cloud is a 3D data representation used in diverse applications associated with immersive media including virtual/augmented reality, immersive telepresence, autonomous driving and cultural heritage archival. The international standard body for media compression, also known as the Moving Picture Experts Group (MPEG), is planning to release in 2020 two PCC standard specifications: video-based PCC (V-PCC) and geometry-based PCC (G-PCC). V-PCC and G-PCC will be part of the ISO/IEC 23090 series on the coded representation of immersive media content. In this paper, we provide a detailed description of both codec algorithms and their coding performances. Moreover, we discuss certain unique aspects of point cloud compression.

Cited by 341

A tutorial survey of architectures, algorithms, and applications for deep learning
Li Deng
Published online by Cambridge University Press: 22 January 2014, e2
- Article
- - You have access
  - Open access
In this invited paper, my overview material on the same topic as presented in the plenary overview session of APSIPA-2011 and the tutorial material presented in the same conference [1] are expanded and updated to include more recent developments in deep learning. The previous and the updated materials cover both theory and applications, and analyze its future directions. The goal of this tutorial survey is to introduce the emerging area of deep learning or hierarchical learning to the APSIPA community. Deep learning refers to a class of machine learning techniques, developed largely since 2006, where many stages of non-linear information processing in hierarchical architectures are exploited for pattern classification and for feature learning. In the more recent literature, it is also connected to representation learning, which involves a hierarchy of features or concepts where higher-level concepts are defined from lower-level ones and where the same lower-level concepts help to define higher-level ones. In this tutorial survey, a brief history of deep learning research is discussed first. Then, a classificatory scheme is developed to analyze and summarize major work reported in the recent deep learning literature. Using this scheme, I provide a taxonomy-oriented survey on the existing deep architectures and algorithms in the literature, and categorize them into three classes: generative, discriminative, and hybrid. Three representative deep architectures – deep autoencoders, deep stacking networks with their generalization to the temporal domain (recurrent networks), and deep neural networks (pretrained with deep belief networks) – one in each of the three classes, are presented in more detail. Next, selected applications of deep learning are reviewed in broad areas of signal and information processing including audio/speech, image/vision, multimodality, language modeling, natural language processing, and information retrieval. Finally, future directions of deep learning are discussed and analyzed.

Cited by 324

Recent advances on active noise control: open issues and innovative applications
Yoshinobu Kajikawa, Woon-Seng Gan, Sen M. Kuo
Published online by Cambridge University Press: 28 August 2012, e3
- Article
- - You have access
  - Open access
The problem of acoustic noise is becoming increasingly serious with the growing use of industrial and medical equipment, appliances, and consumer electronics. Active noise control (ANC), based on the principle of superposition, was developed in the early 20th century to help reduce noise. However, ANC is still not widely used owing to the limited effectiveness of control algorithms, and to the physical and economic constraints of practical applications. In this paper, we briefly introduce some fundamental ANC algorithms and theoretical analyses, and focus on recent advances in signal processing algorithms, implementation techniques, challenges for innovative applications, and open issues for further research and development of ANC systems.

Cited by 230

Evaluating word embedding models: methods and experimental results
Bin Wang, Angela Wang, Fenxiao Chen, Yuncheng Wang, C.-C. Jay Kuo
Published online by Cambridge University Press: 08 July 2019, e19
- Article
- - You have access
  - Open access
An extensive evaluation of a large number of word embedding models for language processing applications is conducted in this work. First, we introduce popular word embedding models and discuss desired properties of word models and evaluation methods (or evaluators). Then, we categorize evaluators into two types: intrinsic and extrinsic. Intrinsic evaluators test the quality of a representation independent of specific natural language processing tasks, while extrinsic evaluators use word embeddings as input features to a downstream task and measure changes in performance metrics specific to that task. We report experimental results of intrinsic and extrinsic evaluators on six word embedding models. It is shown that different evaluators focus on different aspects of word models, and some are more correlated with natural language processing tasks. Finally, we adopt correlation analysis to study performance consistency of extrinsic and intrinsic evaluators.

Cited by 219

Graph representation learning: a survey
Fenxiao Chen, Yun-Cheng Wang, Bin Wang, C.-C. Jay Kuo
Published online by Cambridge University Press: 28 May 2020, e15
- Article
- - You have access
  - Open access
Research on graph representation learning has received great attention in recent years since most data in real-world applications come in the form of graphs. High-dimensional graph data are often in irregular forms. They are more difficult to analyze than image/video/audio data defined on regular lattices. Various graph embedding techniques have been developed to convert the raw graph data into a low-dimensional vector representation while preserving the intrinsic graph properties. In this review, we first explain the graph embedding task and its challenges. Next, we review a wide range of graph embedding techniques with insights. Then, we evaluate several state-of-the-art methods against small and large data sets and compare their performance. Finally, potential applications and future directions are presented.

Cited by 181

An overview on video forensics
Simone Milani, Marco Fontani, Paolo Bestagini, Mauro Barni, Alessandro Piva, Marco Tagliasacchi, Stefano Tubaro
Published online by Cambridge University Press: 28 August 2012, e2
- Article
- - You have access
  - Open access
The broad availability of tools for the acquisition and processing of multimedia signals has recently led to the concern that images and videos cannot be considered trustworthy evidence, since they can be altered rather easily. This possibility raises the need to verify whether multimedia content, which can be downloaded from the internet, acquired by a video surveillance system, or received by a digital TV broadcaster, is original or not. To cope with these issues, signal processing experts have been investigating effective video forensic strategies aimed at reconstructing the processing history of the video data under investigation and validating their origins. The key assumption of these techniques is that most alterations are not reversible and leave in the reconstructed signal some “footprints”, which can be analyzed in order to identify the previous processing steps. This paper presents an overview of the video forensic techniques that have been proposed in the literature, focusing on the acquisition, compression, and editing operations, and highlighting the strengths and weaknesses of each solution. It also provides a review of simple processing chains that combine different operations. Anti-forensic techniques are also considered to outline the current limitations and highlight the open research issues.

Cited by 132

Survey on audiovisual emotion recognition: databases, features, and data fusion strategies
Chung-Hsien Wu, Jen-Chun Lin, Wen-Li Wei
Published online by Cambridge University Press: 11 November 2014, e12
- Article
- - You have access
  - Open access
Emotion recognition is the ability to identify what people would think someone is feeling from moment to moment and understand the connection between his/her feelings and expressions. In today's world, the human–computer interaction (HCI) interface undoubtedly plays an important role in our daily life. Toward a harmonious HCI interface, automated analysis and recognition of human emotion has attracted increasing attention from researchers in multidisciplinary research fields. In this paper, a survey on the theoretical and practical work offering new and broad views of the latest research in emotion recognition from bimodal information including facial and vocal expressions is provided. First, the currently available audiovisual emotion databases are described. Facial and vocal features and audiovisual bimodal data fusion methods for emotion recognition are then surveyed and discussed. Specifically, this survey also covers the recent emotion challenges in several conferences. Conclusions outline and address some of the existing emotion recognition issues.

Cited by 118

An overview of channel coding for 5G NR cellular communications
Jung Hyun Bae, Ahmed Abotabl, Hsien-Ping Lin, Kee-Bong Song, Jungwon Lee
Published online by Cambridge University Press: 24 June 2019, e17
- Article
- - You have access
  - Open access
A 5G new radio cellular system is characterized by three main usage scenarios of enhanced mobile broadband (eMBB), ultra-reliable and low latency communications (URLLC), and massive machine type communications, which require improved throughput, latency, and reliability compared with a 4G system. This overview paper discusses key characteristics of 5G channel coding schemes which are mainly designed for the eMBB scenario as well as for partial support of the URLLC scenario focusing on low latency. Two capacity-achieving channel coding schemes of low-density parity-check (LDPC) codes and polar codes have been adopted for 5G where the former is for user data and the latter is for control information. As a coding scheme for data, 5G LDPC codes are designed to support high throughput, a variable code rate and length and hybrid automatic repeat request in addition to good error correcting capability. 5G polar codes, as a coding scheme for control, are designed to perform well with short block length while addressing a latency issue of successive cancellation decoding.

Cited by 111

A review of blind source separation methods: two converging routes to ILRMA originating from ICA and NMF
Hiroshi Sawada, Nobutaka Ono, Hirokazu Kameoka, Daichi Kitamura, Hiroshi Saruwatari
Published online by Cambridge University Press: 14 May 2019, e12
- Article
- - You have access
  - Open access
This paper describes several important methods for the blind source separation of audio signals in an integrated manner. Two historically developed routes are featured. One started from independent component analysis and evolved to independent vector analysis (IVA) by extending the notion of independence from a scalar to a vector. In the other route, nonnegative matrix factorization (NMF) has been extended to multichannel NMF (MNMF). As a convergence point of these two routes, independent low-rank matrix analysis has been proposed, which integrates IVA and MNMF in a clever way. All the objective functions in these methods are efficiently optimized by majorization-minimization algorithms with appropriately designed auxiliary functions. Experimental results for a simple two-source two-microphone case are given to illustrate the characteristics of these five methods.

Cited by 107

A comprehensive study of the rate-distortion performance in MPEG point cloud compression
Part of:
- Collection from PCS 2018
Evangelos Alexiou, Irene Viola, Tomás M. Borges, Tiago A. Fonseca, Ricardo L. de Queiroz, Touradj Ebrahimi
Published online by Cambridge University Press: 12 November 2019, e27
- Article
- - You have access
  - Open access
Recent trends in multimedia technologies indicate the need for richer imaging modalities to increase user engagement with the content. Among other alternatives, point clouds denote a viable solution that offers an immersive content representation, as witnessed by current activities in JPEG and MPEG standardization committees. As a result of such efforts, MPEG is at the final stages of drafting an emerging standard for point cloud compression, which we consider as the state-of-the-art. In this study, the entire set of encoders that have been developed in the MPEG committee are assessed through an extensive and rigorous analysis of quality. We initially focus on the assessment of encoding configurations that have been defined by experts in MPEG for their core experiments. Then, two additional experiments are designed and carried out to address some of the identified limitations of the current approach. As part of the study, state-of-the-art objective quality metrics are benchmarked to assess their capability to predict visual quality of point clouds under a wide range of radically different compression artifacts. To carry out the subjective evaluation experiments, a web-based renderer is developed and described. The subjective and objective quality scores along with the rendering software are made publicly available, to facilitate and promote research in the field.

Cited by 95

Environmental sound recognition: a survey
Sachin Chachada, C.-C. Jay Kuo
Published online by Cambridge University Press: 15 December 2014, e14
- Article
- - You have access
  - Open access
Although research in audio recognition has traditionally focused on speech and music signals, the problem of environmental sound recognition (ESR) has received more attention in recent years. Research on ESR has significantly increased in the past decade. Recent work has focused on the appraisal of non-stationary aspects of environmental sounds, and several new features predicated on non-stationary characteristics have been proposed. These features strive to maximize their information content pertaining to a signal's temporal and spectral characteristics. Furthermore, sequential learning methods have been used to capture the long-term variation of environmental sounds. In this survey, we offer a qualitative and elucidatory review of recent developments. It includes four parts: (i) basic environmental sound-processing schemes, (ii) stationary ESR techniques, (iii) non-stationary ESR techniques, and (iv) performance comparison of selected methods. Finally, concluding remarks and future research and development trends in the ESR field are given.

Cited by 84

Advances in anti-spoofing: from the perspective of ASVspoof challenges
Madhu R. Kamble, Hardik B. Sailor, Hemant A. Patil, Haizhou Li
Published online by Cambridge University Press: 15 January 2020, e2
- Article
- - You have access
  - Open access
In recent years, automatic speaker verification (ASV) has been used extensively for voice biometrics. This has led to an increased interest in securing these voice biometric systems for real-world applications. ASV systems are vulnerable to various kinds of spoofing attacks, namely, synthetic speech (SS), voice conversion (VC), replay, twins, and impersonation. This paper provides a literature review of ASV spoof detection, novel acoustic feature representations, deep learning, end-to-end systems, etc. Furthermore, the paper also summarizes previous studies of spoofing attacks with emphasis on SS, VC, and replay along with recent efforts to develop countermeasures for the spoof speech detection (SSD) task. The limitations and challenges of the SSD task are also presented. While several countermeasures were reported in the literature, they are mostly validated on a particular database; furthermore, their performance is far from perfect. The security of voice biometrics systems against spoofing attacks remains a challenging topic. This paper is based on a tutorial presented at APSIPA Annual Summit and Conference 2017 to serve as a quick start for those interested in the topic.

Cited by 78

Survey on securing data storage in the cloud
Chun-Ting Huang, Lei Huang, Zhongyuan Qin, Hang Yuan, Lan Zhou, Vijay Varadharajan, C.-C. Jay Kuo
Published online by Cambridge University Press: 23 May 2014, e7
- Article
- - You have access
  - Open access
Cloud Computing has become a well-known primitive nowadays; many researchers and companies are embracing this fascinating technology with feverish haste. In the meantime, security and privacy challenges are brought forward while the number of cloud storage users increases expeditiously. In this work, we conduct an in-depth survey on recent research activities of cloud storage security in association with cloud computing. After an overview of the cloud storage system and its security problem, we focus on the key security requirement triad, i.e., data integrity, data confidentiality, and availability. For each of the three security objectives, we discuss the new unique challenges faced by the cloud storage services, summarize key issues discussed in the current literature, examine and compare the existing and emerging approaches proposed to meet those new challenges, and point out possible extensions and futuristic research opportunities. The goal of our paper is to provide state-of-the-art knowledge to new researchers who would like to join this exciting new field.

Cited by 75

Deep learning: from speech recognition to language and multimodal processing
Li Deng
Published online by Cambridge University Press: 19 January 2016, e1
- Article
- - You have access
  - Open access
While artificial neural networks have been in existence for over half a century, it was not until 2010 that they made a significant impact on speech recognition with a deep form of such networks. This invited paper, based on my keynote talk given at the Interspeech conference in Singapore in September 2014, will first reflect on the historical path to this transformative success, after providing brief reviews of earlier studies on (shallow) neural networks and on (deep) generative models relevant to the introduction of deep neural networks (DNN) to speech recognition several years ago. The role of well-timed academic-industrial collaboration is highlighted, as are the advances of big data, big compute, and the seamless integration between the application-domain knowledge of speech and general principles of deep learning. Then, an overview is given on sweeping achievements of deep learning in speech recognition since its initial success. Such achievements, summarized into six major areas in this article, have resulted in across-the-board, industry-wide deployment of deep learning in speech recognition systems. Next, more challenging applications of deep learning, natural language and multimodal processing, are selectively reviewed and analyzed. Examples include machine translation, knowledgebase completion, information retrieval, and automatic image captioning, where fresh ideas from deep learning, continuous-space embedding in particular, are shown to be revolutionizing these application areas albeit at a less rapid pace than for speech and image recognition. Finally, a number of key issues in deep learning are discussed, and future directions are analyzed for perceptual tasks such as speech, image, and video, as well as for cognitive tasks involving natural language.

Cited by 66

An Overview of Coding Tools in AV1: the First Video Codec from the Alliance for Open Media
Part of:
- Collection from PCS 2018
Yue Chen, Debargha Mukherjee, Jingning Han, Adrian Grange, Yaowu Xu, Sarah Parker, Cheng Chen, Hui Su, Urvang Joshi, Ching-Han Chiang, Yunqing Wang, Paul Wilkins, Jim Bankoski, Luc Trudeau, Nathan Egge, Jean-Marc Valin, Thomas Davies, Steinar Midtskogen, Andrey Norkin, Peter de Rivaz, Zoe Liu
Published online by Cambridge University Press: 24 February 2020, e6
- Article
- - You have access
  - Open access
In 2018, the Alliance for Open Media (AOMedia) finalized its first video compression format AV1, which was jointly developed by the industry consortium of leading video technology companies. The main goal of AV1 is to provide an open source and royalty-free video coding format that substantially outperforms state-of-the-art codecs available on the market in compression efficiency while maintaining practical decoding complexity as well as being optimized for hardware feasibility and scalability on modern devices. To give detailed insights into how the targeted performance and feasibility are realized, this paper provides a technical overview of key coding techniques in AV1. In addition, the coding performance gains are validated by video compression tests performed with the libaom AV1 encoder against the libvpx VP9 encoder. A preliminary comparison with two leading HEVC encoders, x265 and HM, and the reference software of VVC is also conducted on AOM's common test set and an open 4k set.

Cited by 63

Visual quality assessment: recent developments, coding applications and future trends
Tsung-Jung Liu, Yu-Chieh Lin, Weisi Lin, C.-C. Jay Kuo
Published online by Cambridge University Press: 11 July 2013, e4
- Article
- - You have access
  - Open access
Research on visual quality assessment has been active during the last decade. In this work, we provide an in-depth review of recent developments in the field. As compared with existing survey papers, our current work has several unique contributions. First, besides image quality databases and metrics, we put equal emphasis on video quality databases and metrics as this is a less investigated area. Second, we discuss the application of visual quality evaluation to perceptual coding as an example for applications. Third, we benchmark the performance of state-of-the-art visual quality metrics with experiments. Finally, future trends in visual quality assessment are discussed.

Cited by 57

NLCA-Net: a non-local context attention network for stereo matching
Zhibo Rao, Mingyi He, Yuchao Dai, Zhidong Zhu, Bo Li, Renjie He
Published online by Cambridge University Press: 07 July 2020, e18
- Article
- - You have access
  - Open access
Accurate disparity prediction is a hot spot in computer vision, and how to efficiently exploit contextual information is the key to improving the performance. In this paper, we propose a simple yet effective non-local context attention network to exploit the global context information by using attention mechanisms and semantic information for stereo matching. First, we develop a 2D geometry feature learning module to get a more discriminative representation by taking advantage of multi-scale features and forming them into the variance-based cost volume. Then, we construct a non-local attention matching module by using the non-local block and hierarchical 3D convolutions, which can effectively regularize the cost volume and capture the global contextual information. Finally, we adopt a geometry refinement module to refine the disparity map to further improve the performance. Moreover, we add the warping loss function to help the model learn the matching rule of the non-occluded region. Our experiments show that (1) our approach achieves competitive results on KITTI and SceneFlow datasets in the end-point error and the fraction of erroneous pixels $(D_1)$; (2) our proposed method has particularly superior performance in the reflective regions and occluded areas.

Cited by 57

A tutorial survey of architectures, algorithms, and applications for deep learning – ERRATUM
Li Deng
Published online by Cambridge University Press: 03 April 2014, e5
- Article
- - You have access
  - Open access

Cited by 54

Grayscale-based block scrambling image encryption using YCbCr color space for encryption-then-compression systems
Part of:
- Security and forensics in compression technology
Warit Sirichotedumrong, Hitoshi Kiya
Published online by Cambridge University Press: 01 February 2019, e7
- Article
- - You have access
  - Open access
A novel grayscale-based block scrambling image encryption scheme is presented not only to enhance security, but also to improve the compression performance for Encryption-then-Compression (EtC) systems with JPEG compression, which are used to securely transmit images through an untrusted channel provider. The proposed scheme enables the use of a smaller block size and a larger number of blocks than the color-based image encryption scheme. Images encrypted using the proposed scheme include less color information due to the use of grayscale images even when the original image has three color channels. These features enhance security against various attacks, such as jigsaw puzzle solver and brute-force attacks. Moreover, generating the grayscale-based images from a full-color image in YCbCr color space allows the use of a color sub-sampling operation, which can provide higher compression performance than the conventional grayscale-based encryption scheme, although the encrypted images have no color information. In an experiment, encrypted images were uploaded to and then downloaded from Twitter and Facebook, and the results demonstrated that the proposed scheme is effective for EtC systems and enhances the compression performance, while maintaining the security against brute-force and jigsaw puzzle solver attacks.

Cited by 40

Advances in deep learning approaches for image tagging
Jianlong Fu, Yong Rui
Published online by Cambridge University Press: 04 October 2017, e11
- Article
- - You have access
  - Open access
The advent of mobile devices and media cloud services has led to the unprecedented growth of personal photo collections. One of the fundamental problems in managing the increasing number of photos is automatic image tagging. Image tagging is the task of assigning human-friendly tags to an image so that the semantic tags can better reflect the content of the image and therefore can help users better access that image. The quality of image tagging depends on the quality of concept modeling which builds a mapping from concepts to visual images. While significant progress has been made in the past decade on image tagging, the previous approaches can only achieve limited success due to the limited concept representation ability of hand-crafted features (e.g., Scale-Invariant Feature Transform, GIST, Histogram of Oriented Gradients, etc.). Further progress has been made since efficient and effective deep learning algorithms were developed. The purpose of this paper is to categorize and evaluate different image tagging approaches based on deep learning techniques. We also discuss the relevant problems and applications of image tagging, including data collection, evaluation metrics, and existing commercial systems. We conclude with the advantages of different image tagging paradigms and propose several promising research directions for future work.

APSIPA Transactions on Signal and Information Processing
