In this paper, we propose a novel hue-correction scheme for color-image-enhancement algorithms, including deep-learning-based ones. Although hue-correction schemes for color-image enhancement have already been proposed, none of them can both perfectly remove perceptual hue distortion on the basis of CIEDE2000 and be applied to arbitrary image-enhancement algorithms. In contrast, the proposed scheme can perfectly remove, on the basis of CIEDE2000, the hue distortion caused by any image-enhancement algorithm, including deep-learning-based ones. Furthermore, the use of a gamut-mapping method in the proposed scheme enables us to compress a color gamut into an output RGB color gamut without hue changes. Experimental results show that the proposed scheme completely corrects hue distortion caused by image-enhancement algorithms while maintaining the performance of those algorithms and ensuring the color gamut of output images.
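A minimal sketch of the core idea: replace each enhanced pixel's hue with the corresponding original pixel's hue while keeping the enhanced lightness and chroma. The paper measures hue on the basis of CIEDE2000; this sketch uses the simpler CIELAB hue angle instead, and the `srgb_to_lab` helper below is a standard sRGB/D65 conversion, not the authors' code.

```python
import numpy as np

def srgb_to_lab(rgb):
    """Convert sRGB values in [0, 1] (shape (..., 3)) to CIELAB under D65."""
    rgb = np.asarray(rgb, dtype=float)
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    M = np.array([[0.4124564, 0.3575761, 0.1804375],
                  [0.2126729, 0.7151522, 0.0721750],
                  [0.0193339, 0.1191920, 0.9503041]])
    xyz = lin @ M.T
    white = np.array([0.95047, 1.0, 1.08883])        # D65 reference white
    t = xyz / white
    f = np.where(t > (6 / 29) ** 3, np.cbrt(t), t / (3 * (6 / 29) ** 2) + 4 / 29)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

def correct_hue(original_rgb, enhanced_rgb):
    """Give each enhanced pixel the CIELAB hue angle of the original pixel,
    keeping the enhanced lightness L* and chroma C* unchanged."""
    lab_o = srgb_to_lab(original_rgb)
    lab_e = srgb_to_lab(enhanced_rgb)
    h_o = np.arctan2(lab_o[..., 2], lab_o[..., 1])   # original hue angle
    C_e = np.hypot(lab_e[..., 1], lab_e[..., 2])     # enhanced chroma
    return np.stack([lab_e[..., 0], C_e * np.cos(h_o), C_e * np.sin(h_o)], axis=-1)
```

By construction the corrected pixel reproduces the original hue angle exactly while leaving the enhancement's lightness and chroma gains intact; a gamut-mapping step, as in the paper, would then be needed to pull out-of-gamut results back into the output RGB gamut along constant hue.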
This paper presents a different approach to the Sound Source Localization (SSL) problem, applied to a compact microphone array that can be mounted on top of a small mobile robot in an indoor environment. Sound source localization approaches fall into three main categories: Time Difference of Arrival (TDOA), high-resolution subspace-based methods, and steered beamformer-based methods. Each has its limitations depending on the search or application requirements. A steered beamformer-based method is used in this paper because it has proven robust to ambient noise and reverberation to a certain extent. The most successful and widely used algorithm in this category is SRP-PHAT. Its main limitation is the computational burden of the search process, which must examine every candidate location in the search space for the one that maximizes a certain function. The aim of this paper is to develop a computationally viable approach that finds the coordinates of a sound source with acceptable accuracy. The proposed approach comprises two stages: the first contracts the search space by estimating the Direction of Arrival (DoA) vector from the time differences of arrival, adding a reasonable error margin around the vector to ensure that the sound source lies inside the estimated region; the second applies the SRP-PHAT algorithm to search only this contracted region for the source location. The AV16.3 corpus was used to evaluate the proposed approach, and extensive experiments were carried out to verify its reliability. The results show that the proposed approach obtains good results compared with the conventional SRP-PHAT algorithm.
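The two-stage idea can be illustrated on a toy two-microphone, far-field case: a coarse TDOA estimate (stage one) contracts the candidate set, and the steered-response search (stage two) only scans delays near it. This is a 1-D sketch under invented geometry and grid sizes; the paper's system searches 3-D candidate locations with a real multi-microphone array.

```python
import numpy as np

fs = 16000.0       # sample rate (Hz)
c = 343.0          # speed of sound (m/s)
d = 0.2            # microphone spacing (m); values invented for the example

def gcc_phat(x1, x2):
    """PHAT-weighted cross-spectrum of two microphone signals."""
    n = len(x1) * 2
    X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
    G = X1 * np.conj(X2)
    return G / np.maximum(np.abs(G), 1e-12)

def srp_at_delay(G, tau):
    """Steered response power for one candidate inter-mic delay (seconds)."""
    f = np.fft.rfftfreq((len(G) - 1) * 2, 1 / fs)
    return np.real(np.sum(G * np.exp(2j * np.pi * f * tau)))

def locate(x1, x2, margin=2):
    G = gcc_phat(x1, x2)
    # stage 1: coarse TDOA from the cross-correlation peak (integer samples)
    cc = np.fft.irfft(G)
    n = len(cc)
    lag = int(np.argmax(cc))
    lag = lag - n if lag > n // 2 else lag              # signed lag
    # stage 2: fine SRP search only within +/- margin samples of the coarse lag
    taus = np.linspace((lag - margin) / fs, (lag + margin) / fs, 81)
    taus = np.clip(taus, -d / c, d / c)                 # physically feasible delays
    best = taus[np.argmax([srp_at_delay(G, t) for t in taus])]
    return np.degrees(np.arccos(np.clip(best * c / d, -1.0, 1.0)))  # DoA angle
```

The full grid in stage two would have thousands of candidate points; the contraction reduces it to a handful around the coarse estimate, which is the computational saving the abstract describes.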
High Dynamic Range (HDR) displays can show images with higher color contrast levels and peak luminances than common Low Dynamic Range (LDR) displays. However, most existing video content is recorded and/or graded in LDR format. To show LDR content on HDR displays, it needs to be expanded using a so-called inverse tone mapping algorithm. Several techniques for inverse tone mapping have been proposed in recent years, ranging from simple approaches based on global and local operators to more advanced algorithms such as neural networks. Drawbacks of existing inverse tone mapping techniques include the need for human intervention, the high computation time of the more advanced algorithms, limited peak brightness, and the failure to preserve artistic intent. In this paper, we propose a fully automatic inverse tone mapping operator based on mid-level mapping that is capable of real-time video processing. Our proposed algorithm expands LDR images into HDR images with peak brightness over 1000 nits, preserving the artistic intent inherent to the HDR domain. We assessed our results using the full-reference objective quality metrics HDR-VDP-2.2 and DRIM and by carrying out a subjective pair-wise comparison experiment, and we compared our results with those obtained by the most recent methods in the literature. Experimental results demonstrate that our proposed method outperforms the current state of the art in simple inverse tone mapping, with performance similar to that of more complex and time-consuming advanced techniques.
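As a rough illustration of the mid-level-mapping family (not the authors' algorithm), a global operator can anchor mid-grey at a chosen display luminance and stretch highlights toward a 1000-nit peak. All parameter values below are invented for the example.

```python
import numpy as np

def inverse_tone_map(ldr, peak_nits=1000.0, gamma=2.4, mid_anchor=0.18, mid_nits=100.0):
    """Expand gamma-encoded LDR values in [0, 1] to absolute HDR luminance (nits).

    A generic global operator: linearize, then apply a power curve whose
    exponent is chosen so mid-grey maps to `mid_nits` and full white maps
    to `peak_nits`.
    """
    lin = np.clip(ldr, 0.0, 1.0) ** gamma                 # undo display gamma
    # exponent that sends lin == mid_anchor -> mid_nits and lin == 1 -> peak_nits
    alpha = np.log(peak_nits / mid_nits) / np.log(1.0 / mid_anchor)
    return mid_nits * (lin / mid_anchor) ** alpha
```

Keeping mid-tones pinned while expanding highlights is one simple way to respect the original grade, which is the "preserving artistic intent" concern the abstract raises; the paper's operator is considerably more sophisticated.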
This paper proposes signal detection methods for frequency domain equalization (FDE) based overloaded multiuser multiple input multiple output (MU-MIMO) systems for uplink Internet of Things (IoT) environments, where many IoT terminals are served by a base station having fewer antennas than there are terminals. Exploiting the fact that the transmitted signal vector is both discrete-valued and group sparse, we formulate a convex discreteness- and group-sparsity-aware (DGS) optimization problem for signal detection. We provide an optimization algorithm for the DGS problem based on the alternating direction method of multipliers (ADMM). Moreover, we extend DGS optimization to weighted DGS (W-DGS) optimization and propose an iterative approach named iterative weighted DGS (IW-DGS), in which we iteratively solve the W-DGS problem while updating the parameters of the objective function. We also analyze the computational complexity of the proposed IW-DGS and show that the order of the complexity can be reduced by exploiting the structure of the channel matrix. Simulation results show that the symbol error rate (SER) performance of the proposed method is close to that of the oracle zero-forcing (ZF) method, which perfectly knows the activity of each IoT terminal.
This paper presents an evaluation of parallel voice conversion (VC) with neural network (NN)-based statistical models for spectral mapping and waveform generation. The NN-based architectures for spectral mapping include deep NN (DNN), deep mixture density network (DMDN), and recurrent NN (RNN) models. The WaveNet (WN) vocoder is employed as a high-quality NN-based waveform generation method. In VC, however, quality degradation still occurs owing to the oversmoothed characteristics of the estimated speech parameters. To address this problem, we apply post-conversion to the converted features based on direct waveform modification with a spectrum differential and a global variance (GV) postfilter. To preserve consistency with the post-conversion, we further propose a spectrum differential loss for spectral modeling. The experimental results demonstrate that: (1) the RNN-based spectral modeling achieves higher accuracy with a faster convergence rate and better generalization than the DNN-/DMDN-based models; (2) the RNN-based spectral modeling is also capable of producing less oversmoothed spectral trajectories; (3) the use of the proposed spectrum differential loss improves performance in same-gender conversions; and (4) the proposed post-conversion of converted features for the WN vocoder yields the best performance in both naturalness and speaker similarity compared with the conventional use of the WN vocoder.
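One of the post-conversion components, the global variance (GV) postfilter, is simple to illustrate: rescale each dimension of the converted feature trajectory so that its variance over time matches the target speaker's GV statistics, counteracting oversmoothing. This is a generic sketch of the idea, not the authors' implementation.

```python
import numpy as np

def gv_postfilter(conv, target_gv):
    """Scale each dimension of converted features `conv` (T x D) so its
    per-dimension global variance matches `target_gv` (D,), keeping the mean."""
    mu = conv.mean(axis=0)
    gv = conv.var(axis=0)
    scale = np.sqrt(target_gv / np.maximum(gv, 1e-12))
    return mu + scale * (conv - mu)
```

Because statistical mapping shrinks trajectories toward the mean, the converted GV is typically smaller than the natural target GV, so the scale factor is usually greater than one and restores spectral dynamics.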
Research on graph representation learning has received great attention in recent years, since most data in real-world applications come in the form of graphs. High-dimensional graph data are often irregular in form, making them more difficult to analyze than image/video/audio data defined on regular lattices. Various graph embedding techniques have been developed to convert raw graph data into a low-dimensional vector representation while preserving the intrinsic graph properties. In this review, we first explain the graph embedding task and its challenges. Next, we review a wide range of graph embedding techniques with insights. Then, we evaluate several state-of-the-art methods on small and large data sets and compare their performance. Finally, potential applications and future directions are presented.
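As one concrete instance of the embedding task (an illustrative classic, not necessarily one of the methods the review benchmarks), spectral embedding maps each node to the entries of the low eigenvectors of the normalized graph Laplacian, so that strongly connected nodes land close together in the vector space.

```python
import numpy as np

def spectral_embedding(adj, dim=2):
    """Embed nodes using eigenvectors of the normalized graph Laplacian
    L = I - D^{-1/2} A D^{-1/2} for the smallest nonzero eigenvalues."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    w, v = np.linalg.eigh(L)            # eigenvalues in ascending order
    return v[:, 1:dim + 1]              # skip the trivial constant eigenvector
```

On a small graph made of two triangles joined by a single edge, the first embedding coordinate separates the two communities by sign, which is exactly the "preserve intrinsic graph properties" behavior the abstract describes.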
Most research on replay detection has focused on developing a stand-alone countermeasure that runs independently of a speaker verification system, training a single spoofed model and a single genuine model for all speakers. In this paper, we explore the potential benefits of adapting the back-end of a spoofing detection system toward the claimed target speaker. Specifically, we characterize and quantify speaker variability by comparing speaker-dependent and speaker-independent (SI) models of feature distributions for both genuine and spoofed speech. We then develop an approach to speaker-dependent spoofing detection with a Gaussian mixture model (GMM) back-end, in which both the genuine and spoofed models are adapted to the claimed speaker. Finally, we also develop and evaluate a speaker-specific neural network-based spoofing detection system in addition to the GMM-based back-end. Evaluations of the proposed approaches on the replay corpora BTAS2016 and ASVspoof2017 v2.0 reveal that the proposed speaker-dependent spoofing detection outperforms equivalent SI replay detection baselines on both datasets. Our experimental results show that the use of speaker-specific genuine models leads to a significant improvement (around 4% in terms of equal error rate (EER)), as previously shown, and that the addition of speaker-specific spoofed models adds a small improvement on top (less than 1% in terms of EER).
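A stripped-down sketch of the back-end idea, shrinking each GMM to a single diagonal Gaussian per class: MAP-adapt the class mean toward the claimed speaker's enrollment data, then score a trial with the genuine-vs-spoofed log-likelihood ratio. The relevance-factor form of MAP mean adaptation is standard practice; the dimensions and data below are invented for the example.

```python
import numpy as np

def map_adapt_mean(prior_mean, data, r=16.0):
    """MAP adaptation of a Gaussian mean toward speaker enrollment data;
    the relevance factor r controls how strongly the SI prior is retained."""
    n = len(data)
    return (data.sum(axis=0) + r * prior_mean) / (n + r)

def log_gauss(x, mean, var):
    """Per-frame log density of a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

def spoof_llr(feats, gen_mean, spf_mean, var):
    """Average log-likelihood ratio genuine vs spoofed; positive => genuine."""
    return np.mean(log_gauss(feats, gen_mean, var) - log_gauss(feats, spf_mean, var))
```

In the paper both the genuine and the spoofed models are adapted per claimed speaker; here only one adaptation call per class is needed, and the decision threshold on the LLR would be tuned on development data.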
Parkinson's disease and Alzheimer's disease are progressive nervous system disorders that affect the physical and cognitive capacities of individuals, including memory loss, motion impairment, and problem-solving dysfunctions. Leisure activities are associated with a reduced risk of dementia and serve as preventive measures for delaying cognitive impairment in the later stages of these neurodegenerative diseases. Electronic games targeting cognitive abilities are an easy and inexpensive alternative for stimulating brain activity in these patients. Previous research demonstrated the acceptance of these activities in the Connected TV environment, both when playing at home and in daily care centers. Interaction with Connected TV applications has its own particularities that influence interface design, including viewing distance, the type of interaction (through a remote control or other techniques), screen size, and the collective nature of consumption. Iterative testing with patients from these groups revealed how the physical characteristics and cognitive impairment of these end-users affect human–computer interaction, yielding guidelines and good-practice recommendations for Smart TV interface design. In addition, analytics extracted from game interaction and progression provide important information enabling the creation of models that estimate the cognitive state of the patient.
Over the last decade, cost pressures, technology, automation, globalisation, de-regulation, and changing client relationships have transformed the practice of law, but legal education has been slow to respond. Deciding what learning objectives a law degree ought to prioritise, and how to best strike the balance between vocational and academic training, are questions of growing importance for students, regulators, educators, and the legal profession. This collection provides a range of perspectives on the suite of skills required by the future lawyer and the various approaches to supporting their acquisition. Contributions report on a variety of curriculum initiatives, including role-play, gamification, virtual reality, project-based learning, design thinking, data analytics, clinical legal education, apprenticeships, experiential learning and regulatory reform, and in doing so, offer a vision of what modern legal education might look like.
Within the next decades, robots will need to execute a large variety of tasks autonomously in a large variety of environments. To reduce the resulting programming effort, a knowledge-enabled approach to robot programming can be adopted, organizing information into reusable knowledge pieces. However, reuse requires agreement on the meaning of terms. A common approach is to represent these terms using ontology languages that conceptualize the respective domain. In this work, we review projects that use ontologies to support robot autonomy. We systematically search for projects that fulfill a set of inclusion criteria and compare them with respect to the scope of their ontology, the types of cognitive capabilities supported by the use of ontologies, and their application domain.
The end of the calendar year always seems like a good time to pause for breath and reflect on what’s been happening over the last 12 months, and that’s as true in the world of commercial NLP as it is in any other domain. In particular, 2019 has been a busy year for voice assistance, thanks to the focus placed on this area by all the major technology players. So, we take this opportunity to review a number of key themes that have defined recent developments in the commercialization of voice technology.
We study a single server queue under a processor-sharing type of scheduling policy, where the weights for determining the sharing are given by functions of each job's remaining service (processing) amount, and obtain a fluid limit for the scaled measure-valued system descriptors.
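The scheduling policy itself can be illustrated with a toy discrete-time simulation (the paper's object of study is the fluid limit of the measure-valued system descriptor, which this sketch does not attempt): each active job receives a share of the unit service capacity proportional to a weight function w of its remaining service amount. The weight function and workloads below are invented for the example.

```python
import numpy as np

def simulate_wps(initial_work, w, dt=1e-3, t_max=50.0):
    """Weighted processor sharing: job i is served at rate
    w(r_i) / sum_j w(r_j), where r_i is its remaining service amount."""
    rem = np.array(initial_work, dtype=float)
    t = 0.0
    while t < t_max and (rem > 0).any():
        active = rem > 0
        weights = np.where(active, w(np.maximum(rem, 1e-12)), 0.0)
        rem = np.maximum(rem - dt * weights / weights.sum(), 0.0)
        t += dt
    return rem, t
```

With w(r) = r, jobs with more remaining work get proportionally more capacity, so the remaining-work ratio between jobs stays constant and all jobs finish simultaneously; w constant recovers classical egalitarian processor sharing.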
We introduce a weighted configuration model graph, where edge weights correspond to the probability of infection in an epidemic on the graph. On these graphs, we study the development of a Susceptible–Infectious–Recovered epidemic using both Reed–Frost and Markovian settings. For the special case of two different edge types, we determine the basic reproduction number R0, the probability of a major outbreak, and the relative final size of a major outbreak. Results are compared with those for a calibrated unweighted graph. The degree distributions are based on both theoretical constructs and empirical network data. In addition, bivariate standard normal copulas are used to model the dependence between the degrees of the two edge types, allowing the correlation between edge types to be modeled over a wide range. Among the results are that the weighted graph produces much richer results than the unweighted graph. Also, while R0 always increases with increasing correlation between the two degrees, this is not necessarily true for the probability of a major outbreak or for the relative final size of a major outbreak. When using copulas, we see that they can produce results similar to those of the empirical degree distributions, indicating that in some cases a copula is a viable alternative to using the full empirical data.
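The model can be illustrated with a toy Reed-Frost simulation on a two-edge-type configuration-model graph with Poisson degrees (the paper's degree distributions and copula-based degree dependence are much richer). All parameter values are invented for the example, and when parallel edges form between a pair of nodes this sketch simply keeps the last type's probability.

```python
import numpy as np

def reed_frost_two_types(n, mean_deg=(3.0, 3.0), p=(0.2, 0.6), seed=1):
    """Reed-Frost SIR epidemic started from node 0 on a configuration-model
    graph whose edges come in two types, each with its own per-edge
    transmission probability."""
    rng = np.random.default_rng(seed)
    adj = [set() for _ in range(n)]
    edge_p = {}
    for typ in (0, 1):
        # configuration model: pair up half-edges (stubs) uniformly at random
        stubs = np.repeat(np.arange(n), rng.poisson(mean_deg[typ], n))
        rng.shuffle(stubs)
        for a, b in zip(stubs[::2], stubs[1::2]):
            if a != b:                               # drop self-loops
                adj[a].add(b); adj[b].add(a)
                edge_p[frozenset((a, b))] = p[typ]   # parallel edges: keep last type
    infectious, recovered = {0}, set()
    while infectious:                                # one Reed-Frost generation per loop
        new = set()
        for i in infectious:
            for j in adj[i]:
                if j not in recovered and j not in infectious and j not in new:
                    if rng.random() < edge_p[frozenset((i, j))]:
                        new.add(j)
        recovered |= infectious
        infectious = new
    return len(recovered)                            # final size of the epidemic
```

Repeating the simulation over many seeds gives Monte Carlo estimates of the outbreak-size distribution, against which analytic quantities such as R0 and the major-outbreak probability can be checked.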