This paper proposes a novel approach for lossless coding of light field (LF) images based on a macro-pixel (MP) synthesis technique which synthesizes the entire LF image in one step. The reference views used in the synthesis process are selected based on four different view configurations and define the reference LF image. This image is stored as an array of reference MPs, each collecting one pixel from each reference view, and is losslessly encoded as a base layer. A first contribution focuses on a novel network design for view synthesis which synthesizes the entire LF image as an array of synthesized MPs. A second contribution proposes a network model for coding which computes the MP prediction used for lossless encoding of the remaining views as an enhancement layer. Synthesis results show an average distortion of 29.82 dB based on four reference views and up to 36.19 dB based on 25 reference views. Compression results show an average improvement of 29.9% over traditional lossless image codecs and of 9.1% over the state of the art.
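The macro-pixel representation described above can be pictured with a short sketch: each MP gathers the co-located pixel from every reference view, so building the reference MP array amounts to transposing the view axes of the light field. The array shapes and function name below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def views_to_macro_pixels(views):
    """Rearrange a light field given as an array of views into an array of
    macro-pixels (MPs), where each MP collects the co-located pixel from
    every view.

    views: array of shape (U, V, H, W) -- a U x V grid of H x W views.
    returns: array of shape (H, W, U, V) -- an H x W grid of U x V macro-pixels.
    """
    return np.transpose(views, (2, 3, 0, 1))

# Example: a 5 x 5 grid of reference views, each 32 x 48 pixels (assumed sizes).
lf_views = np.random.randint(0, 256, size=(5, 5, 32, 48), dtype=np.uint8)
mp_image = views_to_macro_pixels(lf_views)
assert mp_image.shape == (32, 48, 5, 5)
```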
Laughter commonly occurs in daily interactions; it is not simply related to funny situations but also expresses various attitudes, and thus serves important social functions in communication. The present work is motivated by the goal of generating natural motions in a humanoid robot, since miscommunication may arise when audio and visual modalities are mismatched, especially during laughter events. We used a multimodal dialogue database and analyzed facial, head, and body motion during laughing speech. Based on the analysis results of human behaviors during laughing speech, we proposed a motion generation method that takes as input the speech signal and the laughing speech intervals. Subjective experiments were conducted with our android robot by generating five different motion types, considering several modalities. Evaluation results showed the effectiveness of controlling different parts of the face, head, and upper body (eyelid narrowing, lip corner/cheek raising, eye blinking, head motion, and upper body motion control).
One challenge faced by reinforcement learning (RL) agents is that in many environments the reward signal is sparse, leading to slow improvement of the agent’s performance in early learning episodes. Potential-based reward shaping can help resolve this issue of sparse reward by incorporating an expert’s domain knowledge into the learning through a potential function. Past work on reinforcement learning from demonstration (RLfD) directly mapped (sub-optimal) human expert demonstrations to a potential function, which can speed up RL. In this paper we propose an introspective RL agent that speeds up the learning even further. An introspective RL agent records its state–action decisions and experience during learning in a priority queue. Good-quality decisions, according to a Monte Carlo estimation, are kept in the queue, while poorer decisions are rejected. The queue is then used as a demonstration to speed up RL via reward shaping. A human expert’s demonstration can be used to initialize the priority queue before the learning process starts. Experimental validation in the 4-dimensional CartPole domain and the 27-dimensional Super Mario AI domain shows that our approach significantly outperforms non-introspective RL and state-of-the-art RLfD approaches in both domains.
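As a rough illustration of the introspection mechanism, the sketch below keeps the highest-return state–action decisions in a bounded priority queue and exposes a potential function for reward shaping. It is a minimal sketch under our own assumptions (a Gaussian state-similarity kernel and a fixed capacity); the paper's Monte Carlo estimation and shaping details may differ.

```python
import heapq
import itertools
import math

class IntrospectionBuffer:
    """Bounded priority queue of good (state, action) decisions.

    A min-heap keyed by Monte Carlo return: once `capacity` is reached,
    a new decision replaces the stored decision with the lowest return,
    so only good-quality decisions are retained for shaping.
    """

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.heap = []
        self._tie = itertools.count()  # tie-breaker; avoids comparing states

    def add(self, state, action, mc_return):
        item = (mc_return, next(self._tie), tuple(state), action)
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, item)
        elif mc_return > self.heap[0][0]:
            heapq.heapreplace(self.heap, item)

    def potential(self, state, action, sigma=1.0):
        """Shaping potential: similarity of (state, action) to stored decisions."""
        best = 0.0
        for _, _, s, a in self.heap:
            if a == action:
                d2 = sum((x - y) ** 2 for x, y in zip(s, state))
                best = max(best, math.exp(-d2 / (2.0 * sigma ** 2)))
        return best

# Potential-based shaping of a transition (s, a, r, s2, a2) with discount gamma:
#     r_shaped = r + gamma * buf.potential(s2, a2) - buf.potential(s, a)
```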
A semi-fragile watermarking scheme is proposed in this paper for detecting tampering in speech signals. The scheme can effectively identify whether or not original signals have been tampered with by embedding hidden information into them. It is based on singular-spectrum analysis, where watermark bits are embedded into speech signals by modifying a part of the singular spectrum of a host signal. Convolutional neural network (CNN)-based parameter estimation is deployed to quickly and properly select the part of the singular spectrum to be modified so that it meets inaudibility and robustness requirements. Evaluation results show that CNN-based parameter estimation reduces the computational time of the scheme and also makes the scheme blind, i.e. we require only a watermarked signal in order to extract a hidden watermark. In addition, a semi-fragility property, which allows us to detect tampering in speech signals, is achieved. Moreover, due to the time efficiency of the CNN-based parameter estimation, the proposed scheme can be practically used in real-time applications.
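A simplified sketch of the embedding step is given below: a speech frame is folded into a trajectory (Hankel) matrix, part of its singular spectrum is scaled according to the watermark bit, and the frame is reconstructed by diagonal averaging. The fixed parameters `k` and `strength` are stand-ins for the CNN-estimated parameters of the actual scheme, and the frame length is an assumption.

```python
import numpy as np

def embed_bit_ssa(frame, bit, k=5, strength=0.02):
    """Embed one watermark bit by modifying part of a frame's singular spectrum."""
    L = len(frame) // 2                       # window length of the trajectory matrix
    K = len(frame) - L + 1
    X = np.column_stack([frame[i:i + L] for i in range(K)])  # Hankel (trajectory) matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Scale a band of the singular spectrum up or down depending on the bit.
    s[1:1 + k] *= (1 + strength) if bit else (1 - strength)
    Xw = (U * s) @ Vt
    # Diagonal averaging (Hankelization) back to a 1-D signal.
    out = np.zeros(len(frame))
    cnt = np.zeros(len(frame))
    for j in range(K):
        out[j:j + L] += Xw[:, j]
        cnt[j:j + L] += 1
    return out / cnt

frame = np.random.randn(400)   # one 25 ms frame at 16 kHz (assumed)
watermarked = embed_bit_ssa(frame, bit=1)
```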
It is well established that reliable multicast enables consistency protocols, including Byzantine Fault Tolerant protocols, for distributed systems. However, no transport-layer reliable multicast is deployed today, owing to limitations of existing switch fabrics and transport-layer protocols. In this paper, we introduce a layer-4 (L4) transport based on remote direct memory access (RDMA) datagrams to achieve reliable multicast over a shared optical medium. By connecting a cluster of networking nodes through a passive optical cross-connect fabric enhanced with wavelength division multiplexing, all messages are broadcast to all nodes. This mechanism enables consistency in a distributed system to be maintained at a low latency cost. By further using RDMA datagrams as the L4 protocol, we achieve a message loss ratio low enough (better than one in 68 billion) to make a simple Negative Acknowledgement (NACK)-based L4 multicast practical to deploy. To our knowledge, this is the first multicast architecture to demonstrate such a low message loss ratio. Furthermore, with this reliable multicast transport, end-to-end latencies of eight microseconds or less (<8 µs) are routinely achieved using an enhanced software RDMA implementation on a variety of commodity 10G Ethernet network adapters.
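To make the NACK-based recovery idea concrete, the sketch below shows receiver-side gap detection over sequence-numbered datagrams. It abstracts away the RDMA datagram transport and the optical broadcast fabric, and omits reordering buffers; it is illustrative only and not the paper's protocol.

```python
def on_datagram(seq, expected, deliver, send_nack):
    """Deliver datagrams and NACK any gap in the sequence numbers.

    seq:       sequence number of the datagram just received
    expected:  next sequence number the receiver expects
    deliver:   callback delivering a datagram to the application
    send_nack: callback asking the sender to retransmit a range of sequence numbers
    Returns the updated expected sequence number.
    """
    if seq == expected:
        deliver(seq)
        return expected + 1
    if seq > expected:
        send_nack(range(expected, seq))  # datagrams lost on the broadcast medium
        deliver(seq)                     # (in-order re-delivery omitted for brevity)
        return seq + 1
    return expected                      # duplicate or stale datagram: ignore
```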
The standardization process for Versatile Video Coding (VVC), the next-generation video coding standard, was launched in 2018, after several recent advances in video coding technology had been investigated by the Joint Video Experts Team (JVET) of ITU-T VCEG and ISO/IEC MPEG experts. The recent standard development status (up to VVC working draft 2) shows that the VTM software, the test model for the VVC standard, can achieve over 23% average coding gain under the random access configuration when compared to the HM software, the test model of the HEVC standard. This paper reviews recently developed video coding technologies that have either been adopted into the VVC working draft as part of the standard or are under further evaluation for potential inclusion.
Voice conversion aims to change a source speaker's voice to make it sound like that of a target speaker while preserving linguistic information. Despite the rapid advance of voice conversion algorithms in the last decade, most of them are still too complicated to be accessible to the public. With the popularity of mobile devices, especially smartphones, mobile voice conversion applications are highly desirable, so that everyone can enjoy the pleasure of high-quality voice mimicry and people with speech disorders can also potentially benefit from them. Given the limited computing resources on mobile phones, the major concern is the time efficiency of such a mobile application, so as to guarantee a positive user experience. In this paper, we detail the development of a mobile voice conversion system based on the Gaussian mixture model (GMM) and weighted frequency warping methods. We boost computational efficiency by making the best of the hardware characteristics of today's mobile phones, such as parallel computing on multiple cores and advanced vectorization support. Experimental evaluation results indicate that our system achieves acceptable voice conversion performance, while the conversion of a five-second sentence takes only slightly more than one second on an iPhone 7.
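At its core, the GMM-based mapping mentioned above is a posterior-weighted sum of per-component linear regressions. The sketch below shows that mapping for a single feature frame; the parameter names are our own assumptions (a joint-density GMM with means mu_x, mu_y and covariance blocks S_xx, S_yx), and the frequency-warping stage and mobile-specific vectorization are not shown.

```python
import numpy as np

def gmm_convert(x, weights, mu_x, mu_y, S_xx, S_yx):
    """Minimum mean-square-error conversion of one source frame.

    x:          source feature vector, shape (D,)
    weights:    mixture weights, shape (M,)
    mu_x, mu_y: component means, shape (M, D)
    S_xx, S_yx: covariance blocks, shape (M, D, D)
    """
    M, D = mu_x.shape
    # Posterior P(m | x) under the source marginal of each component.
    logp = np.empty(M)
    for m in range(M):
        diff = x - mu_x[m]
        inv = np.linalg.inv(S_xx[m])
        logp[m] = (np.log(weights[m])
                   - 0.5 * diff @ inv @ diff
                   - 0.5 * np.linalg.slogdet(S_xx[m])[1])
    post = np.exp(logp - logp.max())
    post /= post.sum()
    # Posterior-weighted sum of per-component linear regressions.
    y = np.zeros(D)
    for m in range(M):
        y += post[m] * (mu_y[m] + S_yx[m] @ np.linalg.inv(S_xx[m]) @ (x - mu_x[m]))
    return y
```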
Automatic speech recognition (ASR) systems often make unrecoverable errors due to subsystem pruning (acoustic, language, and pronunciation models); for example, words may be pruned on acoustic evidence using short-term context, prior to rescoring with long-term linguistic context. In this work, we model ASR as a phrase-based noisy transformation channel and propose an error correction system that can learn from the aggregate errors of all the independent modules constituting the ASR system and attempt to invert them. The proposed system can exploit long-term context using a neural network language model and can better choose between existing ASR output possibilities as well as re-introduce previously pruned or unseen (out-of-vocabulary) phrases. It provides corrections under poorly performing ASR conditions without degrading any accurate transcriptions; such corrections are larger for out-of-domain and mismatched-data ASR. Our system consistently provides improvements over the baseline ASR, even when the baseline is further optimized through Recurrent Neural Network (RNN) language model rescoring. This demonstrates that such ASR improvements can be exploited independently and that our proposed system can potentially still provide benefits on highly optimized ASR. Finally, we present an extensive analysis of the types of errors corrected by our system.
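The noisy-channel idea can be sketched as scoring candidate corrections by a channel probability P(ASR output | correction) combined with a language-model prior. The toy phrase table, LM scorer, and greedy decoding below are illustrative stand-ins, not the paper's phrase-based system or neural LM.

```python
import math

phrase_table = {  # toy channel model: P(observed ASR phrase | intended phrase)
    "wreck a nice": {"recognize": 0.6, "wreck a nice": 0.4},
    "beach": {"speech": 0.5, "beach": 0.5},
}

def lm_logprob(phrases):
    """Stand-in for a neural LM scoring the running hypothesis."""
    preferred = {"recognize", "speech"}
    return sum(0.0 if p in preferred else -1.0 for p in phrases)

def correct(asr_phrases, lm_weight=1.0):
    """Greedy phrase-by-phrase correction under the noisy-channel score."""
    out = []
    for phrase in asr_phrases:
        candidates = phrase_table.get(phrase, {phrase: 1.0})
        out.append(max(
            candidates,
            key=lambda c: math.log(candidates[c]) + lm_weight * lm_logprob(out + [c]),
        ))
    return out

print(correct(["wreck a nice", "beach"]))   # -> ['recognize', 'speech']
```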
In this paper, we consider optimal components grouping in series–parallel and parallel–series systems composed of k subsystems. All components in each subsystem are drawn from a heterogeneous population consisting of m different subpopulations. Firstly, we show that when one allocation vector is majorized by another one, then the series–parallel (parallel–series) system corresponding to the first (second) vector is more reliable than that of the other. Secondly, we study the impact of changes in the number of subsystems on the system reliability. Finally, we study the influence of the selection probabilities of subpopulations on the system reliability.
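For readers unfamiliar with the majorization order used above (notation ours): an allocation vector $\mathbf a=(a_1,\dots,a_k)$ is majorized by $\mathbf b$, written $\mathbf a \prec \mathbf b$, when, with components arranged in decreasing order $a_{[1]}\ge\dots\ge a_{[k]}$,

$$\sum_{i=1}^{j} a_{[i]} \;\le\; \sum_{i=1}^{j} b_{[i]} \quad (j=1,\dots,k-1), \qquad \sum_{i=1}^{k} a_{[i]} \;=\; \sum_{i=1}^{k} b_{[i]}.$$

For instance, $(2,2,2)\prec(3,2,1)\prec(4,1,1)$; by the first result stated above, the more balanced allocation of components across the $k$ subsystems yields the more reliable series–parallel system.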
We present a new axiomatization of classical mereology in which the three components of the theory—ordering, composition, and decomposition principles—are neatly separated. The equivalence of our axiom system with other, more familiar systems is established by purely deductive methods, along with additional results on the relative strengths of the composition and decomposition axioms of each system.
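For orientation, one familiar way of grouping the three components (not necessarily the axiom system proposed in the paper) takes parthood $\leq$ as primitive, with overlap defined by $Oxy := \exists w\,(w\leq x \wedge w\leq y)$:

$$\begin{aligned}
&\text{Ordering:} && x\leq x,\quad (x\leq y \wedge y\leq x)\rightarrow x=y,\quad (x\leq y \wedge y\leq z)\rightarrow x\leq z;\\
&\text{Decomposition:} && \neg(y\leq x)\rightarrow \exists z\,(z\leq y \wedge \neg Ozx);\\
&\text{Composition:} && \exists x\,\varphi(x)\rightarrow \exists z\,\forall y\,\bigl(Oyz \leftrightarrow \exists x\,(\varphi(x)\wedge Oyx)\bigr).
\end{aligned}$$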
In this paper, we investigate the degree profile and Gini index of random caterpillar trees (RCTs). We consider RCTs which evolve in two different manners: uniform and nonuniform. The degrees of the vertices on the central path (i.e., the degree profile) of a uniform RCT follow a multinomial distribution. For nonuniform RCTs, we focus on those growing in the fashion of preferential attachment. We develop methods based on stochastic recurrences to compute the exact expectations and the dispersion matrix of the degree variables. A generalized Pólya urn model is exploited to determine the exact joint distribution of these degree variables. We apply methods from combinatorics to prove that the asymptotic distribution is Dirichlet. In addition, we propose a new type of Gini index to quantitatively distinguish the evolutionary characteristics of the two classes of RCTs. We present the results via several numerical experiments.
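As a concrete instance of the multinomial claim (in our own notation): if a uniform RCT is grown by attaching $n$ leaves independently and uniformly at random to the $k$ vertices of the central path, the vector of leaf counts $(X_1,\dots,X_k)$ satisfies

$$\Pr(X_1=x_1,\dots,X_k=x_k)=\binom{n}{x_1,\dots,x_k}\Bigl(\frac{1}{k}\Bigr)^{\!n},\qquad x_1+\dots+x_k=n,$$

so the degree profile is $\mathrm{Multinomial}(n;1/k,\dots,1/k)$ up to the deterministic contribution of the path edges.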
A pair of bouncing geometric Brownian motions (GBMs) is studied. The bouncing GBMs behave like GBMs except that, when they meet, they bounce away from each other. The object of interest is the position process, which is defined as the position of the latest meeting point at each time. We study the distributions of the time and position of their meeting points, and show that the suitably scaled logarithmic position process converges weakly to a standard Brownian motion as the bounce size δ→0. We also establish the convergence of the bouncing GBMs to mutually reflected GBMs as δ→0. Finally, applying our model to limit order books, we derive a simple and effective prediction formula for trading prices.
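A discretized simulation sketch of such a pair is given below, under our own assumption about the bouncing rule (when the paths cross, both are restarted a relative distance δ on either side of their midpoint); the paper's exact construction may differ.

```python
import numpy as np

def bouncing_gbm(mu, sigma, x0, y0, delta, T=1.0, n=10_000, seed=0):
    """Simulate a pair of bouncing GBMs on a grid of n steps over [0, T].

    Returns the list of (time, position) pairs at which the two paths met,
    i.e. a discretized version of the meeting-point process.
    """
    rng = np.random.default_rng(seed)
    dt = T / n
    x, y = x0, y0                      # assume x0 > y0
    meetings = []
    for i in range(1, n + 1):
        x *= np.exp((mu[0] - 0.5 * sigma[0] ** 2) * dt
                    + sigma[0] * np.sqrt(dt) * rng.standard_normal())
        y *= np.exp((mu[1] - 0.5 * sigma[1] ** 2) * dt
                    + sigma[1] * np.sqrt(dt) * rng.standard_normal())
        if x <= y:                     # the two paths met: record and bounce apart
            pos = 0.5 * (x + y)
            meetings.append((i * dt, pos))
            x, y = pos * (1 + delta), pos * (1 - delta)
    return meetings

meet = bouncing_gbm(mu=(0.05, 0.05), sigma=(0.2, 0.2),
                    x0=101.0, y0=100.0, delta=0.001)
```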
ML is two languages in one: there is the core, with types and expressions, and there are modules, with signatures, structures, and functors. Modules form a separate, higher-order functional language on top of the core. There are both practical and technical reasons for this stratification; yet, it creates substantial duplication in syntax and semantics, and it imposes seemingly unnecessary limits on expressiveness because it makes modules second-class citizens of the language. For example, selecting one among several possible modules implementing a given interface cannot be made a dynamic decision. Language extensions allowing modules to be packaged up as first-class values have been proposed and implemented in different variations. However, they remedy expressiveness only to some extent and tend to be even more syntactically heavyweight than using second-class modules alone. We propose a redesign of ML in which modules are truly first-class values, and core and module layers are unified into one language. In this “1ML”, functions, functors, and even type constructors are one and the same construct; likewise, no distinction is needed between structures, records, or tuples. Or viewed the other way round, everything is just (“a mode of use of”) modules. Yet, 1ML does not require dependent types: its type structure is expressible in terms of plain System Fω, with a minor variation of our F-ing modules approach. We introduce both an explicitly typed version of 1ML and an extension with Damas–Milner-style implicit quantification. Type inference for this language is not complete, but, we argue, not substantially worse than for Standard ML.
In recent work Philip Welch has proven the existence of ‘ineffable liars’ for Hartry Field’s theory of truth. These are offered as liar-like sentences that escape classification in Field’s transfinite hierarchy of determinateness operators. In this article I present a slightly more general characterization of the ineffability phenomenon, and discuss its philosophical significance. I show the ineffable sentences to be less ‘liar-like’ than they appear in Welch’s presentation. I also point to some open technical problems whose resolution would greatly clarify the philosophical issues raised by the ineffability phenomenon.
In this paper, a type of parallel robot with three translational degrees of freedom is studied. Inverse and forward kinematic equations are derived for position and velocity analyses. The dynamic model is obtained using Lagrange's approach and the principle of virtual work, and the related computational algorithms implementing inverse and forward dynamics are presented. Furthermore, numerical simulations are performed using the kinematic and dynamic models, and the results show good agreement with the expected qualitative behavior of the mechanism. Comparisons with the results of the work–energy and impulse–momentum methods quantitatively verify the validity of the derived equations of motion. The simulations also show that the virtual-work model is comparatively efficient to implement computationally.
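For reference, the two modelling routes mentioned above take the generic forms below (our notation, written for point masses for brevity, not the paper's specific model): Lagrange's equations for generalized coordinates $\mathbf q$ with Lagrangian $L=T-V$ and generalized (actuator) forces $\boldsymbol\tau$, and the principle of virtual work, which requires the total virtual work of applied, actuator, and inertial forces to vanish for every virtual displacement compatible with the constraints:

$$\frac{d}{dt}\!\left(\frac{\partial L}{\partial \dot{\mathbf q}}\right)-\frac{\partial L}{\partial \mathbf q}=\boldsymbol{\tau},
\qquad
\delta W=\boldsymbol{\tau}^{\mathsf T}\delta\mathbf q+\sum_{i}\bigl(\mathbf F_{i}-m_{i}\ddot{\mathbf r}_{i}\bigr)^{\mathsf T}\delta\mathbf r_{i}=0 .$$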
In this work we present NEUROExos, a novel generation of upper-limb exoskeletons developed in recent years at The BioRobotics Institute of Scuola Superiore Sant’Anna (Italy). Specifically, we present our attempts to progressively (i) improve the ergonomics and safety, (ii) reduce the encumbrance and weight, and (iii) develop more intuitive human–robot cognitive interfaces. Our latest prototype, described here for the first time, extends the field of application to assistance in activities of daily living, thanks to its compact and portable design. The experimental studies carried out on these devices are summarized, and a perspective on future developments is presented.
This paper is a follow-up to [4], in which a mistake in [6] (which spread also to [9]) was corrected. We give a strengthening of the main result on the semantical nonconservativity of the theory of PT− with internal induction for total formulae ($\mathrm{PT}^{-} + \mathrm{INT}(\mathrm{tot})$, denoted by PT− in [9]). We show that if the axiom of internal induction for all arithmetical formulae is added to PT− (giving $\mathrm{PT}^{-} + \mathrm{INT}$), then this theory is semantically stronger than $\mathrm{PT}^{-} + \mathrm{INT}(\mathrm{tot})$. In particular, the latter is not relatively truth definable (in the sense of [11]) in the former. Last but not least, we provide an axiomatic theory of truth which meets the requirements put forward by Fischer and Horsten in [9]. The truth theory we define is based on Weak Kleene logic instead of Strong Kleene logic.