In this chapter we integrate concurrency into the framework of Modernized Algol described in Chapter 35. The resulting language, called Concurrent Algol, ℒ{nat cmd ⇀∥}, illustrates the integration of the mechanisms of the process calculus described in Chapter 41 into a practical programming language. To avoid distracting complications, we drop assignables from Modernized Algol entirely. (There is no loss of generality, however, because free assignables are definable in Concurrent Algol using processes as cells.)
The process calculus described in Chapter 41 is intended as a self-standing model of concurrent computation. When viewed in the context of a programming language, however, it is possible to streamline the machinery to take full advantage of types that are in any case required for other purposes. In particular, the concept of a channel, which features prominently in Chapter 41, may be identified with the concept of a dynamic class as described in Chapter 34. More precisely, we take broadcast communication of dynamically classified values as the basic synchronization mechanism of the language. Being dynamically classified, messages consist of a payload tagged with a class, or channel. The type of the channel determines the type of the payload. Importantly, only those processes that have access to the channel may decode the message; all others must treat it as inscrutable data that may be passed around but not examined. In this way we can model not only the mechanisms described in Chapter 41, but also formulate an abstract account of encryption and decryption in a network using the methods described in Chapter 41.
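The identification of channels with dynamic classes can be illustrated with a small sketch. This is a hedged illustration, not the book's formal calculus: all names here (`Channel`, `seal`, `unseal`) are assumptions for exposition. Each channel mints a fresh, unforgeable class tag; a message pairs a payload with that tag; only holders of the channel can decode it, while other processes can pass the message along but not inspect it.

```python
import itertools

_fresh = itertools.count()

class Channel:
    """A dynamically generated class: creating one mints a fresh, unforgeable tag."""
    def __init__(self, payload_type):
        self._id = next(_fresh)          # fresh class/channel identity
        self.payload_type = payload_type  # the channel type fixes the payload type

    def seal(self, payload):
        """Tag a payload with this channel, producing an opaque message."""
        assert isinstance(payload, self.payload_type)
        return (self._id, payload)

    def unseal(self, message):
        """Decoding succeeds only for holders of this channel."""
        tag, payload = message
        if tag != self._id:
            raise ValueError("message classified on a different channel")
        return payload

# A process holding `ch` can decode; one holding only `other` cannot,
# which is the abstract analogue of encryption and decryption.
ch, other = Channel(int), Channel(int)
msg = ch.seal(42)
assert ch.unseal(msg) == 42
```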
While human annotation is crucial for many natural language processing tasks, it is often very expensive and time-consuming. Inspired by previous work on crowdsourcing, we investigate the viability of using non-expert labels instead of gold-standard annotations from experts for a machine learning approach to automatic readability prediction. To do so, we evaluate two different methodologies for assessing the readability of a wide variety of text material: a more traditional setup in which expert readers make readability judgments, and a crowdsourcing setup for users who are not necessarily experts. For this purpose two assessment tools were implemented: a tool with which expert readers can rank a batch of texts by readability, and a lightweight crowdsourcing tool that invites users to provide pairwise comparisons. To validate this approach, readability assessments were gathered for a corpus of written Dutch generic texts. By collecting multiple assessments per text, we explicitly wanted to level out readers' background knowledge and attitude. Our findings show that the assessments collected through both methodologies are highly consistent and that crowdsourcing is a viable alternative to expert labeling. This is good news, as crowdsourcing is more lightweight to use and can reach a much wider audience of potential annotators. By performing a set of basic machine learning experiments using a feature set that mainly encodes basic lexical and morpho-syntactic information, we further illustrate how the collected data can be used to perform text comparisons or to assign an absolute readability score to an individual text. We do not focus on optimising the algorithms to achieve the best possible results for the learning tasks, but carry them out to illustrate the various possibilities of our data sets. The results on different data sets nevertheless show that our system outperforms readability formulas and a baseline language modelling approach.
We conclude that readability assessment by comparing texts is a polyvalent methodology, which can be adapted to specific domains and target audiences if required.
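One way pairwise readability comparisons can be turned into a ranking is a simple win-rate aggregation. The sketch below is an illustrative assumption, not the authors' tool: the function name, data format, and aggregation rule are all hypothetical, and real setups may use a more refined model (e.g. Bradley-Terry).

```python
from collections import defaultdict

def rank_from_pairwise(comparisons):
    """Order texts by the fraction of pairwise comparisons won.

    `comparisons` is a list of (winner, loser) pairs, where the winner
    was judged the more readable text of the two.
    """
    wins, total = defaultdict(int), defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        total[winner] += 1
        total[loser] += 1
    # Sort texts by win rate, most readable first.
    return sorted(total, key=lambda t: wins[t] / total[t], reverse=True)

# Toy judgments: text A beats B twice, A beats C, B beats C.
judgments = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]
print(rank_from_pairwise(judgments))  # → ['A', 'B', 'C']
```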
This paper presents morpheme-based language models developed for Amharic (a morphologically rich Semitic language) and their application to a speech recognition task. A substantial reduction in the out-of-vocabulary (OOV) rate has been observed as a result of using subwords or morphemes, thus addressing a severe problem of morphologically rich languages. Moreover, lower perplexity values have been obtained with morpheme-based language models than with word-based models. However, when comparing quality on the basis of the probability assigned to the test sets, word-based models seem to fare better. We have studied the utility of morpheme-based language models in speech recognition systems and found that the performance of a relatively small-vocabulary (5k) speech recognition system improved significantly as a result of using morphemes as language modeling and dictionary units. However, as the vocabulary size increases (20k or more), the morpheme-based systems suffer from acoustic confusability and did not achieve a significant improvement over a word-based system with an equivalent vocabulary size, even with the use of higher-order (quadrogram) n-gram language models.
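Why subword units reduce the OOV rate can be seen in a toy sketch. Everything below is an illustrative assumption: the `segment` function is a crude stand-in for a real Amharic morphological analyzer, and the English example words merely stand in for morphologically complex forms.

```python
def oov_rate(test_tokens, vocab):
    """Fraction of test tokens not covered by the vocabulary."""
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)

def segment(word):
    """Toy segmentation: split a fixed-length 'suffix' off longer words.
    Real morpheme boundaries come from a morphological analyzer."""
    return [word[:-2], word[-2:]] if len(word) > 4 else [word]

train = ["houses", "played", "play", "house"]
test = ["playing", "housed"]

word_vocab = set(train)
morph_vocab = {m for w in train for m in segment(w)}
morph_test = [m for w in test for m in segment(w)]

# Whole-word units: both test words are unseen.
print(oov_rate(test, word_vocab))         # → 1.0
# Morpheme units: shared stems and suffixes are covered.
print(oov_rate(morph_test, morph_vocab))  # → 0.5
```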
Linear combinations of independent random variables have been extensively studied in the literature. Xu & Hu [21] and Pan, Xu, & Hu [16] unified the study of linear combinations of independent random variables under a general setup. This paper is a companion to those two papers and studies the topic further: the results are generalized to permutation-invariant random variables and to independent, but not necessarily identically distributed, random variables that are ordered in the likelihood ratio or the hazard rate order.
The sequential and stochastic assignment problem (SSAP) has wide applications in logistics, finance, and health care management, and has been well studied in the literature. It assumes that jobs with unknown values arrive according to a stochastic process. Upon arrival, a job's value is made known and the decision-maker must immediately decide whether to accept or reject the job and, if accepted, to assign it to a resource for a reward. The objective is to maximize the expected reward from the available resources. The optimal assignment policy has a threshold structure and can be computed in polynomial time. In reality, there exist situations in which the decision-maker may postpone the accept/reject decision. In this research, we study the value of postponing decisions by allowing a decision-maker to hold a number of jobs which may be accepted or rejected later. While maintaining this queue of arrivals significantly complicates the analysis, optimal threshold policies exist under mild assumptions when the resources are homogeneous. We illustrate the benefits of delaying decisions through higher profits and lower risk in both cases of homogeneous and heterogeneous resources.
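The threshold structure mentioned above goes back to the classical result of Derman, Lieberman and Ross: with $n$ arrivals remaining, breakpoints partition the real line into intervals, and a job whose value falls in the $i$-th interval is assigned to the $i$-th ranked remaining resource. Each breakpoint for $n+1$ arrivals is the expectation of the job value clipped to the corresponding interval at $n$. A minimal sketch of that recursion follows, specialized to i.i.d. Uniform(0,1) job values; the distribution choice and the function name are illustrative assumptions.

```python
def next_thresholds(a):
    """One step of the Derman-Lieberman-Ross threshold recursion
    for job values X ~ Uniform(0,1).

    `a` holds the interior breakpoints for n arrivals (sorted, in [0,1]);
    the outer breakpoints 0 and 1 play the role of -inf and +inf.
    New breakpoint: a_{i,n+1} = E[ median(a_{i-1,n}, X, a_{i,n}) ].
    """
    b = [0.0] + list(a) + [1.0]
    out = []
    for lo, hi in zip(b, b[1:]):
        # E[median(lo, X, hi)] = lo*P(X<lo) + E[X; lo<X<hi] + hi*P(X>hi)
        out.append(lo * lo + (hi * hi - lo * lo) / 2 + hi * (1 - hi))
    return out

a = []                 # n = 1: no interior thresholds (accept anything)
for n in range(1, 4):  # thresholds for n = 2, 3, 4 remaining arrivals
    a = next_thresholds(a)
    print(n + 1, [round(x, 4) for x in a])
# → 2 [0.5]
# → 3 [0.375, 0.625]
# → 4 [0.3047, 0.5, 0.6953]
```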
We present analytic results for warehouse systems involving pairs of carousels. Specifically, for various picking strategies, we show that the sojourn time of the picker satisfies an integral equation that is a contraction mapping. As a result, numerical approximations for performance measures such as the throughput of the system are extremely accurate and converge fast (e.g., within five iterations) to their real values. We present simulation results validating our results and examining more complicated strategies for pairs of carousels.
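The successive-approximation scheme behind a contraction-mapping result can be illustrated on a generic discretized integral equation; the kernel, forcing term and grid below are illustrative assumptions, not the carousel sojourn-time model itself. When the iteration map has contraction factor $c < 1$, the error shrinks geometrically, which is the mechanism behind the fast numerical convergence reported above.

```python
import math

# Toy Fredholm equation  f(s) = g(s) + lam * \int_0^1 K(s,t) f(t) dt.
# With lam * max_s \int |K(s,t)| dt < 1 the right-hand side is a
# contraction, so successive substitution converges geometrically.
n = 100
ts = [i / (n - 1) for i in range(n)]
h = 1.0 / (n - 1)
lam = 0.5
K = lambda s, t: math.exp(-abs(s - t))       # illustrative kernel
g = [math.sin(math.pi * s) for s in ts]      # illustrative forcing term

f = [0.0] * n
for it in range(50):
    f_new = [g[i] + lam * h * sum(K(ts[i], ts[j]) * f[j] for j in range(n))
             for i in range(n)]
    err = max(abs(a - b) for a, b in zip(f_new, f))
    f = f_new
    if err < 1e-10:
        break
print(it + 1, err)  # converges in a few dozen iterations
```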
Consider two independent Markov chains having states 0 and 1 and identical transition probabilities. At each stage one of the chains is observed, and a reward equal to the observed state is earned. Assuming prior probabilities on the initial states of the chains, it is shown that the myopic policy, which always chooses to observe the chain most likely to be in state 1, stochastically maximizes the sequence of rewards earned in each period.
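A simulation of the myopic policy makes the belief dynamics concrete: observing a chain reveals its state exactly, after which both beliefs propagate through one transition step. The sketch below is a hedged illustration; the function name, parameterization, and timing conventions are assumptions, not the paper's formal model.

```python
import random

def myopic_run(p, q, p01, p11, steps, rng):
    """Simulate the myopic policy on two 2-state chains with identical
    transition probabilities p01 = P(next=1 | 0) and p11 = P(next=1 | 1).
    p and q are prior probabilities that chains 0 and 1 start in state 1.
    Returns the total reward over `steps` stages."""
    total = 0
    belief = [p, q]
    state = [int(rng.random() < p), int(rng.random() < q)]
    for _ in range(steps):
        i = 0 if belief[0] >= belief[1] else 1  # observe the likelier chain
        total += state[i]                       # reward = observed state
        belief[i] = float(state[i])             # observation reveals the state
        for j in (0, 1):
            # each chain takes one transition; beliefs propagate accordingly
            state[j] = int(rng.random() < (p11 if state[j] else p01))
            belief[j] = belief[j] * p11 + (1 - belief[j]) * p01
    return total

rng = random.Random(0)
avg = sum(myopic_run(0.9, 0.1, 0.2, 0.8, 50, rng) for _ in range(200)) / 200
print(avg)  # average reward over 200 simulated runs
```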
We give several new constructions, conjectures and bounds for Turán densities of 3-graphs which should be of interest to researchers in the area. Our main tool is ‘Flagmatic’, an implementation of Razborov's semi-definite method, which we are making publicly available. In a bid to make the power of Razborov's method more widely accessible, we have tried to make Flagmatic as user-friendly as possible, hoping thereby to remove the major hurdle that needs to be cleared before using the semi-definite method. One family of our bounds concerns the 3-graph $J_t$ consisting of a single vertex $x$ together with a disjoint set $A$ of size $t$ and all $\binom{|A|}{2}$ 3-edges containing $x$; we also prove two Turán density results in which certain induced subgraphs are forbidden. Finally, we spend some time reflecting on the limitations of our approach, and in particular on which problems we may be unable to solve. Our discussion of the ‘complexity barrier’ for the semi-definite method may be of general interest.
E. Thorp introduced the following card shuffling model. Suppose the number of cards is even. Cut the deck into two equal piles, then interleave them as follows. Choose the first card from the left pile or from the right pile according to the outcome of a fair coin flip. Then choose from the other pile. Continue this way, flipping an independent coin for each pair, until both piles are empty.
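One round of the shuffle just described can be sketched directly in code; the function name and the list representation of the deck are illustrative assumptions.

```python
import random

def thorp_shuffle(deck, rng=random):
    """One round of the Thorp shuffle on an even-sized deck."""
    assert len(deck) % 2 == 0
    half = len(deck) // 2
    left, right = deck[:half], deck[half:]  # cut into two equal piles
    out = []
    for a, b in zip(left, right):
        # A fair coin decides which pile contributes the next card;
        # the other pile then contributes the card after it.
        out.extend((a, b) if rng.random() < 0.5 else (b, a))
    return out

deck = list(range(8))
print(thorp_shuffle(deck, random.Random(1)))
```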
We prove an upper bound of $O(d^3)$ for the mixing time of the Thorp shuffle with $2^d$ cards, improving on the best known bound of $O(d^4)$. As a consequence, we obtain an improved bound on the time required to encrypt a binary message of length $d$ using the Thorp shuffle.
A perfect $K_t$-matching in a graph $G$ is a spanning subgraph consisting of vertex-disjoint copies of $K_t$. A classic theorem of Hajnal and Szemerédi states that if $G$ is a graph of order $n$ with minimum degree $\delta(G) \ge (t-1)n/t$ and $t \mid n$, then $G$ contains a perfect $K_t$-matching. Let $G$ be a $t$-partite graph with vertex classes $V_1, \dots, V_t$, each of size $n$. We show that, for any $\gamma > 0$, if every vertex $x \in V_i$ is joined to at least $\bigl((t-1)/t + \gamma\bigr)n$ vertices of $V_j$ for each $j \neq i$, then $G$ contains a perfect $K_t$-matching, provided $n$ is large enough. Thus, we verify a conjecture of Fischer [6] asymptotically. Furthermore, we consider a generalization to hypergraphs in terms of the codegree.
The world is awash with digital data from social networks, blogs, business, science and engineering. Data-intensive computing facilitates understanding of complex problems whose solution requires processing massive amounts of data. Through the development of new classes of software, algorithms and hardware, data-intensive applications can provide timely and meaningful analytical results in response to exponentially growing data complexity and associated analysis requirements. This emerging area brings many challenges that differ from those of traditional high-performance computing. This reference for computing professionals and researchers describes the dimensions of the field, the key challenges, the state of the art and the characteristics of likely approaches that future data-intensive problems will require. Chapters cover general principles and methods for designing such systems and for managing and analyzing the big data sets of today that live in the cloud, and describe example applications in bioinformatics and cybersecurity that illustrate these principles in practice.