The constructs presented in the previous chapter are enough for the creation of simple software systems. It is certainly possible to build very complex software systems with these constructs alone, but the design and implementation process becomes genuinely difficult. Fortunately, Scala provides many advanced features and constructs that facilitate programming as a mental activity. In this chapter we describe most of these advanced features, while a few others, such as parser combinators and actors, are presented thoroughly in later chapters.
Playing with trees
In the previous chapter we presented many important data types, but we did not mention trees, which form a group of data types with many uses. Also, from the discussion so far, it is not clear whether Scala provides a facility for the construction of recursive data types, that is, data types that are defined in terms of themselves. For example, a binary tree is a typical recursively defined data structure; it can be defined as follows.
Definition 1 Given the type node, a binary tree over the type node is defined in the following way: it is either the empty tree, or it consists of a root of type node together with two binary trees over node, called the left and the right subtree.
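In Scala, such a recursive data type is naturally modelled with a sealed trait and case classes. A minimal sketch (the names BinTree, Empty and Node are our own choices, not fixed by the definition):

```scala
// A binary tree over an arbitrary element type A: either empty,
// or a root value together with a left and a right subtree.
sealed trait BinTree[+A]
case object Empty extends BinTree[Nothing]
case class Node[A](value: A, left: BinTree[A], right: BinTree[A]) extends BinTree[A]

object BinTreeDemo {
  // Functions over the type follow its recursive shape.
  def size[A](t: BinTree[A]): Int = t match {
    case Empty         => 0
    case Node(_, l, r) => 1 + size(l) + size(r)
  }

  val t: BinTree[Int] = Node(1, Node(2, Empty, Empty), Node(3, Empty, Empty))
}
```

Pattern matching on the case classes gives a direct transcription of the inductive definition.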
It is quite probable that most of us are not consciously aware of an ever-present design pattern, one that goes far beyond design patterns in the usual sense of [24]. This pattern has to do with how we organize our data and, sometimes as a consequence, how we access those data. What we are talking about is the hierarchical data organization pattern, which we can summarize as: hierarchies are everywhere!
A file system is the canonical example of hierarchical organization. Its structure is a collection of files and directories, with the directories playing the role of containers for other files and/or directories. The Unix tradition has more to say about files, since the file system "pattern" has been extended to support use cases other than the traditional ones. For example, in Linux, /proc is a special mounted file system that exposes kernel configuration and parameters. In fact, normal file system I/O calls can be used to write data into this special file system, so that kernel and driver parameters can be changed at runtime.
XML advocates will be pleased to recognize that XML has been promoting exactly this hierarchical organization. We are not sure how many of them were aware of the real essence of the general "hierarchies are everywhere" pattern mentioned above, but the pattern itself is ubiquitous. Strangely enough, hierarchical databases have not survived, but XML has arguably struck back on their behalf.
Today's computers have multi-core processors (i.e., integrated circuits that contain two or more processor cores), which, in principle, allow the concurrent execution of computer instructions. In other words, today's computers are able to perform two or more tasks at the same time. Concurrent programming refers to the design and implementation of programs that consist of interacting computational processes which may execute in parallel. Concurrent programming is not only the next logical step in software development, but a necessary one. Thus, all modern programming languages must provide constructs and libraries that ease the construction of concurrent programs. Scala allows users to design and implement concurrent programs using threads, mailboxes, or actors. Unfortunately, programming with threads is a cumbersome task; thus, concurrent applications in Scala are usually implemented using the actor model of programming.
Programming with threads: an overview
Roughly, a process is a program loaded into memory that is being executed. A thread, also known as a lightweight process, is a basic unit of processor utilization. A process may include more than one thread, while traditional processes include only one. Threads can communicate efficiently, but because they share a process's resources (for example, memory and open files), their communication is not without problems. Each Scala program has at least one thread, while several other "system" threads take care of events in GUI applications, input and output, and so on.
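Since Scala runs on the JVM, threads can be created directly through the Java threading API. A minimal sketch (our illustration; requires Scala 2.12 or later for the lambda-to-Runnable conversion):

```scala
object ThreadDemo {
  def compute(): Int = {
    var result = 0
    // The lambda is converted to a java.lang.Runnable; the new thread
    // runs it concurrently with the thread that called compute().
    val worker = new Thread(() => { result = 21 * 2 })
    worker.start()
    worker.join()   // wait for the worker; join also makes its writes visible
    result
  }
}
```

Even this tiny example hints at the problems mentioned above: without the `join`, reading `result` would race with the worker's write.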
XML, the eXtensible Markup Language, is an industry standard for document markup. XML has been adopted in many fields, including software, physics, chemistry, finance, and law. XML is used to represent data exchanged between different systems, and many configuration files in modern operating systems are XML files. The widespread use of XML dictated the design and implementation of tools capable of handling XML content. Scala is a modern programming language, and so it includes a standard library for the manipulation of XML documents. This library, which was designed and implemented by Burak Emir, is the subject of this chapter.
What is XML?
A markup is an annotation to text that describes how it is to be structured, laid out, or formatted. Markups are specified using tags that are usually enclosed in angle brackets. XML is a meta-markup language, that is, a language that can be used to define a specific set of tags that are suitable for a particular task. For example, one can define tags for verses, stanzas, and strophes in order to express poems in XML. When a specific set of tags is used to describe entities of a particular kind, then this set is called an XML application. For example, if one precisely specifies tags suitable to describe poems and uses them only for this purpose, then the resulting set of tags is an XML application.
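For instance, a tiny XML application for poems might define tags such as the following (the tag and attribute names here are our own invention, purely for illustration):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<poem title="The Tyger">
  <stanza>
    <verse>Tyger Tyger, burning bright,</verse>
    <verse>In the forests of the night;</verse>
  </stanza>
</poem>
```

Note that XML itself assigns no meaning to verse or stanza; it only guarantees the nesting structure. The meaning comes from the programs or specifications that make up the XML application.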
Scala is a relatively new programming language that was designed by Martin Odersky and released in 2003. Its distinguishing features include a seamless integration of functional programming into an otherwise object-oriented language. Scala owes its name to its ability to scale: it is a language that can grow, providing an infrastructure that allows the introduction of new constructs and data types. In addition, Scala is a concurrent programming language, thus a tool for today as well as tomorrow! Scala is a compiled language. Its compiler produces bytecode for the Java Virtual Machine, thus allowing the (almost) seamless use of Java tools and constructs from within Scala. The language has been used to rewrite Twitter's back-end services, and almost all of Foursquare's infrastructure has been coded in Scala. Scala is also used by several other companies worldwide (for example, Siemens and Sony Pictures Imageworks).
Who should read this book?
The purpose of this book is twofold: first to teach the basics of Scala and then to show how Scala can be used to develop real applications. Unlike other books on Scala, this one does not assume any familiarity with Java. In fact, no previous knowledge of Java is necessary to read this book, though some knowledge of Java would be beneficial, especially in the chapter on GUI applications.
Let X and Y be two finite disjoint sets of elements over some ordered type and of combined size greater than k. Consider the problem of computing the kth smallest element of X ∪ Y. By definition, the kth smallest element of a set is one for which there are exactly k elements smaller than it, so the zeroth smallest is the smallest. How long does such a computation take?
The answer depends, of course, on how the sets X and Y are represented. If they are both given as sorted lists, then O(|X| + |Y|) steps are sufficient: the two lists can be merged in linear time, and the kth smallest can then be found at position k in the merged list in a further O(k) steps. In fact, the total time is O(k) steps, since only the first k + 1 elements of the merged list need be computed. But if the two sets are given as sorted arrays, then – as we show below – the time can be further reduced to O(log |X| + log |Y|) steps. This bound depends on arrays having a constant-time access function. The same bound is attainable if both X and Y are represented by balanced binary search trees, despite the fact that two such trees cannot be merged in less than linear time.
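The merge-based method can be sketched as follows (our Scala illustration, not the book's code). Using a lazy merge means only the first k + 1 merged elements are ever produced, giving the O(k) bound; LazyList requires Scala 2.13 or later:

```scala
object MergeSelect {
  // Lazily merge two sorted lists; #:: evaluates its tail only on demand.
  def merge(xs: List[Int], ys: List[Int]): LazyList[Int] = (xs, ys) match {
    case (Nil, _) => ys.to(LazyList)
    case (_, Nil) => xs.to(LazyList)
    case (x :: xt, y :: yt) =>
      if (x < y) x #:: merge(xt, ys) else y #:: merge(xs, yt)
  }

  // kth smallest (zero-indexed): forces only the first k + 1 elements.
  def select(k: Int, xs: List[Int], ys: List[Int]): Int = merge(xs, ys)(k)
}
```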
The fast algorithm is another example of divide and conquer, and the proof that it works hinges on a particular relationship between merging and selection.
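The divide-and-conquer idea can be sketched as follows (a Scala transcription of the standard algorithm, under our own naming; the true logarithmic bound requires index arithmetic on arrays rather than the sequence splitting used here). Comparing the two middle elements lets each step discard about half of one of the sequences:

```scala
object FastSelect {
  // kth smallest (zero-indexed) of two disjoint sorted sequences.
  def smallest(k: Int, xs: Vector[Int], ys: Vector[Int]): Int =
    if (xs.isEmpty) ys(k)
    else if (ys.isEmpty) xs(k)
    else {
      val p = xs.length / 2
      val q = ys.length / 2
      val a = xs(p)
      val b = ys(q)
      (a < b, k <= p + q) match {
        // a < b and k small: the answer is below b, so drop b and everything above it
        case (true, true)   => smallest(k, xs, ys.take(q))
        // a < b and k large: the answer is above a, so drop a and everything below it
        case (true, false)  => smallest(k - p - 1, xs.drop(p + 1), ys)
        // b < a and k small: drop a and everything above it
        case (false, true)  => smallest(k, xs.take(p), ys)
        // b < a and k large: drop b and everything below it
        case (false, false) => smallest(k - q - 1, xs, ys.drop(q + 1))
      }
    }
}
```

In each case the rank argument is the same: the discarded middle element provably lies strictly above or strictly below the kth smallest, so the half of its sequence beyond it can be discarded too.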
This pearl, and the one following, is all about arithmetic coding, a way of doing data compression. Unlike other methods, arithmetic coding does not represent each individual symbol of the text as an integral number of bits; instead, the text as a whole is encoded as a binary fraction in the unit interval. Although the idea can be traced back much earlier, it was not until the publication of an “accessible implementation” by Witten, Neal and Cleary in 1987 that arithmetic coding became a serious competitor in the world of data compression. Over the past two decades the method has been refined and its advantages and disadvantages over rival schemes have been elucidated. Arithmetic coding can be more effective at compression than rivals such as Huffman coding, or Shannon–Fano coding, and is well suited to take account of the statistical properties of the symbols in a text. On the other hand, coding and decoding times are longer than with other methods.
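The core idea of interval narrowing can be illustrated with a toy model (our own sketch, not the pearl's development: a fixed three-symbol alphabet whose probabilities are exactly representable, so exact BigDecimal arithmetic suffices in place of general rationals):

```scala
object ArithCoding {
  // Cumulative model: 'a' -> [0, 1/2), 'b' -> [1/2, 3/4), 'c' -> [3/4, 1).
  val ranges: Map[Char, (BigDecimal, BigDecimal)] = Map(
    'a' -> (BigDecimal(0), BigDecimal("0.5")),
    'b' -> (BigDecimal("0.5"), BigDecimal("0.75")),
    'c' -> (BigDecimal("0.75"), BigDecimal(1)))

  // Narrow [l, r) once per symbol; the final interval encodes the whole text.
  def encode(text: String): (BigDecimal, BigDecimal) =
    text.foldLeft((BigDecimal(0), BigDecimal(1))) { case ((l, r), ch) =>
      val (p, q) = ranges(ch)
      val w = r - l
      (l + w * p, l + w * q)
    }

  // Recover n symbols from any fraction inside the final interval.
  def decode(v0: BigDecimal, n: Int): String = {
    var v = v0
    val sb = new StringBuilder
    for (_ <- 1 to n) {
      val (ch, (p, q)) = ranges.find { case (_, (p, q)) => p <= v && v < q }.get
      sb += ch
      v = (v - p) / (q - p)   // rescale the remainder back to the unit interval
    }
    sb.toString
  }
}
```

The text as a whole is thus represented by a single fraction in the unit interval, and more probable symbols narrow the interval less, costing fewer bits.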
Arithmetic coding has a well-deserved reputation for being tricky to implement; nevertheless, our aim in these two pearls is to give a formal development of the basic algorithms. In the present pearl, coding and decoding are implemented in terms of arbitrary-precision rational arithmetic. This implementation is simple and elegant, though expensive in time and space. In the following pearl, coding and decoding are reimplemented in terms of finite-precision integers. This is where most of the subtleties of the problem reside.
Oh what a tangled web we weave when first we practise to derive.
(With apologies to Sir Walter Scott)
Introduction
Consider the problem of generating all bit strings a1a2 … an of length n satisfying given constraints of the form ai ≤ aj for various i and j. The generation is to be in Gray path order, meaning that exactly one bit changes from one bit string to the next. The transition code is a list of integers naming the bit that is to be changed at each step. For example, with n = 3, consider the constraints a1 ≤ a2 and a3 ≤ a2. One possible Gray path is 000, 010, 011, 111, 110 with transition code [2, 3, 1, 3] and starting string 000.
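The example can be checked mechanically; a small sketch (our code) that applies a transition code to a starting string and tests the constraints of the example:

```scala
object GrayDemo {
  // Flip the bit named by each transition (1-indexed), keeping every
  // intermediate string; scanLeft also retains the starting string.
  def path(start: String, transitions: List[Int]): List[String] =
    transitions.scanLeft(start) { (s, i) =>
      s.updated(i - 1, if (s(i - 1) == '0') '1' else '0')
    }

  // The constraints of the n = 3 example: a1 <= a2 and a3 <= a2.
  def ok(s: String): Boolean = s(0) <= s(1) && s(2) <= s(1)
}
```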
The snag is that the problem does not always have a solution. For example, with n = 4 and the constraints a1 ≤ a2 ≤ a4 and a1 ≤ a3 ≤ a4, the six possible bit strings, namely 0000, 0001, 0011, 0101, 0111 and 1111, cannot be permuted into a Gray path. There are four strings of even weight (the numbers of 1s) and two of odd weight, and in any Gray path the parity of the weights has to alternate.
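The counting argument behind this counterexample can be verified by brute force (our sketch): enumerate all bit strings of length 4, keep those satisfying a1 ≤ a2 ≤ a4 and a1 ≤ a3 ≤ a4, and tally the parities of their weights. A Gray path through six strings must alternate parity, so it needs three of each; here the split is four even against two odd.

```scala
object ParityDemo {
  // All bit strings of length 4 satisfying a1 <= a2 <= a4 and a1 <= a3 <= a4.
  val valid: List[String] =
    (0 until 16).toList
      .map(n => (3 to 0 by -1).map(b => ('0' + ((n >> b) & 1)).toChar).mkString)
      .filter(s => s(0) <= s(1) && s(1) <= s(3) && s(0) <= s(2) && s(2) <= s(3))

  def weight(s: String): Int = s.count(_ == '1')

  // Partition by parity of weight.
  val (even, odd) = valid.partition(s => weight(s) % 2 == 0)
}
```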
Constraints of the form ai ≤ aj on bit strings of length n can be represented by a digraph with n nodes in which a directed edge i ← j is associated with a constraint ai ≤ aj.