Hash tables are containers that represent a collection of objects inserted at computed index locations. Each object inserted in the hash table is associated with a hash index. The process of hashing involves the computation of an integer index (the hash index) for a given object (such as a string). If designed properly, the hash computation (1) should be fast, and (2) when done repeatedly for a set of keys to be inserted in a hash table should produce hash indices uniformly distributed across the range of index values for the hash table. The term “hashing” is derived from the observation that there should be little if any obvious association between the object being inserted and its hash index. Two closely related objects such as the strings “time” and “lime” should generally produce unrelated hash indices. Thus hashing involves distributing objects into what appears to be random (but reproducible) locations in the table.
When two distinct objects produce the same hash index, we refer to this as a collision. Clearly the two objects cannot be placed at the same index location in the table. A collision resolution algorithm must be designed to place the second object at a location distinct from the first when their hash indices are identical.
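As a minimal sketch of these ideas, the following Java fragment computes a hash index for a string. The polynomial hash, the multiplier 31, and the table size of 11 are assumptions chosen for illustration, not necessarily the hash function developed in the chapter. Because a table of 11 slots cannot hold 12 distinct keys without sharing, the loop in main is bound to report at least one collision.

```java
class HashIndexSketch {
    // Hypothetical polynomial hash: folds each character into a running value
    // and keeps that value in the range 0 .. tableSize - 1.
    static int hashIndex(String key, int tableSize) {
        int h = 0;
        for (int i = 0; i < key.length(); i++) {
            h = (31 * h + key.charAt(i)) % tableSize;
        }
        return h;
    }

    public static void main(String[] args) {
        int tableSize = 11; // assumed table size, for illustration only

        // Closely related keys should land at unrelated indices.
        System.out.println("time -> " + hashIndex("time", tableSize));
        System.out.println("lime -> " + hashIndex("lime", tableSize));

        // Twelve distinct keys in eleven slots: a collision must occur.
        String[] owner = new String[tableSize];
        for (int i = 0; i < 12; i++) {
            String key = "key" + i;
            int index = hashIndex(key, tableSize);
            if (owner[index] != null) {
                System.out.println("collision: " + key + " and " + owner[index]
                        + " both hash to " + index);
            } else {
                owner[index] = key;
            }
        }
    }
}
```

The sketch only detects collisions; deciding where the second object should go is the job of the collision resolution algorithm.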
The two fundamental problems associated with the construction of hash tables are:
the design of an efficient hash function that distributes the index values of inserted objects uniformly across the table
This chapter groups together three important data structures: trees, heaps, and priority queues. Trees are our first example of a nonlinear structure for containing objects. Although conceptually more complex than linear data structures, trees offer the opportunity for improved efficiency in operations such as inserting, removing, and searching for contained objects. Heaps are also nonlinear in structure and contained objects must be organized in agreement with an order relationship between each node and its descendants. A heap may be efficiently implemented using a binary tree. A priority queue is a special kind of queue that contains prioritized objects (usually based on a key) in a way that the objects are removed based on their priority (highest priority first). Priority queues may be implemented using a heap. There is a nonessential but beneficial relationship among these three data structures; that is why they are grouped together in this chapter. Additional variations on binary trees are covered in later chapters.
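The removal behavior of a priority queue can be sketched with the standard library class java.util.PriorityQueue, which is heap based. This is a stand-in for illustration, not the priority queue class built in the chapter; the helper name removalOrder and the convention that a larger key means a higher priority are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class PriorityQueueSketch {
    // Returns the keys in the order a priority queue would remove them,
    // assuming that a larger key means a higher priority.
    static List<Integer> removalOrder(List<Integer> keys) {
        PriorityQueue<Integer> queue = new PriorityQueue<>(Comparator.reverseOrder());
        queue.addAll(keys);
        List<Integer> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.poll()); // always removes the highest remaining key
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(removalOrder(Arrays.asList(3, 9, 1, 7))); // prints [9, 7, 3, 1]
    }
}
```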
Trees
A tree is a nonlinear data structure that derives its name from a similarity between its defining terminology and our friends in the forest, real trees. A tree data structure is considerably more constrained in its variety than a real tree and is typically viewed upside down, with its root at the top and leaves on the bottom. A tree is usually accessed from its root, then down through its branches to the leaves.
A box of paper clips, a stack of trays in a cafeteria, and a room full of desks, chairs, lamps, and other furniture are containers. An array of records, a queue of customers at a movie theatre, a bag of groceries, a set of lottery tickets, a dictionary of words and their definitions, and a database of patient records are additional examples of containers. Some of the containers cited above – such as the box of paper clips, set of lottery tickets, and dictionary of words and their definitions – consist of identical types of objects, whereas the other containers consist of a mixture of object types. Each type of container has its own rules for ordering and accessing its entities.
It is important to make a distinction between the container object and the things that it contains. For example, we can distinguish the box that holds paper clips from the paper clips themselves. The box has an identity and existence even if it is empty. It is common to take home empty paper bags from a supermarket that may later be used as garbage bags.
This chapter, as its name implies, focuses on containers. It sets the stage for almost everything that will be done in later chapters. The study of data structures is the study of containers. In this chapter we delineate the behavior of many different container abstract data types.
All the foundations classes and the supporting source files for the book are contained in a single compressed file named foundations.zip. This file may be downloaded from the Cambridge University Press Web site at http://www.cup.org.
Extract the entire contents of the foundations.zip file into the directory of your choice. We will refer to this directory as user-dir in our discussion. A typical choice for user-dir might be C:\CS2notes. The structure of directories and files created in user-dir by the extraction is shown in Figure C.1.
Chapters 2 and 10, plus the appendices, have no supporting Java files. Each folder has supporting source files for laboratories and test programs discussed in its corresponding chapter. The entire structure is only 1.8 MB (1.13 MB of that is in a single file called distinct.txt containing words for use by examples in Chapter 16). File foundations.jar contains all the compiled class files for the foundations package.
A typical directory structure for the chapter folders is shown in Figure C.2 for Chapters 9 and 14. The docs folder in Chapter 9 provides javadoc generated documentation for class foundations.Fraction. Each GUI laboratory has application and user-interface source files plus a batch file that compiles and runs the application. These laboratories were developed using JBuilder3. The foundations folders contain compilable (do-nothing) source file stubs that are to be used in specific exercises. The support folders typically contain short test programs that are console-based or source file stubs to be used in exercises.
This appendix presents a brief introduction to UML notation as used in the book. For more detailed discussion of UML, its history, notation, documentation, and uses, the reader is referred to the UML Web page for Rational Software Corporation:
http://www.rational.com/uml/
Representing Classes in UML
UML notation provides a rich variety of options for graphically representing the details of a class. The basic icon for a class is a rectangular box with one, two, or three compartments as shown in Figure A.1. The compartments contain strings and special symbols. The Name compartment is required. The two List compartments typically contain attributes and operations and may be suppressed as desired. Within each compartment, UML offers many options for amount of detail to be shown.
Among the options for detail to be shown in the three compartments are the following:
String – an identifier representing a class name, field name, or method name.
«stereotype-string» – A string in guillemets is a stereotype. Stereotypes may be thought of as categories that further qualify a class, field, or method. For example, we may use the stereotype «interface» to identify a class that is a Java interface. We may apply the stereotype «final» to a constant field and the stereotypes «command» or «query» to methods.
+, -, # – Visibility is indicated using a “+” symbol for public, a “-” symbol for private, a “#” symbol for protected, or no symbol for package (Java default). […]
The main application of this chapter is algebraic expression evaluation. This is a classic and important problem. An algebraic expression containing single character operands and the four arithmetic operators is input as a string. When numeric values are assigned to each operand our goal is to be able to evaluate the arithmetic expression on the fly. The String representing the arithmetic expression is not known until runtime.
What makes this problem particularly interesting is that the core of the solution requires two stacks, each holding different types of data. The solution illustrates how abstractions (the stack in this case) may be utilized to provide an effective underpinning for the solution to a complex problem.
Algebraic Expression Evaluation
Problem: Develop a Java software application that takes an algebraic expression as an input string. An example of such an algebraic expression is (a + b) * c – d + e * f. After numeric values are assigned to each operand (values for a, b, c, d, e, and f), the algorithm must compute the value of the algebraic expression.
Input: A string representing an algebraic expression involving n operands and an n-tuple representing the values for the operands (i.e., numeric values for each operand).
Output: The value of the expression for the particular n-tuple of input operand values.
Solution of Problem:
1. Conversion from infix to postfix
The first step in solving this problem involves a transformation of the input algebraic expression from infix to postfix representation.
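One common way to carry out this conversion uses a single operator stack. The sketch below is an illustration under stated assumptions, not necessarily the algorithm the chapter develops: operands are single letters, the operators are +, -, *, and /, the minus sign is the ASCII hyphen, and any other character (such as a space) is skipped. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class InfixToPostfixSketch {
    // Multiplication and division bind more tightly than addition and subtraction.
    static int precedence(char op) {
        return (op == '*' || op == '/') ? 2 : 1;
    }

    static String toPostfix(String infix) {
        StringBuilder out = new StringBuilder();
        Deque<Character> ops = new ArrayDeque<>(); // operator stack
        for (char ch : infix.toCharArray()) {
            if (Character.isLetter(ch)) {
                out.append(ch); // operands go straight to the output
            } else if (ch == '(') {
                ops.push(ch);
            } else if (ch == ')') {
                // Emit operators back to the matching left parenthesis.
                while (!ops.isEmpty() && ops.peek() != '(') {
                    out.append(ops.pop());
                }
                if (!ops.isEmpty()) {
                    ops.pop(); // discard the '('
                }
            } else if (ch == '+' || ch == '-' || ch == '*' || ch == '/') {
                // Emit stacked operators of equal or higher precedence first.
                while (!ops.isEmpty() && ops.peek() != '('
                        && precedence(ops.peek()) >= precedence(ch)) {
                    out.append(ops.pop());
                }
                ops.push(ch);
            }
            // Any other character is ignored in this sketch.
        }
        while (!ops.isEmpty()) {
            out.append(ops.pop());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPostfix("(a + b) * c - d + e * f")); // prints ab+c*d-ef*+
    }
}
```

The postfix form needs no parentheses, so it can later be evaluated in a single left-to-right pass with a second stack that holds operand values.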
An essential and important part of computer problem solving is the development of algorithms – the detailed logic and steps required to solve a problem. All programmers are introduced very early to a number of useful programming constructs for building algorithms. These include assignment, branching, and iteration. Branching provides a means for conditional or alternative execution of steps in an algorithm. Iteration provides a convenient way to perform repetitive steps. Without branching and iteration the algorithms for even simple problem solutions would be either impossible or verbose and cumbersome. Another useful concept for construction of algorithms is recursion. Recursion is a construct that provides an alternative to iteration for repetitive steps. In many problems requiring repetitive steps we may find equivalent iterative and recursive algorithms as solutions.
What is recursion? A recursion may be described as the process of executing the steps in a recursive algorithm. So what is recursive? We sometimes tell our students, “If you look up ‘recursive’ in the dictionary, its definition is ‘see recursive.’” We deduce from this anecdotal definition that a recursive algorithm is defined in terms of itself. The actual definition found in one dictionary, “pertaining to or using a rule or procedure that can be applied repeatedly,” is not very helpful.
In developing an understanding of recursion we rely on its use in mathematics, algorithms, and computer programming. From mathematics we find recursive functions defined in terms of themselves.
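The factorial function is a familiar example: n! = n * (n - 1)! for n > 1, with 0! = 1! = 1. The sketch below, with hypothetical class and method names, shows a recursive method that mirrors this definition next to an equivalent iterative one.

```java
class FactorialSketch {
    // Recursive version: the method is defined in terms of itself,
    // with n <= 1 as the base case that stops the recursion.
    static long factorialRecursive(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        return (n <= 1) ? 1 : n * factorialRecursive(n - 1);
    }

    // Iterative version: the same repetitive steps expressed as a loop.
    static long factorialIterative(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result *= i;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(factorialRecursive(5)); // prints 120
        System.out.println(factorialIterative(5)); // prints 120
    }
}
```

Both versions overflow a long for n greater than 20, so the sketch is only meaningful for small arguments.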
This is a CS 2 book that presents classical data structures in an object-oriented programming (OOP) context using Java. This book also focuses on the basic principles of OOP and graphical user interface (GUI)-based programming – two paradigms essential for modern programming and problem solving. Our book is aimed principally at CS 2 students but may also be valuable to software development professionals who wish to upgrade their skills in the areas of OOP, GUI programming, and classical data structures.
The software development principles associated with OOP provide a strong framework for presenting and implementing classical data structures. We adhere to and emphasize these principles throughout this book.
Universities have been slow to introduce courses related to OOP into their curricula. Curriculum change has always occurred slowly at universities, but the past dozen years have been particularly disappointing in the area of OOP education. Often a department assumes that, because it has switched languages from Pascal or C to C++ or Java in CS 1 or CS 2, it has made a commitment to object-oriented software education. This is simply not true. Object orientation embodies a set of principles that are often obscured by the intensive preoccupation with language details evident in early university courses and the books that cater to these courses. Many of the CS 1 and CS 2 books featuring C++ or Java are nothing more than warmed-over reruns of structured programming texts written originally for Pascal or C.
Sorting involves rearranging information in some container, usually an array, so that the information is stored from smallest to largest (ascending order) or from largest to smallest (descending order). The need to sort is fundamental. We are interested in finding efficient algorithms to accomplish the task.
We shall assume throughout this chapter that the entities to be sorted are Comparable. That is, they may be compared using the query compareTo.
All the sorting methods are presented as static functions with an array of Comparable as the first parameter and the number of elements to be sorted as the second parameter. Although this represents a departure from the normal pattern of object-oriented class construction, we believe it is justified. As long as the elements to be sorted are Comparable, the user should not be burdened with having to create an instance of a sorting class in order to rearrange the elements in the array.
Simple and Inefficient Sorting Algorithms
We consider two relatively simple sorting algorithms in this section before turning our attention to more efficient sorting.
Selection Sort
The array is scanned from index 1 to index n and the location of the largest value is obtained. This value is interchanged with the nth value. This assures that the largest value is placed in the rightmost position (index n).
The array is again scanned, this time from index 1 to index n – 1. The location of the largest value is obtained.
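The scanning steps above can be sketched as a static method that takes the array and the number of elements, following the convention stated earlier. Two assumptions differ from the prose: Java arrays are indexed from 0, so the scan runs from index 0 to index n - 1, and a generic type parameter is used in place of a raw Comparable array. The class and method names are hypothetical.

```java
import java.util.Arrays;

class SelectionSortSketch {
    // Sorts the first n elements of data into ascending order.
    static <T extends Comparable<? super T>> void selectionSort(T[] data, int n) {
        // After each pass the largest remaining value sits at index last.
        for (int last = n - 1; last > 0; last--) {
            int largest = 0;
            for (int i = 1; i <= last; i++) {
                if (data[i].compareTo(data[largest]) > 0) {
                    largest = i;
                }
            }
            // Interchange the largest value with the value at index last.
            T temp = data[largest];
            data[largest] = data[last];
            data[last] = temp;
        }
    }

    public static void main(String[] args) {
        Integer[] values = {5, 2, 9, 1, 7};
        selectionSort(values, values.length);
        System.out.println(Arrays.toString(values)); // prints [1, 2, 5, 7, 9]
    }
}
```

Each pass scans one fewer element than the previous pass, which gives roughly n * n / 2 comparisons in total and explains why this algorithm is grouped with the inefficient ones.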
Object-oriented software development is centered on the construction of classes. Classes represent a model of the application domain. Object-oriented software analysis and design are preoccupied with the discovery of classes and the relationships they have to each other. Through composition – in which one class holds one or more objects from other classes – and inheritance, the architecture of a software system is defined. This architecture is ultimately realized at the implementation phase by the construction and definition of classes.
This chapter closely examines the issues related to class construction using Java. Among the important issues to be discussed are:
What responsibilities should be vested within a class?
What responsibilities should be vested with the user of a class?
How can we bind the user's responsibilities with the class's responsibilities?
How can we organize the behavior of a class in a systematic manner?
What naming conventions and documentation style should be employed in class construction?
How can and should one control the visibility and access to various features of a class?
Responsibilities between a Class and Its Users – Design by Contract
Bertrand Meyer, perhaps more than any other writer, has clarified and influenced our thinking regarding the responsibilities between a class and its users. His ideas are contained in his seminal work Object-Oriented Software Construction, Second Edition (Prentice-Hall, 1997) and manifested in the Eiffel programming language and environment.
We examine two important types of relationships between classes in this chapter – namely, composition and inheritance. We illustrate the concepts by constructing a complete software system in Java that illustrates the use of these two types of relationships.
Inheritance
Inheritance, as the name implies, involves the transmittal of behavioral characteristics from parent class to child class. Through inheritance one can establish behavior in a base class that is available and directly usable in a hierarchy of descendent classes that extend the base class.
As discussed in Chapter 1, inheritance can be centered on factoring and reusing methods (implementation inheritance) or on extending behavior (behavioral inheritance). It is the latter that we shall utilize in this chapter and throughout this book.
With behavioral inheritance, it is essential that any child class logically be of the same type as its parent. As you recall from Chapter 1, the principle of polymorphic substitution allows a descendent class to be used in place of its ancestor. This would make sense only if each child class can logically be considered to be a kind of its parent.
A child class may extend a parent class by introducing one or more fields or methods not found in the parent or by redefining one or more parent class methods.
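A small sketch may make this concrete. The classes Account and SavingsAccount are hypothetical and are not part of the software system built in the chapter. The child class adds a field and a query not found in the parent, redefines one parent method, and can be used wherever the parent type is expected.

```java
class InheritanceSketch {
    static class Account {
        protected int balance; // whole currency units, to keep the sketch simple

        Account(int balance) {
            this.balance = balance;
        }

        int balance() {
            return balance;
        }

        String describe() {
            return "Account";
        }
    }

    // A SavingsAccount is logically a kind of Account.
    static class SavingsAccount extends Account {
        private final int ratePercent; // field not found in the parent

        SavingsAccount(int balance, int ratePercent) {
            super(balance);
            this.ratePercent = ratePercent;
        }

        int ratePercent() { // method not found in the parent
            return ratePercent;
        }

        @Override
        String describe() { // redefinition of a parent method
            return "SavingsAccount";
        }
    }

    // Accepts the parent type; any descendant may be substituted for it.
    static String report(Account account) {
        return account.describe() + ": " + account.balance();
    }

    public static void main(String[] args) {
        System.out.println(report(new Account(50)));            // prints Account: 50
        System.out.println(report(new SavingsAccount(100, 5))); // prints SavingsAccount: 100
    }
}
```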
One of the conceptual pillars supporting object-oriented software development is the abstract data type (ADT). David Parnas and others articulated this concept in the 1960s. For many years this concept has formed the basis for software construction, both object oriented and otherwise. All of the data structures to be presented in this book are formulated as abstract data types.
A data type is a program entity holding information that can be manipulated in a disciplined manner through a set of predefined operations. Predefined operations include commands that may be used to modify the value of the data type and queries that may be used to access the value of the data type. In the Java programming language an abstract data type is implemented using the class construct. The information structure (data structure) of the ADT is represented in the internal (usually private or protected) fields of the class. The commands are represented by methods that return type void. The queries are represented either by public fields or methods that return a nonvoid type representing field information.
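As a minimal illustration of this mapping, consider a hypothetical Counter ADT (not a class from the book). Its information structure is a private field, its commands return void, and its single query returns a nonvoid value.

```java
class CounterAdtSketch {
    static class Counter {
        private int count; // internal information structure, hidden from users

        // Commands: modify the value and return void.
        void increment() {
            count++;
        }

        void reset() {
            count = 0;
        }

        // Query: reports the value without modifying it.
        int value() {
            return count;
        }
    }

    public static void main(String[] args) {
        Counter counter = new Counter();
        counter.increment();
        counter.increment();
        System.out.println(counter.value()); // prints 2
        counter.reset();
        System.out.println(counter.value()); // prints 0
    }
}
```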
Many software developers have found that ADTs aid in formulating clear and clean software architecture and promote greater understandability of the software and easier software maintenance. In structured programming languages such as C and Pascal the programmer must impose strict protocols in order to utilize ADTs. In the early 1980s two pre-object-oriented languages, Ada and Modula-2, were specifically designed to support and encourage the use of ADTs.
Binary trees were introduced in the previous chapter. A binary tree holds the generic Object type that serves as a placeholder for any reference type. This combined with its nonlinear structure makes it suitable for representing a diversity of information.
This chapter focuses on a specialized but extremely important tree type – the search tree. Such a binary tree holds elements of type Comparable. That is, the elements stored in a search tree may be compared to one another by answering to the query compareTo. The goal of a search table is to provide efficient access to information while allowing the information to be output in an ordered sequence. The order of elements in a binary search tree is based on a comparable property of the elements themselves.
In Chapter 13 we examined the OrderedList as a concrete implementation of a SearchTable. Here we shall examine three concrete search tree classes, each providing an implementation of the interface SearchTable. These concrete classes are BinarySearchTree, AVLTree, and SplayTree. In addition, we shall investigate another interesting and recent implementation of SearchTable, given by class SkipList.
Review of Search Table Abstraction
Recall from Chapter 10 that a search table is a compact abstraction that extends Container and provides the commands add and remove in addition to the command makeEmpty in class Container. The queries contains, get, and iterator are provided by SearchTable in addition to the queries isEmpty and size inherited from Container.
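The abstraction described above might be sketched as follows. The method names come from the text, but the parameter and return types, the use of generics, and the TreeSetTable class (a thin wrapper around java.util.TreeSet, included only so that the sketch runs) are assumptions rather than the definitions in the foundations package.

```java
import java.util.Iterator;
import java.util.TreeSet;

class SearchTableSketch {
    interface Container {
        void makeEmpty();  // command
        boolean isEmpty(); // query
        int size();        // query
    }

    interface SearchTable<T extends Comparable<T>> extends Container {
        void add(T item);          // command
        void remove(T item);       // command
        boolean contains(T item);  // query
        T get(T item);             // query: the stored element equal to item, or null
        Iterator<T> iterator();    // query: elements in ascending order
    }

    // Hypothetical implementation backed by a standard library tree set.
    static class TreeSetTable<T extends Comparable<T>> implements SearchTable<T> {
        private final TreeSet<T> items = new TreeSet<>();

        public void makeEmpty() { items.clear(); }
        public boolean isEmpty() { return items.isEmpty(); }
        public int size() { return items.size(); }
        public void add(T item) { items.add(item); }
        public void remove(T item) { items.remove(item); }
        public boolean contains(T item) { return items.contains(item); }

        public T get(T item) {
            T found = items.floor(item); // greatest element <= item, or null
            return (found != null && found.compareTo(item) == 0) ? found : null;
        }

        public Iterator<T> iterator() { return items.iterator(); }
    }

    public static void main(String[] args) {
        SearchTable<String> table = new TreeSetTable<>();
        table.add("pear");
        table.add("apple");
        for (Iterator<String> it = table.iterator(); it.hasNext();) {
            System.out.println(it.next()); // prints apple, then pear
        }
    }
}
```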
The principles and practices of object-oriented software construction have evolved since the 1960s. Object-oriented programming (OOP) is preoccupied with the manipulation of software objects. OOP is a way of thinking about problem solving and a method of software organization and construction.
The concepts and ideas associated with object-oriented programming originated in Norway in the 1960s. A programming language called Simula developed by Kristen Nygaard and his associates at the Norwegian Computing Center in Oslo is considered the first object-oriented language. This language inspired significant thinking and development work at Xerox PARC (Palo Alto Research Center) in the 1970s that eventually led to the simple, rich, and powerful Smalltalk-80 programming language and environment (released in 1980). Smalltalk, perhaps more than any programming language before or after it, laid the foundation for object-oriented thinking and software construction. Smalltalk is considered a “pure” object-oriented language. Actions can be invoked only through objects or classes (a class can be considered an object in Smalltalk). The simple idea of sending messages to objects and using this as the basis for software organization is directly attributable to Smalltalk.
Seminal work on object-oriented programming was done in the mid-1980s in connection with the Eiffel language. Bertrand Meyer in his classic book Object-Oriented Software Construction (Prentice-Hall, 1988; Second Edition, 1997) set forth subtle principles associated with OOP that are still viable and alive today.
We are all familiar with the concept of a dictionary as a fairly large book containing words and definitions. The words are always in alphabetical order to help us look up a particular word. Having the words in alphabetical order is a convenient feature but is not required. There may be other ways to find words in the dictionary, especially if our dictionary is in electronic form. Most words in a dictionary have several definitions. We associate each word with its definitions. Thus, we may characterize a dictionary as a container (possibly ordered) of associations between words and their meanings.
To take our reasoning a step further in our attempt to understand the required behavior of a dictionary, we never add definitions to a dictionary unless they are associated with a word. On the other hand, as we are building the dictionary we may add words without definitions on the promise that the definitions will be added later for those words. And finally, as we fine tune our understanding, we may change definitions for words that are already in the dictionary. We may wish to remove entries in the dictionary or access them in various ways. For example, we may wish to access a list of the words only, the meanings only, or the entire list of entries.
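The behavior described above can be sketched with java.util.TreeMap standing in for an ordered dictionary. This is an illustration only, not the dictionary classes developed in the book; the helper name buildSample and the sample words are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DictionarySketch {
    // Builds a small ordered dictionary that associates words with definitions.
    static Map<String, List<String>> buildSample() {
        Map<String, List<String>> dictionary = new TreeMap<>();

        // A word may be added first and receive its definitions later.
        dictionary.put("tree", new ArrayList<>());
        dictionary.get("tree").add("a nonlinear data structure");

        dictionary.put("heap", new ArrayList<>(Arrays.asList("an ordered tree", "a pile")));
        dictionary.put("stack", new ArrayList<>(Arrays.asList("a pile of trays")));

        // Definitions for an existing word may be changed.
        dictionary.put("stack", new ArrayList<>(Arrays.asList("a last-in, first-out container")));

        // Entries may be removed.
        dictionary.remove("heap");
        return dictionary;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dictionary = buildSample();
        System.out.println(dictionary.keySet());   // the words only
        System.out.println(dictionary.values());   // the meanings only
        System.out.println(dictionary.entrySet()); // the entire list of entries
    }
}
```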
In Chapter 10 we defined interface Dictionary as an extension of the Container interface, interface OrderedDictionary as an extension of SearchTable, and supporting class Association.
In a world with perfect users, perfect programs, and perfect hardware we would not have to concern ourselves with exceptions. Users would never enter incorrect data (e.g., enter an alphabetic character when a number is required). Hardware would never fail. Printers would always be on when our software attempts to access them. A hard drive would never be full when a program attempts to write to it. Such a world does not exist.
Brittle programs do indeed crash because of some failure in the input, the program logic, or the physical system supporting the application. Some crashes occur because of hardware interrupts (failures in a hardware component) or synchronization conflict (two or more segments of code attempting to modify the same data simultaneously). Program crashes may be catastrophic. If the rudder computer control system on an aircraft were to go down or the computer guidance system on a rocket were to fail, a catastrophe of major proportions might occur. Often a program crash results in a loss of input data – not a catastrophe but a profound annoyance to the user. One's confidence in using a program often disappears if such loss of data occurs frequently.
Exception handling involves defensive programming tactics that ensure a more benign outcome if your programming application should fail. Exception handling can ensure that input data are saved before a program is terminated, can notify the user that an input or hardware error has occurred and allow program execution to continue, or can bring the system to a stable and safe state before exiting the application.
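A minimal sketch of one of these tactics, notifying the user of an input error and allowing execution to continue, follows. The method name parseOrDefault and the fallback policy are hypothetical; Integer.parseInt and NumberFormatException are standard Java.

```java
class ExceptionSketch {
    // Converts text to an int; if the text is not a number, reports the
    // problem and returns the fallback value instead of crashing.
    static int parseOrDefault(String text, int fallback) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            System.err.println("Not a number: \"" + text + "\"; using " + fallback);
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("42", 0));  // prints 42
        System.out.println(parseOrDefault("abc", 0)); // reports the error, then prints 0
    }
}
```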