Up to this point we’ve focused on introducing the Java language and – starting with the previous chapter – the technique of algorithm analysis. We’re now ready to put that machinery to work by describing our first important new data structure: lists. A list is like an array, in that it represents an ordered sequence of data values, but lists are more flexible: They support operations for dynamically inserting and removing data as the program executes. We’ve already used Java’s built-in ArrayList class to manage a collection of values; now we’re ready to talk about how it’s implemented internally.
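As a preview, the dynamic insertion and removal that distinguishes lists from fixed-size arrays looks like this with the built-in ArrayList (a minimal sketch; the variable names and values are our own):

```java
import java.util.ArrayList;

public class ListDemo {
    public static void main(String[] args) {
        // An ArrayList grows and shrinks as the program executes
        ArrayList<String> items = new ArrayList<>();
        items.add("alpha");         // append to the end
        items.add("gamma");
        items.add(1, "beta");       // insert at position 1
        items.remove(0);            // remove the item at position 0
        System.out.println(items);  // [beta, gamma]
    }
}
```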
The last chapter ended on a down note, when we realized that the standard binary search tree can’t guarantee O(log n) performance if it isn’t balanced. This chapter introduces self-balancing search trees. All three of the trees we’ll examine – 2-3-4 trees, B-trees, and red–black trees – implement search tree operations, but perform extra work to ensure that the tree stays balanced.
Arrays are Java’s fundamental low-level data structure, used to manage fixed-size collections of items. Chapter 2 introduced ArrayList, which implements a resizable sequential collection of data items, similar to Python’s lists. Arrays are lower-level, but they’re often the best choice for representing fixed-size collections of items, such as matrices. Arrays are also the basic building block of many higher-level data structures. Therefore, understanding how to create and manipulate basic arrays is an essential skill.
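A minimal sketch of array creation and manipulation, including a two-dimensional array standing in for a small matrix (the names and values are our own):

```java
public class ArrayBasics {
    public static void main(String[] args) {
        // A fixed-size array of four ints, all initialized to 0
        int[] squares = new int[4];
        for (int i = 0; i < squares.length; i++) {
            squares[i] = i * i;
        }
        // A 2D array can represent a fixed-size matrix
        int[][] identity = {{1, 0}, {0, 1}};
        System.out.println(squares[3] + identity[0][0]);  // 9 + 1 = 10
    }
}
```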
Recursion is a fundamental concept in computer science. A recursive algorithm is one that defines a solution to a problem in terms of itself. That is, recursive techniques solve large problems by building up solutions of smaller instances of the same problem. This turns out to be a powerful technique, because many advanced algorithmic problems and data structures are fundamentally self-similar.
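A classic first example: the sum of the integers 1 through n can be defined in terms of a smaller instance of the same problem. A minimal sketch (the method name is our own):

```java
public class Recursion {
    // Recursive sum: the sum of 1..n is n plus the sum of 1..(n-1)
    static int sum(int n) {
        if (n == 0) return 0;   // base case: the smallest instance
        return n + sum(n - 1);  // recursive case: a smaller subproblem
    }

    public static void main(String[] args) {
        System.out.println(sum(5));  // 15
    }
}
```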
Pinterest is a social media platform that allows users to assemble images or other media into customized lists, then share those lists with others. Pinterest calls these lists “pinboards” and the items added to each board “pins,” analogous to real-world physical bulletin boards. Like other social media systems, Pinterest wants to recommend new content to its users to keep them engaged with the service. In 2018, Pinterest introduced a system called Pixie as a component of their overall recommendation infrastructure (Eksombatchai et al., 2018). It uses a graph model to represent the connections among items, then explores that graph in a randomized way to generate recommendations. In this chapter, we’ll build our own system based on the graph algorithms used by Pixie.
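To make the randomized-exploration idea concrete, here is a toy sketch of a random walk over an adjacency-list graph, where frequently visited nodes become recommendation candidates. This is our own illustration of the general idea, not Pixie’s actual algorithm, and the graph contents are invented:

```java
import java.util.*;

public class RandomWalk {
    public static void main(String[] args) {
        // Hypothetical graph: each node maps to a list of its neighbors
        Map<String, List<String>> graph = Map.of(
            "board", List.of("pinA", "pinB"),
            "pinA", List.of("board"),
            "pinB", List.of("board"));

        // Count how often a random walk visits each node
        Map<String, Integer> visits = new HashMap<>();
        Random rng = new Random();
        String current = "board";
        for (int step = 0; step < 1000; step++) {
            List<String> neighbors = graph.get(current);
            current = neighbors.get(rng.nextInt(neighbors.size()));
            visits.merge(current, 1, Integer::sum);
        }
        // Frequently visited nodes are candidate recommendations
        System.out.println(visits);
    }
}
```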
We live in a networked world. Professional networks, social networks, neural networks – we’re all familiar with the idea that connections matter. This chapter introduces graphs, our last major topic. Graphs are the primary tool for modeling connections or relationships among a set of items; binary trees, for example, are a special type of graph. Graph models illustrate the power of abstraction: They capture the underlying structure of a network, independent of what the elements actually represent. Therefore, graph algorithms are flexible – they’re not tied to one particular application or problem domain.
So far, we’ve considered four data structures: arrays, lists, stacks, and queues. All four could be described as linear, in that they maintain their items as ordered sequences: arrays and lists are indexed by position, stacks are LIFO, and queues are FIFO. In this chapter, we’ll consider the new problem of building a lookup structure, like a table, that can take an input called the key and return its associated value. For example, we might fetch a record of information about a museum artifact given its ID number as the key. None of our previous data structures are a good fit for this problem.
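Java’s built-in HashMap already provides exactly this key-to-value interface. A minimal sketch of the museum-artifact example (the ID numbers and descriptions are invented):

```java
import java.util.HashMap;

public class LookupDemo {
    public static void main(String[] args) {
        // Map artifact ID numbers (keys) to descriptions (values)
        HashMap<Integer, String> catalog = new HashMap<>();
        catalog.put(404, "Bronze astrolabe");
        catalog.put(1701, "Ship model");
        System.out.println(catalog.get(404));  // Bronze astrolabe
    }
}
```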
Logic Theorist was the first artificially intelligent program, created in 1955 by Allen Newell and Herbert Simon, and actually predating the term “artificial intelligence,” which was introduced the next year. Logic Theorist could apply the rules of symbolic logic to prove mathematical theorems – the first time a computer accomplished a task considered solely within the domain of human intelligence. Given a starting statement, it applied logical laws to generate a set of new statements, then recursively continued the process. Eventually, this procedure would discover a chain of logical transformations that connected the starting statement to the desired final statement. Applied naively, this process would generate an intractable number of possible paths, but Logic Theorist had the ability to detect and discard infeasible paths that couldn’t lead to a solution.
Very often, software developers need to evaluate the trade-offs between different approaches to solving a problem. Do you want the fastest solution, even if it’s difficult to implement and maintain? Will your code still be useful if you have to process 100 times as much data? What if an algorithm is fast for some inputs but terrible for others? Algorithm analysis is the framework that computer scientists use to understand the trade-offs between algorithms. Algorithm analysis is primarily theoretical: It focuses on the fundamental properties of algorithms, and not on systems, languages, or any particular details of their implementations.
This chapter introduces the key concepts of algorithm analysis, starting from the practical example of searching an array for a value of interest. We’ll start by making experimental comparisons between two searching methods: a simple linear search and the more complex binary search. The second part of the chapter introduces one of the most important mathematical tools in computer science, Big-O notation, the primary tool for algorithm analysis.
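As a preview of the two methods, here is a minimal sketch of both searches (binary search assumes the array is already sorted; the data values are our own):

```java
public class Searches {
    // Linear search: check every element in turn
    static int linearSearch(int[] a, int target) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] == target) return i;
        }
        return -1;  // not found
    }

    // Binary search: repeatedly halve a sorted range
    static int binarySearch(int[] a, int target) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;
            if (a[mid] == target) return mid;
            else if (a[mid] < target) lo = mid + 1;  // search upper half
            else hi = mid - 1;                       // search lower half
        }
        return -1;  // not found
    }

    public static void main(String[] args) {
        int[] data = {2, 5, 8, 12, 23, 38, 56, 72};
        System.out.println(linearSearch(data, 23));  // 4
        System.out.println(binarySearch(data, 23));  // 4
    }
}
```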
No other computational problem has been studied in more depth, or yielded a greater number of useful solutions, than sorting. Historically, business computers spent 25% of their time doing nothing but sorting data (Knuth, 2014c), and many advanced algorithms start by sorting their inputs. Dozens of algorithms have been proposed over the last 80-odd years, but there is no “best” solution to the sorting problem. Although many popular sorting algorithms were known as early as the 1940s, researchers are still designing improved versions – Python’s default algorithm was only implemented in the early 2000s and Java’s current version in the 2010s.
Computer animators have always sought to push boundaries and create impressive, realistic visual effects, but some processes are too demanding to model exactly. Effects like fire, smoke, and water have complex fluid dynamics and amorphous boundaries that are hard to recreate with standard physical calculations. Instead, animators might turn to another approach to create these effects: particle systems. Bill Reeves, a graphics researcher and animator, began experimenting with particle-based effects in the early 1980s while making movies at Lucasfilm. For a scene in Star Trek II: The Wrath of Khan (1982), he needed to create an image of explosive fire spreading across the entire surface of a planet. Reeves used thousands of independent particles, each one representing a tiny piece of fire (Reeves, 1983). The fire particles were created semi-randomly, with attributes for their 3D positions, velocities, and colors. Reeves’ model governed how particles appeared, moved, and interacted to create a realistic effect that could be rendered on an early 1980s computer. Reeves would go on to work on other Lucasfilm productions, including Return of the Jedi (1983), before joining Pixar, where his credits include Toy Story (1995) and Finding Nemo (2003).
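The basic idea can be captured in a few lines: each particle carries a position, a velocity, and a lifetime, initialized semi-randomly and updated every frame. This is our own toy illustration of the technique, not Reeves’ actual model:

```java
import java.util.Random;

// A single particle: position, velocity, and remaining lifetime
public class Particle {
    double x, y, z;     // 3D position
    double vx, vy, vz;  // velocity
    int life;           // frames until the particle expires

    Particle(Random rng) {
        // Semi-random initialization, as in the fire effect described above
        vx = rng.nextDouble() - 0.5;
        vy = rng.nextDouble();
        vz = rng.nextDouble() - 0.5;
        life = 30 + rng.nextInt(30);  // lifetime of 30-59 frames
    }

    void step() {
        // Each frame, move the particle along its velocity and age it
        x += vx;
        y += vy;
        z += vz;
        life--;
    }
}
```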
Java is an object-oriented programming language. Java programs are implemented as collections of classes and objects that interact with each other to deliver the functionality that the programmer wants. So far, we’ve treated “class” as roughly synonymous with “program,” and all of our programs have consisted of one public class with a main method that may call additional methods. We’ve also talked about how to use the new keyword to initialize objects like Scanner that can perform useful work. It’s now time to talk about the concepts of objects and classes in more depth and then learn how to write customized classes.
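As a small preview, a customized class bundles state (instance variables) with behavior (methods), and new creates independent objects of that class. A minimal sketch (the class and its names are our own):

```java
// A simple customized class: each Counter object manages its own count
public class Counter {
    private int count = 0;     // instance variable (state)

    public void increment() {  // method (behavior)
        count++;
    }

    public int getCount() {
        return count;
    }

    public static void main(String[] args) {
        Counter c = new Counter();  // create an object with new
        c.increment();
        c.increment();
        System.out.println(c.getCount());  // 2
    }
}
```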
The previous two chapters showed how the concept of last-in-first-out data processing is surprisingly powerful. We’ll now consider the stack’s counterpart, the queue. Like a waiting line, a queue stores a set of items and returns them in first-in-first-out (FIFO) order. Pushing to the queue adds a new item to the back of the line and pulling retrieves the oldest item from the front. Queues have a lower profile than stacks, and are rarely the centerpiece of an algorithm. Instead, queues tend to serve as utility data structures in a larger system.
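The FIFO behavior looks like this using Java’s built-in Queue interface (a minimal sketch with ArrayDeque, one common implementation; the items are our own):

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class QueueDemo {
    public static void main(String[] args) {
        Queue<String> line = new ArrayDeque<>();
        line.add("first");                   // join the back of the line
        line.add("second");
        line.add("third");
        System.out.println(line.remove());   // first  (FIFO: oldest item out)
        System.out.println(line.remove());   // second
    }
}
```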
A hash, in culinary terms, is a dish made of mixed foods – often including corned beef and onions – chopped into tiny pieces. In the early twentieth century, it became a shorthand for something of dubious origin, probably unwise to consume. In computer science, a hash function is an operation that rearranges, mixes, and combines data to produce a single fixed-size output. Unlike their culinary namesake, hash functions are wonderfully useful. A hash value is like a “fingerprint” of the input used to calculate it. Hash functions have applications to security, distributed systems, and – as we’ll explore – data structures.
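A minimal sketch of one common mixing scheme, a polynomial hash that folds every character into a single fixed-size int (this is the same scheme Java’s String.hashCode uses):

```java
public class HashDemo {
    // Combine every character into one fixed-size output,
    // mixing the old state with each new piece of data
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // Small changes to the input produce very different "fingerprints"
        System.out.println(hash("hash"));
        System.out.println(hash("hast"));
    }
}
```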
There’s a perception that computer science is constantly changing, and in some respects that’s true: There are always new languages, frameworks, and application domains rising up and old ones sinking down. All of this change that we see around us, though, is like the top part of an iceberg. The most visible elements of our field are built upon and supported by a deeper layer of knowledge that’s mostly invisible to the casual observer. This book is about what’s under the water, the fundamental things that make programming possible, even if we don’t see them right away.
Computers have always mixed with art and music. Even in the earliest days of computing, when machines were the size of entire rooms, artists and composers began to harness them to create original works that could only exist in the digital realm. “Generative art” or “algorithmic art” is a term for works created according to a process that evolves with no or limited guidance from a human creator. Rather than directly making choices, the artist instead focuses on the design and initialization of a system that produces the final work. The appeal of algorithmic art lies in its combination of detail, technical complexity, and variation. Generative art frequently incorporates ideas from biology, physics, and mathematics.