Searching for data is a fundamental computer programming task and one that has been studied for many years. This chapter looks at just one aspect of the search problem—searching for a given value in a list (array).
There are two fundamental ways to search for data in a list: the sequential search and the binary search. Sequential search works on any list, including one whose items are in random order. Binary search requires the items in the list to be sorted.
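The following is a minimal sketch of a binary search over a sorted integer array. The class and method names are illustrative assumptions. Each comparison discards half of the remaining range, which is why the items must already be sorted.

```csharp
public static class BinarySearchDemo
{
    // Returns the index of value in arr, or -1 if it is absent.
    // Assumes arr is sorted in ascending order.
    public static int BinSearch(int[] arr, int value)
    {
        int lower = 0;
        int upper = arr.Length - 1;
        while (lower <= upper)
        {
            // Computed this way to avoid overflow of (lower + upper).
            int mid = lower + (upper - lower) / 2;
            if (arr[mid] == value)
                return mid;
            if (arr[mid] < value)
                lower = mid + 1;   // value can only be in the upper half
            else
                upper = mid - 1;   // value can only be in the lower half
        }
        return -1;                 // range is empty: value not present
    }
}
```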
SEQUENTIAL SEARCHING
The most obvious type of search is to begin at the beginning of a set of records and move through each record until you find the record you are looking for or you come to the end of the records. This is called a sequential search.
A sequential search (also called a linear search) is very easy to implement. Start at the beginning of the array and compare each accessed array element to the value you're searching for. If you find a match, the search is over. If you get to the end of the array without generating a match, then the value is not in the array.
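The steps above can be sketched as a simple loop. The names below are illustrative, and the sketch assumes an integer array that returns -1 when the value is not found.

```csharp
public static class SequentialSearchDemo
{
    // Linear scan: returns the index of the first element equal to value,
    // or -1 if the end of the array is reached without a match.
    public static int SeqSearch(int[] arr, int value)
    {
        for (int index = 0; index < arr.Length; index++)
        {
            if (arr[index] == value)
                return index;   // match found: the search is over
        }
        return -1;              // value is not in the array
    }
}
```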
Whereas the String and StringBuilder classes provide a set of methods for processing string-based data, the Regex class and its supporting classes provide much more power for string-processing tasks. Much string processing involves looking for patterns in strings (pattern matching), and that is done with a special language called regular expressions. In this chapter, we look at how to form regular expressions and how to use them to solve common text-processing tasks.
AN INTRODUCTION TO REGULAR EXPRESSIONS
A regular expression is a language that describes patterns of characters in strings, along with descriptors for repeating characters, alternatives, and groupings of characters. Regular expressions can be used to perform both searches in strings and substitutions in strings.
A regular expression itself is just a string of characters that define a pattern you want to search for in another string. Generally, the characters in a regular expression match themselves, so that the regular expression “the” matches that sequence of characters wherever they are found in a string.
A regular expression can also include special characters that are called metacharacters. Metacharacters are used to signify repetition, alternation, or grouping. We will examine how these metacharacters are used shortly.
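The sketch below uses the Regex class from System.Text.RegularExpressions. It contrasts a literal pattern with patterns that use metacharacters, and it performs a substitution. The wrapper class and method names are illustrative assumptions.

```csharp
using System.Text.RegularExpressions;

public static class RegexDemo
{
    // A literal pattern: the characters "the" match themselves,
    // wherever they occur (including inside "other" or "theme").
    public static int CountLiteral(string input)
    {
        return Regex.Matches(input, "the").Count;
    }

    // Metacharacters: \b marks a word boundary, and [Tt] is a character
    // class that matches either T or t, so only the whole word matches.
    public static int CountWordThe(string input)
    {
        return Regex.Matches(input, @"\b[Tt]he\b").Count;
    }

    // Substitution with repetition: " +" matches one or more spaces,
    // and each such run is replaced by a single space.
    public static string CollapseSpaces(string input)
    {
        return Regex.Replace(input, " +", " ");
    }
}
```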
Most experienced computer users have used regular expressions in their work, even if they weren't aware they were doing so at the time.
Data are naturally organized as lists. We have already used the Array and ArrayList classes for handling data organized as a list. Although those data structures helped us group the data in a convenient form for processing, neither structure provides a real abstraction for designing and implementing problem solutions.
Two list-oriented data structures that provide easy-to-understand abstractions are stacks and queues. Data in a stack are added and removed from only one end of the list, whereas data in a queue are added at one end and removed from the other end of a list. Stacks are used extensively in programming language implementations, for everything from expression evaluation to handling function calls. Queues are used to prioritize operating system processes and to simulate events in the real world, such as teller lines at banks and the operation of elevators in buildings.
C# provides two classes for using these data structures: the Stack class and the Queue class. We'll discuss how to use these classes and look at some practical examples in this chapter.
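The following sketch contrasts the two orderings. It uses the generic Stack&lt;T&gt; and Queue&lt;T&gt; classes from System.Collections.Generic. The non-generic Stack and Queue classes in System.Collections offer the same Push/Pop and Enqueue/Dequeue methods but store items as object. The demo class name is an illustrative assumption.

```csharp
using System.Collections.Generic;

public static class StackQueueDemo
{
    // Pushes 1, 2, 3 and pops them: last in, first out.
    public static List<int> StackOrder()
    {
        var stack = new Stack<int>();
        foreach (int n in new[] { 1, 2, 3 })
            stack.Push(n);              // add at the top
        var result = new List<int>();
        while (stack.Count > 0)
            result.Add(stack.Pop());    // remove from the top
        return result;                  // 3, 2, 1
    }

    // Enqueues 1, 2, 3 and dequeues them: first in, first out.
    public static List<int> QueueOrder()
    {
        var queue = new Queue<int>();
        foreach (int n in new[] { 1, 2, 3 })
            queue.Enqueue(n);            // add at the back
        var result = new List<int>();
        while (queue.Count > 0)
            result.Add(queue.Dequeue()); // remove from the front
        return result;                   // 1, 2, 3
    }
}
```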
STACKS, A STACK IMPLEMENTATION AND THE STACK CLASS
The stack is one of the most frequently used data structures, as we just mentioned. We define a stack as a list of items that are accessible only from the end of the list, which is called the top of the stack. The standard model for a stack is the stack of trays at a cafeteria.
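To make the "top of the stack" idea concrete, here is a minimal array-backed stack of integers. It is a hypothetical sketch and not the book's implementation. The top is the last filled slot of the array, and the array doubles in size when it runs out of room.

```csharp
using System;

// Hypothetical array-backed stack; items[count - 1] is the top.
public class IntStack
{
    private int[] items = new int[4];
    private int count = 0;

    public int Count => count;

    public void Push(int item)
    {
        if (count == items.Length)
            Array.Resize(ref items, items.Length * 2); // grow when full
        items[count++] = item;                         // new top
    }

    public int Pop()
    {
        if (count == 0)
            throw new InvalidOperationException("Stack is empty.");
        return items[--count];                         // remove the top
    }

    public int Peek()
    {
        if (count == 0)
            throw new InvalidOperationException("Stack is empty.");
        return items[count - 1];                       // read without removing
    }
}
```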
For many applications, data are best stored as lists, and lists occur naturally in day-to-day life: to-do lists, grocery lists, and top-ten lists. In this chapter, we explore one particular type of list, the linked list. Although the System.Collections namespace of the .NET Framework class library contains several list-based collection classes, the linked list is not among them (a generic LinkedList&lt;T&gt; class was added to System.Collections.Generic in .NET Framework 2.0). The chapter starts with an explanation of why we need linked lists. We then explore two different implementations of the data structure: object-based linked lists and array-based linked lists. The chapter finishes with several examples of how linked lists can be used to solve programming problems you may run across.
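As a preview of the object-based approach, here is a minimal singly linked list sketch. It has a header node that holds no data. The class names ListNode and SimpleLinkedList are hypothetical. Insertion and removal change only references, so no elements are shifted.

```csharp
using System.Collections.Generic;

// Hypothetical node for an object-based linked list.
public class ListNode
{
    public object Element;
    public ListNode Link;          // next node, or null at the end
    public ListNode(object element) { Element = element; }
}

public class SimpleLinkedList
{
    // The header node simplifies insertion at the front; it holds no data.
    private readonly ListNode header = new ListNode("header");

    public void AddFirst(object item)
    {
        header.Link = new ListNode(item) { Link = header.Link };
    }

    // Inserts newItem right after the first node holding after.
    public bool InsertAfter(object newItem, object after)
    {
        ListNode current = Find(after);
        if (current == null) return false;
        current.Link = new ListNode(newItem) { Link = current.Link };
        return true;
    }

    // Removes the first node holding item by relinking its predecessor.
    public bool Remove(object item)
    {
        ListNode prev = header;
        while (prev.Link != null && !Equals(prev.Link.Element, item))
            prev = prev.Link;
        if (prev.Link == null) return false;
        prev.Link = prev.Link.Link;
        return true;
    }

    private ListNode Find(object item)
    {
        ListNode current = header.Link;
        while (current != null && !Equals(current.Element, item))
            current = current.Link;
        return current;
    }

    public List<object> ToList()
    {
        var result = new List<object>();
        for (ListNode n = header.Link; n != null; n = n.Link)
            result.Add(n.Element);
        return result;
    }
}
```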
THE PROBLEM WITH ARRAYS
The array is the natural data structure to use when working with lists. Arrays provide fast access to stored items and are easy to loop through. And, of course, the array is already part of the language and you don't have to use extra memory and processing time using a user-defined data structure.
But as we've seen, the array is not the perfect data structure. Searching for an item in an unordered array is slow because you may have to visit every element in the array before finding the one you're searching for. Ordered (sorted) arrays are much more efficient for searching, but insertions and deletions are slow because you have to shift elements up to make space for an insertion or down to close the gap left by a deletion.
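The shifting cost can be seen in this small sketch of insertion into and deletion from a sorted array. The helper names are hypothetical. The sketch assumes the first count slots hold the sorted data and that there is a spare slot for insertion. In the worst case, every element moves.

```csharp
public static class SortedArrayDemo
{
    // Inserts value into arr[0..count-1] (sorted ascending), shifting
    // larger elements up one slot. Assumes count < arr.Length.
    public static void InsertSorted(int[] arr, int count, int value)
    {
        int i = count - 1;
        while (i >= 0 && arr[i] > value)
        {
            arr[i + 1] = arr[i];   // shift up to make space
            i--;
        }
        arr[i + 1] = value;
    }

    // Removes arr[index] from arr[0..count-1], shifting later
    // elements down one slot to close the gap.
    public static void RemoveAt(int[] arr, int count, int index)
    {
        for (int i = index; i < count - 1; i++)
            arr[i] = arr[i + 1];
        arr[count - 1] = 0;        // clear the now-unused last slot
    }
}
```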
The BitArray class is used to represent sets of bits in a compact fashion. Bit sets can be stored in regular arrays, but we can create more efficient programs if we use data structures specifically designed for bit sets. In this chapter, we'll look at how to use this data structure and examine some problems that can be solved using sets of bits. The chapter also includes a review of binary numbers, the bitwise operators, and the bitshift operators.
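As a quick refresher on the operators mentioned above, the hypothetical helpers below use &amp;, |, ^, ~, and &lt;&lt; to test, set, clear, and toggle individual bits of an int. They also include a small formatter that shows the low eight bits. For example, 12 &amp; 10 is 8, 12 | 10 is 14, 12 ^ 10 is 6, and 1 &lt;&lt; 3 is 8.

```csharp
using System;

public static class BitOpsDemo
{
    // 1 << position builds a mask with only that bit set.
    public static bool IsSet(int bits, int position) => (bits & (1 << position)) != 0;
    public static int SetBit(int bits, int position) => bits | (1 << position);
    public static int ClearBit(int bits, int position) => bits & ~(1 << position);
    public static int ToggleBit(int bits, int position) => bits ^ (1 << position);

    // Renders the low 8 bits of n in binary, e.g. 12 -> "00001100".
    public static string ToBinary8(int n)
    {
        return Convert.ToString(n & 0xFF, 2).PadLeft(8, '0');
    }
}
```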
A MOTIVATING PROBLEM
Let's look at a problem we will eventually solve using the BitArray class. The problem involves finding prime numbers. An ancient method, discovered by the third-century B.C. Greek philosopher Eratosthenes, is called the sieve of Eratosthenes. This method involves filtering out numbers that are multiples of other numbers, until the only numbers left are primes. For example, let's determine the prime numbers in the set of the first 100 integers. We start with 2, which is the first prime. We move through the set removing all numbers that are multiples of 2. Then we move to 3, which is the next prime. We move through the set again, removing all numbers that are multiples of 3. Then we move to 5, and so on. When we are finished, all that will be left are prime numbers.
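Here is a compact sketch of the sieve built on System.Collections.BitArray. Bit i stays true while i is still a prime candidate. The class name is illustrative. Starting each removal pass at p * p, and stopping once p * p exceeds the limit, is a common refinement of the procedure described above. Smaller multiples of p have already been removed by earlier primes.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class SieveDemo
{
    // Returns all primes <= limit using the sieve of Eratosthenes.
    public static List<int> Primes(int limit)
    {
        var primes = new List<int>();
        if (limit < 2) return primes;
        var candidates = new BitArray(limit + 1, true); // all bits start true
        candidates[0] = false;
        candidates[1] = false;
        for (int p = 2; p * p <= limit; p++)
        {
            if (!candidates[p]) continue;               // p was already removed
            for (int multiple = p * p; multiple <= limit; multiple += p)
                candidates[multiple] = false;           // remove multiples of p
        }
        for (int i = 2; i <= limit; i++)
            if (candidates[i]) primes.Add(i);
        return primes;
    }
}
```

For the first 100 integers, `SieveDemo.Primes(100)` leaves 25 primes.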
A set is a collection of unique elements. The elements of a set are called members. The two most important properties of sets are that the members of a set are unordered and no member can occur in a set more than once. Sets play a very important role in computer science but are not included as a data structure in the .NET Framework versions used in this book (a generic HashSet&lt;T&gt; class was added in .NET Framework 3.5).
This chapter discusses the development of a Set class. Rather than providing just one implementation, however, we provide two. For nonnumeric items, we provide a fairly simple implementation using a hash table as the underlying data store. The problem with this implementation is its efficiency. A more efficient Set class for numeric values utilizes a bit array as its data store. This forms the basis of our second implementation.
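A minimal sketch of the hash-table approach for nonnumeric items follows. The class name HashtableSet and its methods are hypothetical, not the chapter's Set class. The members are stored as keys of a System.Collections.Hashtable, which guarantees that no member appears twice.

```csharp
using System.Collections;

// Hypothetical set of arbitrary objects backed by a Hashtable.
public class HashtableSet
{
    private readonly Hashtable data = new Hashtable();

    public int Count => data.Count;

    public void Add(object item)
    {
        if (!data.ContainsKey(item))
            data.Add(item, item);    // duplicates are silently ignored
    }

    public bool Contains(object item) => data.ContainsKey(item);

    public HashtableSet Union(HashtableSet other)
    {
        var result = new HashtableSet();
        foreach (object key in data.Keys) result.Add(key);
        foreach (object key in other.data.Keys) result.Add(key);
        return result;
    }

    public HashtableSet Intersection(HashtableSet other)
    {
        var result = new HashtableSet();
        foreach (object key in data.Keys)
            if (other.Contains(key)) result.Add(key);
        return result;
    }

    public bool IsSubsetOf(HashtableSet other)
    {
        foreach (object key in data.Keys)
            if (!other.Contains(key)) return false;
        return true;
    }
}
```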
FUNDAMENTAL SET DEFINITIONS, OPERATIONS AND PROPERTIES
A set is defined as an unordered collection of related members in which no member occurs more than once. A set is written as a list of members surrounded by curly braces, such as {0,1,2,3,4,5,6,7,8,9}. We can write a set in any order, so the previous set can be written as {9,8,7,6,5,4,3,2,1,0} or in any other order, as long as each member is written just once.
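Both properties can be checked with the generic HashSet&lt;T&gt; class (available from .NET Framework 3.5). The wrapper names are illustrative. SetEquals ignores order, and a member listed twice is stored only once.

```csharp
using System.Collections.Generic;

public static class SetNotationDemo
{
    // Two sets are equal when they have the same members, in any order.
    public static bool SameSet(int[] a, int[] b)
    {
        return new HashSet<int>(a).SetEquals(b);
    }

    // Writing a member more than once does not add it again.
    public static int MemberCount(int[] members)
    {
        return new HashSet<int>(members).Count;
    }
}
```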
The array is the most common data structure, present in nearly all programming languages. Using an array in C# involves creating an array object of System.Array type, the abstract base type for all arrays. The Array class provides a set of methods for performing tasks such as sorting and searching that programmers had to build by hand in the past.
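For instance, the static Array.Sort and Array.BinarySearch methods replace the hand-written loops. The wrapper below is an illustrative sketch. It sorts a copy so that the input is left untouched. Array.BinarySearch returns a negative number when the value is not found.

```csharp
using System;

public static class ArrayMethodsDemo
{
    // Sorts a copy of input, then locates value with a binary search
    // (which is only valid on a sorted array).
    public static int SortAndFind(int[] input, int value, out int[] sorted)
    {
        sorted = (int[])input.Clone();
        Array.Sort(sorted);
        return Array.BinarySearch(sorted, value);   // negative if not found
    }
}
```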
An interesting alternative to using arrays in C# is the ArrayList class. An arraylist is an array that grows dynamically as more space is needed. For situations where you can't accurately determine the ultimate size of an array, or where the size of the array will change quite a bit over the lifetime of a program, an arraylist may be a better choice than an array.
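A short sketch of the dynamic growth follows. The helper name is illustrative. No size is given up front, and the ArrayList enlarges its internal storage as items are added, so Capacity is always at least Count. Items are stored as object, so reading a value back requires a cast. The generic List&lt;T&gt; is the type-safe counterpart.

```csharp
using System.Collections;

public static class ArrayListDemo
{
    // Adds count integers to an ArrayList created without a size.
    public static ArrayList Fill(int count)
    {
        var list = new ArrayList();
        for (int i = 0; i < count; i++)
            list.Add(i);             // stored as object (boxed)
        return list;
    }
}
```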
In this chapter, we'll quickly touch on the basics of using arrays in C#, then move on to more advanced topics, including copying, cloning, testing for equality and using the static methods of the Array and ArrayList classes.
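As a preview of copying, cloning, and equality testing, here is a small sketch with illustrative helper names. Clone and CopyTo both produce shallow copies. This distinction matters only when the elements are reference types. The == operator on arrays compares references, so comparing contents needs an element-by-element test such as LINQ's SequenceEqual.

```csharp
using System.Linq;

public static class ArrayCopyDemo
{
    // Clone returns a new array as object, so it must be cast back.
    public static int[] CloneArray(int[] source) => (int[])source.Clone();

    // CopyTo copies every element into an existing array at a start index.
    public static int[] CopyArray(int[] source)
    {
        var target = new int[source.Length];
        source.CopyTo(target, 0);
        return target;
    }

    // == compares references; SequenceEqual compares elements.
    public static bool SameReference(int[] a, int[] b) => a == b;
    public static bool SameContents(int[] a, int[] b) => a.SequenceEqual(b);
}
```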
ARRAY BASICS
Arrays are indexed collections of data. The data can be of either a built-in type or a user-defined type. In fact, it is probably simplest to say that array data are objects. Arrays in C# are themselves objects because they derive from the System.Array class. Since every array is an instance of a type derived from System.Array, you can use all the methods and properties of that class when working with arrays.
Hashing is a very common technique for storing data in such a way that the data can be inserted and retrieved very quickly. Hashing uses a data structure called a hash table. Although hash tables provide fast insertion, deletion, and retrieval, operations that depend on the order of the data, such as finding the minimum or maximum value, are not performed very quickly. For these types of operations, other data structures are preferred (see, for example, Chapter 12 on binary search trees).
The .NET Framework library provides a very useful class for working with hash tables, the Hashtable class. We will examine this class in this chapter, but we will also discuss how to implement a custom hash table. Building hash tables is not very difficult, and the programming techniques used are well worth knowing.
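Basic use of System.Collections.Hashtable looks like the sketch below. The names and phone numbers are made-up sample data. Add throws an exception if the key already exists, the indexer adds or overwrites, and reading a missing key through the indexer returns null.

```csharp
using System.Collections;

public static class HashtableDemo
{
    public static Hashtable BuildPhoneBook()
    {
        var phones = new Hashtable();
        phones.Add("Smith", "555-1234");   // throws if "Smith" already exists
        phones["Jones"] = "555-9876";      // indexer adds or overwrites
        return phones;
    }
}
```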
AN OVERVIEW OF HASHING
A hash table data structure is designed around an array. The array has elements numbered 0 through one less than some predetermined size, though we can increase the size later if necessary. Each data item is stored in the array based on some piece of the data, called the key. To store an element in the hash table, a function called a hash function maps the key to a number in the range 0 to one less than the hash table size.
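Here is one possible hash function for string keys, shown as an illustrative sketch. It is not the Framework's GetHashCode and not necessarily the function the chapter develops. It combines the character codes using Horner's rule with an assumed multiplier of 37. Reducing modulo the table size at each step keeps the intermediate value small and guarantees an index in the range 0 to tableSize - 1. A prime table size such as 101 is a common choice.

```csharp
public static class HashFunctionDemo
{
    // Maps a string key to an array index in 0..tableSize-1.
    public static int Hash(string key, int tableSize)
    {
        long total = 0;
        foreach (char ch in key)
            total = (37 * total + ch) % tableSize;   // Horner's rule, reduced each step
        return (int)total;
    }
}
```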
The two most common operations performed on data stored in a computer are sorting and searching. This has been true since the beginning of the computing industry, which means that sorting and searching are also two of the most studied operations in computer science. Many of the data structures discussed in this book are designed primarily to make sorting and/or searching easier and more efficient on the data stored in the structure.
This chapter introduces you to the fundamental algorithms for sorting and searching data. These algorithms depend only on the array as a data structure, and the only "advanced" programming technique used is recursion. This chapter also introduces you to the techniques we'll use throughout the book to informally analyze different algorithms for speed and efficiency.
SORTING ALGORITHMS
Most of the data we work with in our day-to-day lives is sorted. We look up definitions in a dictionary by searching alphabetically. We look up a phone number by moving through the last names in the book alphabetically. The post office sorts mail in several ways—by zip code, then by street address, and then by name. Sorting is a fundamental process in working with data and deserves close study.
As was mentioned earlier, there has been quite a bit of research performed on different sorting techniques. Although some very sophisticated sorting algorithms have been developed, there are also several simple sorting algorithms you should study first. These sorting algorithms are the insertion sort, the bubble sort, and the selection sort.
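The three simple sorts can be sketched on an integer array as follows. The class and method names are illustrative, and each method sorts in place in ascending order.

```csharp
public static class SimpleSorts
{
    // Bubble sort: swap adjacent out-of-order pairs; after each outer
    // pass the largest remaining value has moved to the end.
    public static void BubbleSort(int[] arr)
    {
        for (int outer = arr.Length - 1; outer >= 1; outer--)
            for (int inner = 0; inner < outer; inner++)
                if (arr[inner] > arr[inner + 1])
                {
                    int temp = arr[inner];
                    arr[inner] = arr[inner + 1];
                    arr[inner + 1] = temp;
                }
    }

    // Selection sort: find the smallest remaining value and swap it
    // into the next position from the left.
    public static void SelectionSort(int[] arr)
    {
        for (int outer = 0; outer < arr.Length - 1; outer++)
        {
            int min = outer;
            for (int inner = outer + 1; inner < arr.Length; inner++)
                if (arr[inner] < arr[min]) min = inner;
            int temp = arr[outer];
            arr[outer] = arr[min];
            arr[min] = temp;
        }
    }

    // Insertion sort: shift larger values in the sorted prefix right
    // until the current value's slot is found.
    public static void InsertionSort(int[] arr)
    {
        for (int outer = 1; outer < arr.Length; outer++)
        {
            int temp = arr[outer];
            int inner = outer;
            while (inner > 0 && arr[inner - 1] > temp)
            {
                arr[inner] = arr[inner - 1];
                inner--;
            }
            arr[inner] = temp;
        }
    }
}
```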
This book discusses the development and implementation of data structures and algorithms using C#. The data structures we use in this book are found in the System.Collections namespace of the .NET Framework class library. In this chapter, we develop the concept of a collection by first discussing the implementation of our own Collection class (using the array as the basis of our implementation) and then by covering the collection classes in the .NET Framework.
An important addition to C# 2.0 is generics. Generics allow the C# programmer to write one version of a method or a class that works with many data types, without having to overload the method for each data type. The .NET Framework 2.0 provides a namespace, System.Collections.Generic, that contains generic versions of several of the System.Collections data structures. This chapter will introduce the reader to generic programming.
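A classic illustration is a single generic Swap method that replaces separate overloads for int, string, and so on. The sketch below also adds a constrained generic Max method. Both names are illustrative. The type parameter T is inferred from the arguments at each call site.

```csharp
using System;

public static class GenericsDemo
{
    // One generic method instead of one overload per data type.
    public static void Swap<T>(ref T a, ref T b)
    {
        T temp = a;
        a = b;
        b = temp;
    }

    // The constraint guarantees that T values can be compared.
    public static T Max<T>(T a, T b) where T : IComparable<T>
    {
        return a.CompareTo(b) >= 0 ? a : b;
    }
}
```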
Finally, this chapter introduces a custom-built class, the Timing class, which we will use in several chapters to measure the performance of a data structure and/or algorithm. This class will take the place of Big O analysis, not because Big O analysis isn't important, but because this book takes a more practical approach to the study of data structures and algorithms.
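The following is a hypothetical sketch of such a timing helper built on System.Diagnostics.Stopwatch. The class name and its StartTime/StopTime/Result members are assumptions, and the chapter's Timing class may be implemented differently. Forcing a garbage collection before starting reduces the chance that a collection lands inside the measured code.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper; not the chapter's Timing class.
public class TimingSketch
{
    private readonly Stopwatch watch = new Stopwatch();

    public void StartTime()
    {
        GC.Collect();                  // clean up before measuring
        GC.WaitForPendingFinalizers();
        watch.Restart();               // reset and start
    }

    public void StopTime() => watch.Stop();

    public TimeSpan Result => watch.Elapsed;
}
```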
COLLECTIONS DEFINED
A collection is a structured data type that stores data and provides operations for adding data to the collection, removing data from the collection, updating data in the collection, as well as operations for setting and returning the values of different attributes of the collection.
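A minimal array-backed collection covering these operations might look like the sketch below. The class name ArrayCollection is hypothetical and not the book's Collection class. It supports adding, removing (which shifts later items left), updating through an indexer, clearing, and reading the Count attribute.

```csharp
using System;

// Hypothetical array-backed collection of objects.
public class ArrayCollection
{
    private object[] items = new object[4];
    private int count;

    public int Count => count;                     // attribute: number of items

    public void Add(object item)
    {
        if (count == items.Length)
            Array.Resize(ref items, items.Length * 2);
        items[count++] = item;
    }

    // Removes the first occurrence of item, shifting later items left.
    public bool Remove(object item)
    {
        int index = Array.IndexOf(items, item, 0, count);
        if (index < 0) return false;
        Array.Copy(items, index + 1, items, index, count - index - 1);
        items[--count] = null;
        return true;
    }

    // Reads or updates the item stored at index.
    public object this[int index]
    {
        get
        {
            if (index < 0 || index >= count) throw new ArgumentOutOfRangeException(nameof(index));
            return items[index];
        }
        set
        {
            if (index < 0 || index >= count) throw new ArgumentOutOfRangeException(nameof(index));
            items[index] = value;
        }
    }

    public void Clear()
    {
        Array.Clear(items, 0, count);
        count = 0;
    }
}
```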
In this chapter, we look at two advanced topics: dynamic programming and greedy algorithms. Dynamic programming is a technique that is often considered to be the reverse of recursion. A recursive solution starts at the top and breaks the problem down into smaller problems, solving them until the complete problem is solved. A dynamic programming solution starts at the bottom, solving small problems and combining their results to form an overall solution to the big problem.
A greedy algorithm is an algorithm that looks for "good solutions" as it works toward the complete solution. These good solutions, called local optima, will hopefully lead to the correct final solution, called the global optimum. The term "greedy" comes from the fact that these algorithms take whatever solution looks best at the time. Greedy algorithms are often used when finding an optimal solution is impractical because of time and/or space considerations, yet a suboptimal solution is acceptable.
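Making change is a standard illustration. At each step the algorithm takes the largest coin that still fits, which is the locally best choice. The sketch below assumes the denominations are listed in descending order and include 1. With US coins (25, 10, 5, 1) the greedy result is optimal. With denominations 4, 3, 1 and an amount of 6, it returns 4, 1, 1 (three coins), although 3, 3 uses only two. This is the kind of suboptimal answer described above.

```csharp
using System.Collections.Generic;

public static class GreedyChange
{
    // Assumes denominations are in descending order and include 1.
    public static List<int> MakeChange(int amount, int[] denominations)
    {
        var coins = new List<int>();
        foreach (int coin in denominations)
        {
            while (amount >= coin)      // greedy: take the largest coin that fits
            {
                coins.Add(coin);
                amount -= coin;
            }
        }
        return coins;
    }
}
```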
DYNAMIC PROGRAMMING
Recursive solutions to problems are often elegant but inefficient. The C# compiler, like compilers for other languages, generally does not translate recursive code into efficient machine code. Every call adds overhead, and a naive recursive solution that solves the same subproblems over and over keeps repeating that work, resulting in an elegant but inefficient program.
Many programming problems that have recursive solutions can be rewritten using the techniques of dynamic programming. A dynamic programming solution builds a table, usually using an array, which holds the results of the different subsolutions. When the algorithm is complete, the solution is found in a known position in the table, often the last entry filled.
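Fibonacci numbers are the usual example. The recursive version below recomputes the same values many times, and the number of calls grows exponentially with n. The dynamic programming version fills an array from the bottom up, and its answer sits in the last entry, table[n]. The class and method names are illustrative.

```csharp
public static class FibonacciDemo
{
    // Top-down recursion: elegant, but recomputes shared subproblems.
    public static long RecurFib(int n)
    {
        if (n < 2) return n;
        return RecurFib(n - 1) + RecurFib(n - 2);
    }

    // Bottom-up dynamic programming: each subsolution is computed once
    // and stored in the table; the answer ends up in table[n].
    public static long DynFib(int n)
    {
        if (n < 2) return n;
        var table = new long[n + 1];
        table[0] = 0;
        table[1] = 1;
        for (int i = 2; i <= n; i++)
            table[i] = table[i - 1] + table[i - 2];
        return table[n];
    }
}
```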
In this chapter, we present a set of advanced data structures and algorithms for performing searching. The data structures we cover include the AVL tree, the red–black tree, the splay tree, and the skip list. AVL trees and red–black trees are two solutions to the problem of handling unbalanced binary search trees. The skip list is an alternative to tree-like data structures that forgoes the complexity of red–black and splay trees.
AVL TREES
Another solution to maintaining balanced binary trees is the AVL tree. The name AVL comes from the two computer scientists who discovered this data structure, G. M. Adelson-Velskii and E. M. Landis, in 1962. The defining characteristic of an AVL tree is that, for every node, the heights of the left and right subtrees can never differ by more than one.
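The defining rule can be checked directly, as in the sketch below. The TreeNode class and the helper names are hypothetical. The sketch assumes that an empty subtree has height -1 and a single node has height 0. Recomputing heights at every node makes this check quadratic, which is fine for illustration but is not how an AVL tree maintains balance in practice.

```csharp
using System;

// Hypothetical binary tree node used only for this check.
public class TreeNode
{
    public int Value;
    public TreeNode Left, Right;
    public TreeNode(int value) { Value = value; }
}

public static class AvlCheck
{
    // Empty tree: -1; single node: 0.
    public static int Height(TreeNode node)
    {
        if (node == null) return -1;
        return 1 + Math.Max(Height(node.Left), Height(node.Right));
    }

    // True if at every node the subtree heights differ by at most one.
    public static bool IsAvlBalanced(TreeNode node)
    {
        if (node == null) return true;
        int diff = Height(node.Left) - Height(node.Right);
        return Math.Abs(diff) <= 1
            && IsAvlBalanced(node.Left)
            && IsAvlBalanced(node.Right);
    }
}
```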
AVL Tree Fundamentals
By continually comparing the heights of the left and right subtrees of a tree, the AVL tree is guaranteed to always stay “in balance.” AVL trees utilize a technique, called a rotation, to keep them in balance.
To understand how a rotation works, let's look at a simple example that builds a binary tree of integers. Starting with the tree shown in Figure 15.1, if we insert the value 10 into the tree, the tree becomes unbalanced, as shown in Figure 15.2. The left subtree now has a height of 2, but the right subtree has a height of 0, violating the rule for AVL trees.
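A single right rotation, the fix for a left subtree that has grown too tall through its left child, can be sketched as follows. The AvlNode class, the cached Height field, and the sample values 30, 20, and 10 are hypothetical and not taken from Figures 15.1 and 15.2. The left child becomes the new subtree root, its former right subtree moves under the old root, and heights are recomputed from the bottom up.

```csharp
using System;

// Hypothetical AVL node with a cached height (empty subtree = -1).
public class AvlNode
{
    public int Value;
    public int Height;          // 0 for a leaf
    public AvlNode Left, Right;
    public AvlNode(int value) { Value = value; }
}

public static class AvlRotations
{
    static int H(AvlNode n) => n == null ? -1 : n.Height;

    static void Update(AvlNode n) => n.Height = 1 + Math.Max(H(n.Left), H(n.Right));

    // Rotates k2's left child k1 up; returns the new subtree root.
    public static AvlNode RotateRight(AvlNode k2)
    {
        AvlNode k1 = k2.Left;
        k2.Left = k1.Right;     // k1's right subtree moves under k2
        k1.Right = k2;
        Update(k2);             // k2 is now lower, so recompute it first
        Update(k1);
        return k1;
    }
}
```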
Although graphic representations have proven to be of value in computer-aided support and have received much attention in both research and practice (Goldschmidt, 1991; Goel, 1995; Achten, 1997; Do, 2002), linguistic representations presently do not contribute significantly to improving the information handling related to the computer support of a design product. During a product's life cycle, engineers and designers make many representations of it. The information and knowledge used to create the product are usually represented visually in sketches, models, (technical) drawings, and images. Linguistic information is complementary to graphic information and essential to create the corporate memory of products. Linguistic information (i.e., the use of words, abbreviations, vocal comments, annotations, notes, and reports) creates meaningful information for designers and engineers as well as for computers (Segers, 2004; Juchmes et al., 2005). Captions, plain text, and keyword indexing are now common to support the communication between design actors (Lawson & Loke, 1997; Wong & Kvan, 1999; Heylighen, 2001; Boujut, 2003). Nevertheless, linguistic information is currently seldom used to its full potential in design, maintenance, and manufacturing.
This paper examines the use of language, specifically verbs, as stimuli for concept generation. Because language has been shown to be important to the reasoning process in general as well as to specific reasoning processes that are central to the design process, we are investigating the relationship between language and conceptual design. The use of language to facilitate different stages of the design process has been investigated in the past. Our previous work, and the work of others, showed that ideas produced can be expressed through related hierarchical lexical relationships, so we investigated the use of verbs within these hierarchical relationships as stimuli for ideas. Participants were provided with four problems and related verb stimuli, and asked to develop concepts using the stimuli provided. The stimuli sets were generated by exploring verb hierarchies based on functional words from the problem statements. We found that participants were most successful when using lower level (more specific) verbs as stimuli, and often higher level general verbs were only used successfully in conjunction with lower level verbs. We also observed that intransitive verbs (verbs that cannot take a direct object) were less likely to be used successfully in the development of concepts. Overall, we found that the verb chosen as stimulus by the participant directly affects the success and the type of concept developed.
The paper introduces a family of three-DOF translational-rotational parallel-kinematics mechanisms (PKMs), together with a mobility analysis of this family using Lie-group theory. Each member of the family has two rotational DOFs and one translational DOF. A novel mechanism is presented and analyzed as a representative of the family. The use and the practical value of this modular mechanism are emphasized.
A flexible information model for systematic development and deployment of product families during all phases of the product realization process is crucial for product-oriented organizations. In current practice, information captured while designing products in a family is often incomplete, unstructured, and mostly proprietary in nature, making it difficult to index, search, refine, reuse, distribute, browse, aggregate, and analyze knowledge across heterogeneous organizational information systems. To this end, we propose a flexible knowledge management framework to capture, reorganize, and convert both linguistic and parametric product family design information into a unified network, which is called a networked bill of material (NBOM), using formal concept analysis (FCA); encode the NBOM as a cyclic, labeled graph using the Web Ontology Language (OWL) that designers can use to explore, search, and aggregate design information across different phases of product design as well as across multiple products in a product family; and analyze the set of products in a product family based on both linguistic and parametric information. As part of the knowledge management framework, a PostgreSQL database schema has been formulated to serve as a central design repository of product design knowledge, capable of housing the instances of the NBOM. Ontologies encoding the NBOM are utilized as a metalayer in the database schema to connect the design artifacts as part of a graph structure. Representing product families by preconceived common ontologies shows promise in promoting component sharing and in helping designers search, explore, and analyze linguistic and parametric product family design information. An example involving a family of seven one-time-use cameras with different functions that satisfy a variety of customer needs is presented to demonstrate the implementation of the proposed framework.
Because of the increasing complexity of products and the design process, as well as the popularity of computer-aided documentation tools, the number of electronic and textual design documents being generated has exploded. The availability of such extensive document resources has created new challenges and opportunities for research. These include improving design information retrieval to achieve a more coherent environment for design exploration, learning, and reuse. One critical issue is related to the construction of a structured representation for indexing design documents that record engineers' ideas and reasoning processes for a specific design. This representation should explicitly and accurately capture the important design concepts as well as the relationships between these concepts so that engineers can locate their documents of interest with less effort. For design information retrieval, we propose to use shallow natural language processing and domain-specific design ontology to automatically construct a structured and semantics-based representation from unstructured design documents. The design concepts and relationships of the representation are recognized from the document based on the identified linguistic patterns. The recognized concepts and relationships are joined to form a concept graph. The integration of these concept graphs builds an application-specific design ontology, which can be seen as the structured representation of the content of the corporate document repository, as well as an automatically populated knowledge base from previous designs. To improve the performance of design information retrieval, we have developed ontology-based query processing, where users' requests are interpreted based on their domain-specific meanings. Our approach contrasts with the traditionally used keyword-based search. An experiment to test the retrieval performance is conducted by using the design documents from a product design scenario. 
The results demonstrate that our method outperforms the keyword-based search techniques. This research contributes to the development and use of engineering ontology for design information retrieval.