Introduction to autonomous cooperating logistic processes and handling systems
During the last few decades the structural and dynamic complexity in logistics and production has increased steadily [1]. Many causes for higher structural complexity can be found, for instance, in the integration of multiple companies in production and logistic networks. This effect is furthermore amplified by a growing internal and dynamic complexity caused, for example, by an increasing number of product variants. Likewise, dynamic customer behavior intensifies this situation [2]. All these effects combined lead to higher information requirements.
For efficient planning and control a broad and reliable basis of information is needed [3]. However, the underlying algorithms will soon face the end of computation capacity due to the large amount of information that has to be taken into account. It is foreseeable that in the future centralized planning and control methods will not be able to process all the information delivered. A solution to this dilemma is the decentralized storage of necessary information on the logistic object itself as well as the capability of local decision-making. In order to achieve this goal, logistic objects themselves have to become intelligent.
The emergence of these intelligent objects is the foundation for autonomous cooperating logistic processes [4]. The main idea of this concept is to develop decentralized and heterarchical planning and control methods as opposed to existing centralized and hierarchical planning and control approaches. It requires that interacting elements in non-predictable systems possess the ability and the possibility to render decisions independently.
The principal investigators represented in RFID Technology and Applications have presented a range of practical approaches and models to consider in preparing an RFID implementation strategy as well as for planning future research areas for use of this fast-growing technology. As Sanjay Sarma states in presenting applications for RFID (Ch. 2), passive RFID technology is still in its infancy. We have identified challenging aspects of linking autonomous agents and intelligent handling systems (Ch. 14) with large volumes of distributed data sources such as RFID (Ch. 7). Using a consistent experimental approach based on control theory, test and simulation frameworks have been proposed to help evaluate RFID performance, for applications that start with uniquely identifying products and can include exchanging this information, together with sensor and real-time location data, across enterprises.
By way of summary, three areas emerge from these research initiatives that warrant careful planning, starting with the challenges of using low-power wireless data acquisition technology, with respect both to electromagnetic performance of tags in relationship to specific products and packaging, and to downstream RF environments in which the tags are to be read. These issues have been explored in the technology section (Chs. 2–8).
A second theme is the requirement to extend visibility over entire product lifecycles, an increasingly recurrent topic as governments seek new ways to gain visibility on globalizing supply chains.
In transitioning from the analysis of RFID systems technology to actual deployment considerations of specific sites, the first area confronting project managers is the need to assess the range of other RF applications that may be operating at the same unlicensed frequencies.
A critical part of preparing for RFID system implementations is planning to support the various RFID application infrastructures that may be implemented. Specifically, the opportunity to leverage a common IT infrastructure for supply chain, pedigree-tracking, and location-tracking RFID services promises a higher return from investments in these technologies. One example is to consider leveraging a wireless data (WiFi) network as the back-haul for RFID readers. Another would be to analyze the costs and benefits of combining RTLS applications with the same WiFi infrastructure, rather than using stand-alone systems for both. At a minimum, a gap analysis is required, in order to assess systems operating across the unlicensed spectrum that might interfere with the performance of RFID systems, or vice versa.
Introduction
Wal-Mart and the Department of Defense's announcement in 2003 mandating their suppliers to implement RFID tagging for all goods supplied to them caused a flurry of panic and speculation on how best to implement this requirement. Other agencies, including the FDA and corporations, have followed suit, with varying timelines and deployment plans. The area of passive RFID is more mature than that of active RFID with respect to the level of standardization and test data.
Mutual exclusion is a fundamental problem in distributed computing systems. Mutual exclusion ensures that concurrent access of processes to a shared resource or data is serialized, that is, executed in a mutually exclusive manner. Mutual exclusion in a distributed system states that only one process is allowed to execute the critical section (CS) at any given time. In a distributed system, shared variables (semaphores) or a local kernel cannot be used to implement mutual exclusion. Message passing is the sole means for implementing distributed mutual exclusion. The decision as to which process is allowed access to the CS next is arrived at by message passing, in which each process learns about the state of all other processes in some consistent way. The design of distributed mutual exclusion algorithms is complex because these algorithms have to deal with unpredictable message delays and incomplete knowledge of the system state. There are three basic approaches for implementing distributed mutual exclusion:
Token-based approach.
Non-token-based approach.
Quorum-based approach.
In the token-based approach, a unique token (also known as the PRIVILEGE message) is shared among the sites. A site is allowed to enter its CS if it possesses the token and it continues to hold the token until the execution of the CS is over. Mutual exclusion is ensured because the token is unique.
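The token-based idea can be sketched in a few lines. This is a minimal single-machine simulation, not a distributed implementation: the class `TokenRing`, its method names, and the choice to pass the token to the next site in a ring are all assumptions made for illustration.

```python
class TokenRing:
    """Hypothetical simulation: only the holder of the unique token may enter its CS."""

    def __init__(self, n_sites):
        self.n = n_sites
        self.holder = 0      # site currently holding the unique token
        self.in_cs = None    # site currently executing its CS, if any

    def enter_cs(self, site):
        # A site may enter only if it holds the token and the CS is free.
        if site != self.holder or self.in_cs is not None:
            return False
        self.in_cs = site
        return True

    def exit_cs(self, site):
        if self.in_cs != site:
            raise RuntimeError("site is not in its critical section")
        self.in_cs = None

    def pass_token(self):
        # The holder keeps the token until its CS is over, then hands it on.
        if self.in_cs is not None:
            raise RuntimeError("token cannot move while its holder is in the CS")
        self.holder = (self.holder + 1) % self.n
        return self.holder
```

Because `holder` is a single value, at most one site can satisfy the entry condition at any time, which mirrors the uniqueness argument in the text.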
Distributed shared memory (DSM) is an abstraction provided to the programmer of a distributed system. It gives the impression of a single monolithic memory, as in traditional von Neumann architecture. Programmers access the data across the network using only read and write primitives, as they would in a uniprocessor system. Programmers do not have to deal with send and receive communication primitives and the ensuing complexity of dealing explicitly with synchronization and consistency in the message-passing model. The DSM abstraction is illustrated in Figure 12.1. A part of each computer's memory is earmarked for shared space, and the remainder is private memory. To provide programmers with the illusion of a single shared address space, a memory mapping management layer is required to manage the shared virtual memory space.
DSM has the following advantages:
Communication across the network is achieved by the read/write abstraction that simplifies the task of programmers.
A single address space is provided, thereby providing the possibility of avoiding data movement across multiple address spaces, and simplifying passing-by-reference and passing complex data structures containing pointers.
If a block of data needs to be moved, the system can exploit locality of reference to reduce the communication overhead.
DSM is often cheaper than using dedicated multiprocessor systems, because it uses simpler software interfaces and off-the-shelf hardware.
There is no bottleneck presented by a single memory access bus.
The idea of self-stabilization in distributed computing was first proposed by Dijkstra in 1974. The concept of self-stabilization is that, regardless of its initial state, the system is guaranteed to converge to a legitimate state in a bounded amount of time by itself without any outside intervention. A non-self-stabilizing system may never reach a legitimate state or it may reach a legitimate state only temporarily. The main complication in designing a self-stabilizing distributed system is that nodes do not have a global memory that they can access instantaneously. Each node must make decisions based on the local knowledge available to it, and the actions of all nodes must together achieve a global objective.
The definition of legitimate and illegitimate states depends on the particular application. Generally, all states that are not legitimate are defined to be illegitimate. Dijkstra also illustrated the concept of self-stabilization using a self-stabilizing token ring system. In a token ring, global states in which there are multiple tokens or no token at all are illegitimate states. When a large number of systems are widely distributed and communicate with each other using message passing or shared memory, these systems may enter an illegitimate state, for example, if a message is lost. The concept of self-stabilization can help us recover from such situations in distributed systems.
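To make the token ring example concrete, here is a small simulation of what is commonly presented as Dijkstra's K-state token ring. It is a sketch under stated assumptions: the function names are hypothetical, one privileged machine moves per step (chosen at random), and `k` is taken to be at least the number of machines.

```python
import random

def privileged(states):
    """Indices of machines that currently hold a privilege (token)."""
    n = len(states)
    # Machine 0 is privileged when its state equals that of the last machine.
    priv = [0] if states[0] == states[-1] else []
    # Machine i > 0 is privileged when its state differs from its predecessor's.
    priv += [i for i in range(1, n) if states[i] != states[i - 1]]
    return priv

def step(states, k, rng):
    """Let one randomly chosen privileged machine make its move."""
    i = rng.choice(privileged(states))
    if i == 0:
        states[0] = (states[0] + 1) % k
    else:
        states[i] = states[i - 1]

def stabilize(states, k, rng, max_steps=10_000):
    """Run until exactly one privilege remains; return the number of steps taken."""
    steps = 0
    while len(privileged(states)) != 1 and steps < max_steps:
        step(states, k, rng)
        steps += 1
    return steps
```

Starting from an arbitrary assignment such as `[3, 1, 4, 1, 5]` with `k = 6`, repeated calls to `step` reach a state with a single privilege, and further steps keep it that way: the privilege simply circulates around the ring.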
In distributed processing systems, a problem is typically solved in a distributed manner with the cooperation of a number of processes. In such an environment, inferring if a distributed computation has ended is essential so that the results produced by the computation can be used. Also, in some applications, the problem to be solved is divided into many subproblems, and the execution of a subproblem cannot begin until the execution of the previous subproblem is complete. Hence, it is necessary to determine when the execution of a particular subproblem has ended so that the execution of the next subproblem may begin. Therefore, a fundamental problem in distributed systems is to determine if a distributed computation has terminated.
The detection of the termination of a distributed computation is non-trivial since no process has complete knowledge of the global state, and global time does not exist. A distributed computation is considered to be globally terminated if every process is locally terminated and there is no message in transit between any processes. A “locally terminated” state is a state in which a process has finished its computation and will not restart any action unless it receives a message. In the termination detection problem, a particular process (or all of the processes) must infer when the underlying computation has terminated.
A termination detection algorithm is used to infer when the underlying computation has ended.
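The definition of global termination given above can be written as a predicate. The sketch below assumes an omniscient observer that can see every process and every channel at once, which is exactly what a real distributed system lacks; the function name and the dictionary layout are hypothetical.

```python
def globally_terminated(locally_terminated, in_transit):
    """Check the global termination condition from a (hypothetical) global view.

    locally_terminated: dict mapping process id -> True if the process is locally terminated
    in_transit: dict mapping (sender, receiver) -> number of messages still in the channel
    """
    all_passive = all(locally_terminated.values())
    channels_empty = all(count == 0 for count in in_transit.values())
    return all_passive and channels_empty
```

A termination detection algorithm has to reach the same conclusion using only local knowledge and messages, which is why the problem is non-trivial.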
A fundamental concern in building a secure distributed system is the authentication of local and remote entities in the system. In a distributed system, the hosts communicate by sending and receiving messages over the network. Various resources (such as files and printers) distributed among the hosts are shared across the network in the form of network services provided by servers. The entities in a distributed system, such as users, clients, servers, and processes, are collectively referred to as principals. A distributed system is susceptible to a variety of threats mounted by intruders as well as legitimate users of the system.
In an environment where a principal can impersonate another principal, principals must adopt a mutually suspicious attitude toward one another and authentication becomes an important requirement. Authentication is a process by which one principal verifies the identity of another principal. For example, in a client–server system, the server may need to authenticate the client. Likewise, the client may want to authenticate the server so that it is assured that it is talking to the right entity. Authentication is needed for both authorization and accounting functions. In one-way authentication, only one principal verifies the identity of the other principal, while in mutual authentication both communicating principals verify each other's identity. A user gains access to a distributed system by logging on to a host in the system.
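The difference between one-way and mutual authentication can be illustrated with a challenge-response exchange. This is only a sketch: the text does not prescribe a protocol, so the pre-shared symmetric key, the use of HMAC-SHA256, and all function names are assumptions, and a production protocol needs considerably more care.

```python
import hashlib
import hmac
import secrets

def new_challenge():
    """Fresh random nonce, so that old responses cannot simply be replayed."""
    return secrets.token_bytes(16)

def respond(shared_key, responder_id, challenge):
    # Binding the responder's identity into the MAC means a response computed
    # by one principal cannot be passed off as another principal's response.
    message = responder_id.encode() + b"|" + challenge
    return hmac.new(shared_key, message, hashlib.sha256).digest()

def verify(shared_key, claimed_id, challenge, response):
    expected = respond(shared_key, claimed_id, challenge)
    return hmac.compare_digest(expected, response)

def mutual_authenticate(client_key, server_key):
    """Simulate both directions; succeeds only if both sides hold the same key."""
    c_nonce = new_challenge()  # client challenges server
    s_nonce = new_challenge()  # server challenges client
    server_ok = verify(client_key, "server", c_nonce,
                       respond(server_key, "server", c_nonce))
    client_ok = verify(server_key, "client", s_nonce,
                       respond(client_key, "client", s_nonce))
    return server_ok and client_ok
```

One-way authentication corresponds to performing only one of the two `verify` calls; mutual authentication requires both to succeed.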
A distributed system is a collection of independent entities that cooperate to solve a problem that cannot be individually solved. Distributed systems have been in existence since the start of the universe. From a school of fish to a flock of birds and entire ecosystems of microorganisms, there is communication among mobile intelligent agents in nature. With the widespread proliferation of the Internet and the emerging global village, the notion of distributed computing systems as a useful and widely deployed tool is becoming a reality. For computing systems, a distributed system has been characterized in one of several ways:
You know you are using one when the crash of a computer you have never heard of prevents you from doing work.
A collection of computers that do not share common memory or a common physical clock, that communicate by passing messages over a communication network, and where each computer has its own memory and runs its own operating system. Typically the computers are semi-autonomous and are loosely coupled while they cooperate to address a problem collectively.
A collection of independent computers that appears to the users of the system as a single coherent computer.
A term that describes a wide range of computers, from weakly coupled systems such as wide-area networks, to strongly coupled systems such as local area networks, to very strongly coupled systems such as multiprocessor systems.
A distributed system consists of a set of processors that are connected by a communication network. The communication network provides the facility of information exchange among processors. The communication delay is finite but unpredictable. The processors do not share a common global memory and communicate solely by passing messages over the communication network. There is no physical global clock in the system to which processes have instantaneous access. The communication medium may deliver messages out of order, messages may be lost, garbled, or duplicated due to timeout and retransmission, processors may fail, and communication links may go down. The system can be modeled as a directed graph in which vertices represent the processes and edges represent unidirectional communication channels.
A distributed application runs as a collection of processes on a distributed system. This chapter presents a model of a distributed computation and introduces several terms, concepts, and notations that will be used in the subsequent chapters.
A distributed program
A distributed program is composed of a set of n asynchronous processes p1, p2, …, pi, …, pn that communicate by message passing over the communication network. Without loss of generality, we assume that each process is running on a different processor. The processes do not share a global memory and communicate solely by passing messages. Let Cij denote the channel from process pi to process pj and let mij denote a message sent by pi to pj. The communication delay is finite and unpredictable.
Specifying predicates on the system state provides an important handle to specify, observe, and detect the behavior of a system. This is useful in formally reasoning about the system behavior. By being able to detect a specified predicate in the execution, we gain the ability to monitor the execution. Predicate specification and detection has uses in distributed debugging, sensor networks used for sensing in various applications, and industrial process control. As an example in the manufacturing process, a system may be monitoring the pressure of Reagent A and the temperature of Reagent B. Only when ψ1 = (PressureA > 240 KPa) ∧ (TemperatureB > 300 °C) should the two reagents be mixed. As another example, consider a distributed execution where variables x, y, and z are local to processes Pi, Pj, and Pk, respectively. An application might be interested in detecting the predicate ψ2 = xi + yj + zk < −125. In a nuclear power plant, sensors at various locations would monitor the relevant parameters such as the radioactivity level and temperature at multiple locations within the reactor.
Observe that the “predicate detection” problem is inherently different from the global snapshot problem. A global snapshot gives one of the possible states that could have existed during the period of the snapshot execution. Thus, a snapshot algorithm can observe only one of the predicate values that could have existed during the algorithm execution.
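The two example predicates, and the gap between a snapshot and predicate detection, can be sketched as follows. The helper `held_at_some_state` and the idea of being handed a ready-made sequence of global states are assumptions for illustration; an actual detection algorithm has to reason about the consistent global states of the execution.

```python
def psi1(pressure_a_kpa, temperature_b_c):
    """psi1 = (PressureA > 240 KPa) and (TemperatureB > 300 C)."""
    return pressure_a_kpa > 240 and temperature_b_c > 300

def psi2(x_i, y_j, z_k):
    """psi2 = x_i + y_j + z_k < -125."""
    return x_i + y_j + z_k < -125

def held_at_some_state(predicate, states):
    """True if the predicate held in at least one of the given global states."""
    return any(predicate(*state) for state in states)

# Hypothetical sequence of (PressureA, TemperatureB) global states.
states = [(230, 310), (250, 305), (235, 290)]
snapshot = states[2]  # a snapshot observes just one state
```

Here `psi1(*snapshot)` is false even though `held_at_some_state(psi1, states)` is true, which is the sense in which a single snapshot can miss a predicate that did hold during the execution.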
Recording the global state of a distributed system on-the-fly is an important paradigm when one is interested in analyzing, testing, or verifying properties associated with distributed executions. Unfortunately, the lack of both a globally shared memory and a global clock in a distributed system, added to the fact that message transfer delays in these systems are finite but unpredictable, makes this problem non-trivial.
This chapter first defines consistent global states (also called consistent snapshots) and discusses issues which have to be addressed to compute consistent distributed snapshots. Several algorithms for determining such snapshots on-the-fly are then presented for several types of networks (according to the properties of their communication channels, namely, FIFO, non-FIFO, and causal delivery).
Introduction
A distributed computing system consists of spatially separated processes that do not share a common memory and communicate asynchronously with each other by message passing over communication channels. Each component of a distributed system has a local state. The state of a process is characterized by the state of its local memory and a history of its activity. The state of a channel is characterized by the set of messages sent along the channel less the messages received along the channel. The global state of a distributed system is a collection of the local states of its components.
Recording the global state of a distributed system is an important paradigm and it finds applications in several aspects of distributed system design.
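The definitions of channel state and global state translate directly into code. The sketch below assumes an idealized observer that already knows every process state and every send and receive; the function names and data layout are hypothetical, and nothing here addresses how to record such a state consistently in a running system.

```python
from collections import Counter

def channel_state(sent, received):
    """Messages sent along a channel less the messages received along it."""
    remaining = Counter(sent) - Counter(received)  # keeps only positive counts
    return list(remaining.elements())

def global_state(process_states, sent, received):
    """Collect the local states of all components: processes and channels.

    sent / received: dicts mapping a channel (i, j) to a list of message ids.
    """
    channels = {
        ch: channel_state(msgs, received.get(ch, []))
        for ch, msgs in sent.items()
    }
    return {"processes": dict(process_states), "channels": channels}
```

For example, if `m1`, `m2`, and `m3` were sent on a channel and only `m1` has been received, the channel state is the two messages still in transit.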
Deadlocks are a fundamental problem in distributed systems and deadlock detection in distributed systems has received considerable attention in the past. In distributed systems, a process may request resources in any order, which may not be known a priori, and a process can request a resource while holding others. If the allocation sequence of process resources is not controlled in such environments, deadlocks can occur. A deadlock can be defined as a condition where a set of processes request resources that are held by other processes in the set.
Deadlocks can be dealt with using any one of the following three strategies: deadlock prevention, deadlock avoidance, and deadlock detection. Deadlock prevention is commonly achieved by either having a process acquire all the needed resources simultaneously before it begins execution or by pre-empting a process that holds the needed resource. In the deadlock avoidance approach to distributed systems, a resource is granted to a process if the resulting global system is safe. Deadlock detection requires an examination of the status of the process–resources interaction for the presence of a deadlock condition. To resolve the deadlock, we have to abort a deadlocked process.
In this chapter, we study several distributed deadlock detection techniques based on various strategies.
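One common way to examine the process-resource interaction is to look for a cycle in a wait-for graph, where an edge from P to Q means that P is waiting for a resource held by Q. The sketch below is a centralized depth-first search over a graph that is assumed to be fully known in one place; the function name and graph encoding are hypothetical, and a cycle is treated as a deadlock on the assumption that each process needs every resource it has requested.

```python
def find_deadlock(wait_for):
    """Return a list of processes forming a cycle in the wait-for graph, or None.

    wait_for: dict mapping a process to the processes it is waiting on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {p: WHITE for p in wait_for}
    for targets in wait_for.values():
        for t in targets:
            color.setdefault(t, WHITE)
    path = []

    def visit(p):
        color[p] = GREY
        path.append(p)
        for q in wait_for.get(p, ()):
            if color[q] == GREY:           # back edge: q is on the current path
                return path[path.index(q):]
            if color[q] == WHITE:
                cycle = visit(q)
                if cycle:
                    return cycle
        path.pop()
        color[p] = BLACK
        return None

    for p in list(color):
        if color[p] == WHITE:
            cycle = visit(p)
            if cycle:
                return cycle
    return None
```

In a distributed setting no single site holds the whole graph, so the detection techniques in this area differ mainly in how this information is gathered or probed by message passing.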
System model
A distributed system consists of a set of processors that are connected by a communication network. The communication delay is finite but unpredictable.
This chapter deals with the design of fault-tolerant distributed systems. It is widely known that the design and verification of fault-tolerant distributed systems is a difficult problem. Consensus and atomic broadcast are two important paradigms in the design of fault-tolerant distributed systems and they find wide applications. Consensus allows a set of processes to reach a common decision or value that depends upon the initial values at the processes, regardless of failures. In atomic broadcast, processes reliably broadcast messages such that they agree on the set of messages delivered and the order of message deliveries.
This chapter focuses on solutions to consensus and atomic broadcast problems in asynchronous distributed systems. In asynchronous distributed systems, there is no bound on the time it takes for a process to execute a computation step or for a message to go from its sender to its receiver. More generally, there is no upper bound on the relative processor speeds, execution times, clock drifts, or message transmission delays, although all of them are finite. This asynchrony is mainly caused by unpredictable loads on the system, and it means that one cannot make timing assumptions of any kind. On the other hand, synchronous systems are characterized by strict bounds on the execution times and message transmission delays.
The concept of causality between events is fundamental to the design and analysis of parallel and distributed computing and operating systems. Usually causality is tracked using physical time. However, in distributed systems, it is not possible to have global physical time; it is possible to realize only an approximation of it. As asynchronous distributed computations make progress in spurts, it turns out that the logical time, which advances in jumps, is sufficient to capture the fundamental monotonicity property associated with causality in distributed systems. This chapter discusses three ways to implement logical time (namely, scalar time, vector time, and matrix time) that have been proposed to capture causality between events of a distributed computation.
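Scalar and vector time can be sketched with the usual update rules: advance the local component before each event, attach the current time to outgoing messages, and take a component-wise maximum on receipt. The class and function names below are hypothetical, and matrix time is omitted for brevity.

```python
class ScalarClock:
    """Scalar logical clock: a single integer per process."""

    def __init__(self):
        self.time = 0

    def tick(self):                      # internal event
        self.time += 1
        return self.time

    def send(self):                      # timestamp to piggyback on a message
        return self.tick()

    def receive(self, msg_time):         # jump ahead of the sender if needed
        self.time = max(self.time, msg_time)
        return self.tick()


class VectorClock:
    """Vector logical clock: one integer per process in the system."""

    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n

    def tick(self):
        self.v[self.pid] += 1
        return list(self.v)

    def send(self):
        return self.tick()

    def receive(self, msg_v):
        self.v = [max(a, b) for a, b in zip(self.v, msg_v)]
        return self.tick()


def happened_before(u, v):
    """Vector timestamp u causally precedes v."""
    return all(a <= b for a, b in zip(u, v)) and u != v
```

With vector timestamps, two events are concurrent exactly when neither `happened_before(u, v)` nor `happened_before(v, u)` holds; scalar timestamps preserve the order of causally related events but cannot identify concurrency in this way.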
Causality (or the causal precedence relation) among events in a distributed system is a powerful concept in reasoning, analyzing, and drawing inferences about a computation. The knowledge of the causal precedence relation among the events of processes helps solve a variety of problems in distributed systems. Examples of some of these problems are as follows:
Distributed algorithms design: The knowledge of the causal precedence relation among events helps ensure liveness and fairness in mutual exclusion algorithms, helps maintain consistency in replicated databases, and helps design correct deadlock detection algorithms to avoid phantom and undetected deadlocks.
Peer-to-peer (P2P) network systems use an application-level organization of the network overlay for flexibly sharing resources (e.g., files and multimedia documents) stored across network-wide computers. In contrast to the client–server model, any node in a P2P network can act as a server to others and, at the same time, act as a client. Communication and exchange of information is performed directly between the participating peers and the relationships between the nodes in the network are equal. Thus, P2P networks differ from other Internet applications in that they tend to share data from a large number of end users rather than from the more central machines and Web servers. Several well known P2P networks that allow P2P file-sharing include Napster, Gnutella, Freenet, Pastry, Chord, and CAN.
Traditional distributed systems used DNS (domain name service) to provide a lookup from host names (logical names) to IP addresses. Special DNS servers are required, and manual configuration of the routing information is necessary to allow requesting client nodes to navigate the DNS hierarchy. Further, DNS is confined to locating hosts or services (not data objects that have to be a priori associated with specific computers), and host names need to be structured as per administrative boundary regulations. P2P networks overcome these drawbacks, and, more importantly, allow the location of arbitrary data objects.
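Locating an arbitrary data object without a name hierarchy is often done by hashing both node names and object keys onto the same identifier ring and storing each object at the first node at or after its identifier. The sketch below illustrates that general idea only; the ring size `M`, the use of SHA-1, and the function names are assumptions, and real systems route the lookup through the overlay rather than sorting a global list of nodes.

```python
import hashlib
from bisect import bisect_left

M = 16  # hypothetical identifier length in bits

def ring_id(name):
    """Map a node name or object key to an identifier on a ring of size 2**M."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

def successor(node_ids, key_id):
    """First node identifier at or after key_id, wrapping around the ring."""
    ids = sorted(node_ids)
    i = bisect_left(ids, key_id)
    return ids[i % len(ids)]

def locate(node_names, object_key):
    """Name of the node responsible for the given object key."""
    by_id = {ring_id(n): n for n in node_names}
    return by_id[successor(by_id.keys(), ring_id(object_key))]
```

Because responsibility follows from the hash alone, any peer can compute where an object belongs without consulting a dedicated, manually configured server.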
In this chapter, we first study a methodical framework in which distributed algorithms can be classified and analyzed. We then consider some basic distributed graph algorithms. We then study synchronizers, which provide the abstraction of a synchronous system over an asynchronous system. Finally, we look at some practical graph problems, to appreciate the necessity of designing efficient distributed algorithms.
Topology abstraction and overlays
The topology of a distributed system can be typically viewed as an undirected graph in which the nodes represent the processors and the edges represent the links connecting the processors. Weights on the edges can represent some cost function we need to model in the application. There are usually three (not necessarily distinct) levels of topology abstraction that are useful in analyzing the distributed system or a distributed application. These are now described using Figure 5.1. To keep the figure simple, only the relevant end hosts participating in the application are shown. The WANs are indicated by ovals drawn using dashed lines. The switching elements inside the WANs, and other end hosts that are not participating in the application, are not shown even though they belong to the physical topological view. Similarly, all the edges connecting all end hosts and all edges connecting to all the switching elements inside the WANs also belong to the physical topology view even though only some edges are shown.