Chapter 4 discussed the conversion of the DAG structure of a BN into a junction tree. In a BN, the strength of probabilistic dependence between variables is encoded by conditional probability distributions. This quantitative knowledge is encoded in a junction tree model in terms of probability distributions over clusters. For flexibility, these distributions are often unnormalized and are termed potentials. This chapter addresses conversion of the conditional probability distributions of a BN into potentials in a junction tree model and how to perform belief updating by passing potentials as concise messages in a junction tree.
Section 5.2 defines basic operations over potentials: product, quotient, and marginal. Important properties of mixed operations are discussed, including associativity, order independence, and reversibility. These basic and mixed operations form the basis of message manipulation during concise message passing. Initialization of the potentials in a junction tree according to the Bayesian network from which it is derived is then considered in Section 5.3. Section 5.4 presents an algorithm for message passing over a separator in a junction tree and discusses the algorithm's consequences. Extending this algorithm, Section 5.5 addresses belief updating by message passing in a junction tree model and formally establishes the correctness of the resultant belief. Processing observations is described in Section 5.6.
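The three basic operations on potentials can be sketched for discrete variables as follows. The `Potential` class, its variable names, and all numbers are illustrative, not from the text; the quotient uses the usual 0/0 := 0 convention.

```python
from itertools import product as cartesian

class Potential:
    """An unnormalized distribution over a set of discrete variables."""
    def __init__(self, variables, cards, table):
        self.variables = list(variables)          # e.g. ["A", "B"]
        self.cards = dict(zip(variables, cards))  # cardinality of each variable
        self.table = dict(table)                  # {(a, b, ...): non-negative value}

    def __mul__(self, other):
        """Pointwise product over the union of the two variable sets."""
        vs = self.variables + [v for v in other.variables if v not in self.variables]
        cards = {**self.cards, **other.cards}
        out = {}
        for cfg in cartesian(*(range(cards[v]) for v in vs)):
            assign = dict(zip(vs, cfg))
            out[cfg] = (self.table[tuple(assign[v] for v in self.variables)]
                        * other.table[tuple(assign[v] for v in other.variables)])
        return Potential(vs, [cards[v] for v in vs], out)

    def __truediv__(self, other):
        """Pointwise quotient; assumes identical variable ordering, 0/0 := 0."""
        out = {cfg: (0.0 if self.table[cfg] == 0
                     else self.table[cfg] / other.table[cfg])
               for cfg in self.table}
        return Potential(self.variables,
                         [self.cards[v] for v in self.variables], out)

    def marginal(self, keep):
        """Sum out all variables not listed in `keep`."""
        keep = [v for v in self.variables if v in keep]
        out = {}
        for cfg, val in self.table.items():
            key = tuple(cfg[self.variables.index(v)] for v in keep)
            out[key] = out.get(key, 0.0) + val
        return Potential(keep, [self.cards[v] for v in keep], out)
```

For instance, multiplying a potential holding P(A) with one holding P(B | A) and marginalizing out A yields P(B), the same computation that message passing performs over a separator.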
Guide to Chapter 5
Given a BN, its DAG structure provides the qualitative knowledge about the dependence among domain variables.
Chapters 2 through 5 studied exact probabilistic reasoning using a junction tree representation converted from a Bayesian network. The single-agent paradigm is followed in the study. Under this paradigm, a single computational entity, an agent, has access to a BN over a problem domain, converts the BN into a JT, acquires observations from the domain, reasons about the state of the domain by concise message passing over the JT, and takes actions accordingly. Such a paradigm has its limitations: A problem domain may be too complex for a single agent to be in charge of the reasoning task for the entire domain. Examples of complex domains include designing intricate machines such as aircraft and monitoring and troubleshooting complicated mechanisms such as chemical processes. The problem domain may also spread over a large geographical area, making it undesirable to transmit observations from many regions to a central location for processing owing to communication costs, delays, and unreliability.
This and subsequent chapters consider the uncertain reasoning task under the multiagent paradigm in which a set of cooperating computational agents takes charge of the reasoning task of a large and complex uncertain problem domain. This chapter deals with the knowledge representation. A set of five basic assumptions is introduced to describe some ideal knowledge representation formalisms for multiagent uncertain reasoning. These assumptions are shown to give rise to a particular knowledge representation formalism termed multiply sectioned Bayesian networks (MSBNs).
This book investigates opportunities for building intelligent decision support systems offered by multiagent, distributed probabilistic reasoning. Probabilistic reasoning with graphical models, known as Bayesian networks or belief networks, has become an active field of research and practice in artificial intelligence, operations research, and statistics in the last two decades. Inspired by the success of Bayesian networks and other graphical dependence models under the centralized and single-agent paradigm, this book extends them to representation formalisms under the distributed and multiagent paradigm. The major technical challenges to such an endeavor are identified and the results from a decade's research are presented. The framework developed allows distributed representation of uncertain knowledge on a large and complex environment embedded in multiple cooperative agents and effective, exact, and distributed probabilistic inference.
Under the single-agent paradigm, many exact or approximate methods have been proposed for probabilistic reasoning using graphical models. Not all of them admit effective extension into the multiagent paradigm. Concise message passing in a compiled, treelike graphical structure has emerged from a decade's research as one class of methods that extends well into the multiagent paradigm. How to structure multiple agents' diverse knowledge on a complex environment as a set of coherent probabilistic graphical models, how to compile these models into graphical structures that support concise message passing, and how to perform concise message passing to accomplish tasks in model verification, model compilation, and distributed inference are the foci of the book. The advantages of concise message passing over alternative methods are also analyzed.
To act in a complex problem domain, a decision maker needs to know the current state of the domain in order to choose the most appropriate action. In a domain about which the decision maker has only uncertain knowledge and partial observations, it is often impossible to estimate the state of the domain with certainty. We introduce Bayesian networks as a concise graphical representation of a decision maker's probabilistic knowledge of an uncertain domain. We raise the issue of how to use such knowledge to estimate the current state of the domain effectively. To accomplish this task, the idea of message passing in graphical models is illustrated with several alternative methods. Subsequent chapters will present representational and computational techniques to address the limitation of these methods.
The basics of Bayesian probability theory are reviewed in Section 2.2. This is followed in Section 2.3 by a demonstration of the intractability of traditional belief updating using joint probability distributions. The necessary background in graph theory is then provided in Section 2.4. Section 2.5 introduces Bayesian networks as a concise graphical model for probabilistic knowledge. In Section 2.6, the fundamental idea of local computation and message passing in modern probabilistic inference using graphical models is illustrated using so-called λ–π message passing in tree-structured models. The limitation of λ–π message passing is discussed, followed by the presentation of an alternative exact inference method, loop cutset conditioning, in Section 2.7 and an alternative approximate inference method, forward stochastic sampling, in Section 2.8.
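The intractability of belief updating directly from the joint distribution is easy to see in code: brute-force conditioning over n binary variables sums over all 2^n configurations. A minimal sketch over a hypothetical chain-structured BN (the structure and all numbers are illustrative, not from the text):

```python
from itertools import product

# Hypothetical chain BN X1 -> X2 -> ... -> Xn over binary variables, with
# P(X1 = 1) = 0.5 and P(X_{i+1} = 1 | X_i) = 0.9 if X_i = 1, else 0.2.

def joint(cfg):
    """Joint probability of one full configuration, from the chain factors."""
    p = 0.5  # P(X1), same for either value
    for prev, cur in zip(cfg, cfg[1:]):
        stay = 0.9 if prev == 1 else 0.2
        p *= stay if cur == 1 else 1 - stay
    return p

def posterior_last_given_first(n, first=1):
    """P(Xn = 1 | X1 = first) by brute-force enumeration: 2^n terms."""
    num = den = 0.0
    for cfg in product((0, 1), repeat=n):  # exponential in n
        if cfg[0] != first:
            continue
        p = joint(cfg)
        den += p
        if cfg[-1] == 1:
            num += p
    return num / den
```

For n = 20 the loop already visits about a million configurations, whereas message passing along the chain would compute the same posterior with work linear in n.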
In Chapters 6 through 9, we studied in detail why and how a set of agents over a large and complex domain should be organized into an MSBN. We studied how they can perform probabilistic reasoning exactly, effectively, and distributively. In this chapter, we discuss other important issues that have not yet been addressed but will merit research effort in the near future.
Multiagent Reasoning in Dynamic Domains
Practical problem domains can be static or dynamic. In a static domain, each domain variable takes a value from its space and will not change its value with time. Hence, at what instant in time an agent observes the variable makes no difference. On the other hand, in a dynamic domain, a variable may take different values from its space at different times. The temperature of a house changes after heating is turned on. The pressure of a sealed boiler at a chemical plant increases after the liquid inside boils. A patient suffers from a disease and recovers after the proper treatment. A device in a piece of equipment behaves normally until it wears out. Dynamic domains are more general, and a static domain can be viewed as a snapshot of a dynamic domain at a particular instant in time or within a time period in which the changes of variable values are negligible.
A Bayesian network can be used to model static and dynamic domains.
Chapter 3 has shown that, in order to use concise message passing in a single cluster graph for exact belief updating with a nontree BN, one must reorganize the DAG into a junction tree. Graphical representations of probabilistic knowledge result in efficiency through the exploitation of conditional independence in terms of graphical separation, as seen in Chapter 2. Therefore, the reorganization needs to preserve the independence–separation relations of the BN as much as possible. This chapter formally describes how independence is mapped into separation in different graphical structures and presents algorithms for converting a DAG dependence structure into a junction tree while preserving graphical separation to the extent possible.
Section 4.2 defines the graphical separation in three types of graphs commonly used for modeling probabilistic knowledge: u-separation in undirected graphs, d-separation in directed acyclic graphs, and h-separation in junction trees. The relation between conditional independence and the sufficient content of a message in concise message passing is established in Section 4.3. In Section 4.4, the concept of the independence map or I-map, which ties a graphical model to a problem domain based on the extent to which the model captures the conditional independence of the domain, is introduced. The concept of a moral graph is also introduced as an intermediate undirected graphical model to facilitate the conversion of a DAG model to a junction tree model. Section 4.5 introduces a class of undirected graphs known as chordal graphs and establishes the relation between chordal graphs and junction trees.
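The moralization step itself is simple: connect ("marry") every pair of parents of each node, then drop edge directions. A sketch over an adjacency representation (the example graph in the test is illustrative, not from the text):

```python
def moralize(parents):
    """Moral graph of a DAG.

    parents: dict mapping each node to the list of its parents.
    Returns the undirected edge set as a set of frozensets.
    """
    edges = set()
    for child, ps in parents.items():
        for p in ps:                      # keep original arcs, undirected
            edges.add(frozenset((p, child)))
        for i in range(len(ps)):          # marry every pair of co-parents
            for j in range(i + 1, len(ps)):
                edges.add(frozenset((ps[i], ps[j])))
    return edges
```

On the classic v-structure A → C ← B, moralization adds the edge A–B, reflecting that A and B become dependent given C.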
Chapter 7 has presented compilation of an MSBN into an LJF as an alternative dependence structure suitable for multiagent belief updating by concise message passing. Just as in the single-agent paradigm in which the conditional probability distributions of a BN are converted into potentials in a junction tree model, the conditional probability distributions in an MSBN need to be converted into potentials in the LJF before inference can take place. This chapter presents methods for performing such conversions and passing potentials as messages effectively among agents so that each agent can update belief correctly with respect to the observations made by all agents in the system.
Section 8.2 defines the potential associated with each component of an LJF and describes their initialization based on probability distributions in the original MSBN. Section 8.3 analyzes the topological structures of two linkage trees over an agent interface computed by two adjacent agents through distributed computation. This analysis demonstrates that, even though each linkage tree is created by one of the agents independently, the two linkage trees have equivalent topologies. This result ensures that the two agents will have identical message structures when they communicate through the corresponding linkage trees. Sections 8.4 and 8.5 present direct interagent message passing between a pair of agents. The effects of such message passing are formally established. The algorithms for multiagent communication through intra- and interagent message passing are presented in Section 8.6.
An intelligent agent is a computational or natural system that senses its environment and takes actions intelligently according to its goals. We focus on computational (versus natural) agents that act in the interests of their human principals. Such intelligent agents aid humans in making decisions. Intelligent agents can play several possible roles in the human decision process. They may play the roles of a consultant, an assistant, or a delegate. For simplicity, we will refer to intelligent agents as just agents.
When an agent acts as a consultant (Figure 1.1), it senses the environment but does not take actions directly. Instead, it tells the human principal what it thinks should be done. The final decision rests with the human principal. Many expert systems, such as medical expert systems (Teach and Shortliffe [75]), are used in this way. In one possible scenario, human doctors independently examine patients and arrive at their own opinions about the diseases in question. Before the physicians finalize their diagnoses and treatments, however, the recommendations from expert systems are considered, possibly causing the doctors to revise their original opinions. Intelligent agents are used as consultants when the decision process can be conducted properly by humans with satisfactory results, when the consequences of a bad decision are serious, and when agent performance is comparable to that of humans but the agents have not yet been accorded high degrees of trust.
In Chapter 6, MSBNs were derived as the knowledge representation for multiagent uncertain reasoning under the five basic assumptions. As in the case of single-agent BNs, we want agents organized into an MSBN to perform exact inference effectively by concise message passing. Chapter 4 discussed converting or compiling a multiply connected BN into a junction tree model to perform belief updating by message passing. Because each subnet in an MSBN is multiply connected in general, a similar compilation is needed to perform belief updating in an MSBN by message passing. In this chapter, we present the issues and algorithms for the structural compilation of an MSBN. The outcome of the compilation is an alternative dependence structure called a linked junction forest. Most steps involved in compiling an MSBN are somewhat parallel to those used in compiling a BN, such as moralization, triangulation, and junction tree construction, although additional issues must be dealt with.
The motivations for distributed compilation are discussed in Section 7.2. Section 7.3 presents algorithms for multiagent distributive compilation of the MSBN structure into its moral graph structure. Sections 7.4 and 7.5 introduce an alternative representation of the agent interface called a linkage tree, which is used to support concise interagent message passing. The need to construct linkage trees imposes additional constraints when the moral graph structure is triangulated into the chordal graph structure. Section 7.6 develops algorithms for multiagent distributive triangulation subject to these constraints.
In the preceding chapters we investigated in detail the scenario of a student perceptron learning from a teacher perceptron. This is a typical example of what is commonly referred to as supervised learning. But we all gratefully acknowledge that learning from examples does not always require the presence of a teacher!
However, what is it that can be learned besides some specific classification of examples provided by a teacher? The key observation is that learning from unclassified examples is possible if their distribution has some underlying structure. The main issue in unsupervised learning is then to extract these intrinsic features from a set of examples alone. This problem is central to many pattern recognition and data compression tasks with a variety of important applications [110].
Far from attempting to review the many existing approaches to unsupervised learning, we will show in the present chapter how the statistical mechanics methods introduced before can be applied to some special scenarios of unsupervised learning closely related to the teacher–student perceptron problem. This will illustrate, on the one hand, how statistical mechanics can be used for the analysis of unsupervised situations; on the other hand, we will gain new understanding of the supervised problem by reformulating it as a special case of an unsupervised one.
For a fixed set of input examples, one can decompose the N-sphere into cells each consisting of all the perceptron coupling vectors J giving rise to the same classification of those examples. Several aspects of perceptron learning discussed in the preceding chapters are related to the geometric properties of this decomposition, which turns out to have random multifractal properties. Our outline of the mathematical techniques related to the multifractal method will of course be short and ad rem; see [172, 173] for a more detailed introduction. But this alternative description provides a deeper and unified view of the different learning properties of the perceptron. It highlights some of the more subtle aspects of the thermodynamic limit and its role in the statistical mechanics analysis of perceptron learning. In this way we finish our discussion of the perceptron with an encompassing multifractal description, preparing the way for the application of this approach to the analysis of multilayer networks.
The shattered coupling space
Consider a set of p = αN examples ξµ generated independently at random from the uniform distribution on the N-sphere. Each hyperplane perpendicular to one of these inputs cuts the coupling space of a spherical perceptron, which is the very same N-sphere, into two half-spheres according to the two possible classifications of the example.
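For inputs in general position, the number of cells in this decomposition is not derived in the excerpt, but it is given by a classical result (Cover, 1965): p hyperplanes through the origin in general position cut the N-sphere into C(p, N) = 2 Σ_{k=0}^{N-1} binom(p-1, k) cells, which is also the number of distinct classifications of the p examples a perceptron can realize. A direct computation:

```python
from math import comb

def cover(p, n):
    """Cover's counting function C(p, N): number of cells cut out of the
    N-sphere by p hyperplanes through the origin in general position."""
    # comb(p - 1, k) is 0 once k > p - 1, so the truncated sum is safe.
    return 2 * sum(comb(p - 1, k) for k in range(n))
```

For p ≤ N every one of the 2^p classifications is realizable, i.e. cover(p, n) = 2^p; for p ≫ N the count grows only polynomially in p, which is the source of the perceptron's limited capacity.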
In this book we have discussed how various aspects of learning in artificial neural networks may be quantified by using concepts and techniques developed in the statistical mechanics of disordered systems. These methods grew out of the desire to understand some strange low-temperature properties of disordered magnets; nevertheless their usefulness for and efficiency in the analysis of a completely different class of complex systems underlines the generality and strength of the principles of statistical mechanics.
In this final chapter we have collected some additional examples of non-physical complex systems for which an analysis using methods of statistical mechanics similar to those employed for the study of neural networks has given rise to new and interesting results. Compared with the previous chapters, the discussions in the present one will be somewhat more superficial – merely pointing to the qualitative analogies with the problems elucidated previously, rather than working out the consequences in full detail. Moreover, some of the problems we consider are strongly linked to information processing and artificial neural networks, whereas others are not. In all cases quenched random variables are used to represent complicated interactions which are not known in detail, and the typical behaviour in a properly defined thermodynamic limit is of particular interest.
Support vector machines
The main reason the perceptron is not a serious candidate for the solution of many real-world learning problems is that it can implement only linearly separable Boolean functions.
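The canonical counterexample is XOR. A brute-force sketch (the weight grid is illustrative) collects every truth table a single linear threshold unit can realize over a small grid of weights and thresholds; XOR never appears, while a separable function such as AND does:

```python
from itertools import product

# The four binary input patterns and the XOR truth table over them.
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
XOR = (0, 1, 1, 0)

def realizable_tables():
    """Truth tables of x -> [w . x > theta] over a small grid of weights
    and half-integer thresholds (enough to realize every linearly
    separable Boolean function of two inputs)."""
    tables = set()
    weights = range(-2, 3)
    thetas = [k / 2 for k in range(-5, 6)]
    for w1, w2, t in product(weights, weights, thetas):
        tables.add(tuple(int(w1 * x1 + w2 * x2 > t) for x1, x2 in INPUTS))
    return tables
```

Since XOR is not linearly separable for any real weights, its absence from the enumerated set is guaranteed; the grid merely makes the check finite.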
The Gibbs rule discussed in the previous chapter characterizes the typical generalization behaviour of the students forming the version space. It is hence well suited for a general theoretical analysis. For a concrete practical problem it is, however, hardly the best choice, and there is a variety of other learning rules that are often more direct and may also show better performance. The purpose of this chapter is to introduce a representative selection of these learning rules, to discuss some of their features, and to compare their properties with those of the Gibbs rule.
The Hebb rule
The oldest and maybe most important learning rule was introduced by D. Hebb in the late 1940s. It is, in fact, an application, at the level of single neurons, of the idea behind Pavlov's coincidence training. In his famous experiment, Pavlov showed how a dog trained to receive its food while a light was turned on would also start to salivate when the light alone was lit. In some way, the coincidence of the two events, food and light, had established a connection in the brain of the dog such that, even when only one of the events occurred, the memory of the other would be stimulated. The basic idea behind the Hebb rule [32] is quite similar: strengthen the connections between neurons that fire together.