We introduce Bayesian networks in this chapter as a modeling tool for compactly specifying joint probability distributions.
Introduction
We have seen in Chapter 3 that joint probability distributions can be used to model uncertain beliefs and change them in the face of hard and soft evidence. We have also seen that the size of a joint probability distribution is exponential in the number of variables of interest, which introduces both modeling and computational difficulties. Even if these difficulties are addressed, one still needs to ensure that the synthesized distribution matches the beliefs held about a given situation. For example, if we are building a distribution that captures the beliefs of a medical expert, we may need to ensure some correspondence between the independencies held by the distribution and those believed by the expert. This may not be easy to enforce if the distribution is constructed by listing all possible worlds and assessing the belief in each world directly.
The Bayesian network is a graphical modeling tool for specifying probability distributions that, in principle, can address all of these difficulties. The Bayesian network relies on the basic insight that independence forms a significant aspect of beliefs and that it can be elicited relatively easily using the language of graphs. We start our discussion in Section 4.2 by exploring this key insight, and use our developments in Section 4.3 to provide a formal definition of the syntax and semantics of Bayesian networks.
We present in this chapter a variation on the variable elimination algorithm, known as the jointree algorithm, which can be understood in terms of factor elimination. This algorithm improves on the complexity of variable elimination when answering multiple queries. It also forms the basis for a class of approximate inference algorithms that we discuss in Chapter 14.
Introduction
Consider a Bayesian network and suppose that our goal is to compute the posterior marginal for each of its n variables. Given an elimination order of width w, we can compute a single marginal using variable elimination in O(n exp(w)) time and space, as we explained in Chapter 6. To compute all these marginals, we can then run variable elimination O(n) times, leading to a total complexity of O(n² exp(w)).
For large networks, the n² factor can be problematic even when the treewidth is small. The good news is that we can avoid this complexity and compute marginals for all network variables in only O(n exp(w)) time and space. This can be done using a more refined algorithm known as the jointree algorithm, which is the main subject of this chapter. The jointree algorithm will also compute the posterior marginals for other sets of variables, including all network families, where a family consists of a variable and its parents in the Bayesian network. Family marginals are especially important for sensitivity analysis, as discussed in Chapter 16, and for learning Bayesian networks, as discussed in Chapters 17 and 18.
Essential questions regarding the structure, underlying principles, and semantics of protein-protein interaction (PPI) networks can be addressed by an examination of their topological features and components. Network performance, scalability, robustness, and dynamics often depend on these topological properties. Much research has been devoted to the development of methods to quantitatively characterize a network or its components. Empirical and theoretical studies of networks of all types - technological, social, and biological - have been among the most popular subjects of recent research in many fields. Graph theory has been successfully applied to these real-world systems, and many measurements of graphs and their components have been introduced.
In Chapters 4 and 5, we provided an introduction to the typical topological properties of real complex networks, including degree distribution, attachment tendency, and reachability indices. We also introduced the scale-free model, which is among the most popular network models. This model exemplifies several important topological properties, which will be briefly summarized here:
■ The small-world property: Despite the large size of most real-world networks, a relatively short path can be found between any two constituent nodes. The small-world property states that any node in a real-world network can be reached from any other node within a small number of steps. As Erdős and Rényi [100, 101] have demonstrated, the typical distance between any two nodes in a random network scales as the logarithm of the number of nodes, indicating that random graphs are also characterized by this property.
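As an informal check of this property, the following sketch generates an Erdős–Rényi random graph and compares its average shortest-path length to the logarithmic estimate log(n)/log(k), where k is the mean degree. It assumes the networkx package is available, and the size and edge probability are purely illustrative.

```python
# A minimal sketch (assuming networkx is installed): compare the average
# shortest-path length of a random graph to the log-based estimate.
import math
import networkx as nx

n, p = 1000, 0.01                      # illustrative size and edge probability
G = nx.erdos_renyi_graph(n, p, seed=1)

# Work on the largest connected component so path lengths are defined.
giant = G.subgraph(max(nx.connected_components(G), key=len))

avg_path = nx.average_shortest_path_length(giant)
mean_degree = 2 * giant.number_of_edges() / giant.number_of_nodes()
estimate = math.log(giant.number_of_nodes()) / math.log(mean_degree)

print(f"average shortest path: {avg_path:.2f}, log-based estimate: {estimate:.2f}")
```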
The classic approaches to clustering follow a protocol termed “pattern proximity after feature selection” [158]. Pattern proximity is usually measured by a distance function defined for pairs of patterns. A simple distance measurement can capture the dissimilarity between two patterns, while similarity measures can be used to characterize the conceptual similarity between patterns. In protein-protein interaction (PPI) networks, proteins are represented as nodes and interactions are represented as edges. The relationship between two proteins is therefore a simple binary value: 1 if they interact, 0 if they do not. This lack of nuance makes it difficult to define the distance between the two proteins. The reliable clustering of PPI networks is further complicated by a high rate of false positives and the sheer volume of data, as discussed in Chapter 2.
Distance-based clustering employs these classic techniques and focuses on the definition of the topological or biological distance between proteins. These clustering approaches begin by defining the distance or similarity between two proteins in the network. This distance/similarity matrix can then be incorporated into traditional clustering algorithms. In this chapter, we will discuss a variety of approaches to distance-based clustering, all of which are grounded upon the use of these classic techniques.
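As a hedged illustration of this last step, the following sketch feeds a precomputed protein-protein distance matrix into standard agglomerative hierarchical clustering. It assumes scipy is available, and the distance matrix over four hypothetical proteins is purely illustrative.

```python
# A minimal sketch (assuming scipy is installed): cluster proteins from a
# precomputed distance matrix using agglomerative hierarchical clustering.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Symmetric distance matrix for four hypothetical proteins.
dist = np.array([
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.0, 0.1],
    [0.8, 0.9, 0.1, 0.0],
])

# linkage expects a condensed (upper-triangular) distance vector.
Z = linkage(squareform(dist), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the tree into 2 clusters
print(labels)                                     # e.g., [1 1 2 2]
```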
TOPOLOGICAL DISTANCE MEASUREMENT BASED ON COEFFICIENTS
The simplest of these approaches use classic distance measurement methods and their various coefficient formulas to compute the distance between proteins in PPI networks. As discussed in [123], the distance between two nodes (proteins) in a PPI network can be defined as follows.
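As one hedged illustration of this style of coefficient-based measure (not necessarily the exact definition used in [123]), the following sketch computes a Jaccard-style distance from the overlap of two proteins' neighbor sets; the toy network and protein names are purely illustrative.

```python
# A hedged illustration: a Jaccard-style distance based on the overlap of two
# proteins' neighbor sets, where a larger shared neighborhood means a smaller
# distance.  This is one example of a coefficient-based measure, not the
# specific formula of [123].
def jaccard_distance(network, u, v):
    """network maps each protein to the set of proteins it interacts with."""
    nu = network[u] | {u}          # include the protein itself
    nv = network[v] | {v}
    return 1.0 - len(nu & nv) / len(nu | nv)

# Toy network with four hypothetical proteins.
ppi = {
    "P1": {"P2", "P3"},
    "P2": {"P1", "P3"},
    "P3": {"P1", "P2", "P4"},
    "P4": {"P3"},
}
print(jaccard_distance(ppi, "P1", "P2"))   # 0.0: identical neighborhoods
print(jaccard_distance(ppi, "P1", "P4"))   # 0.75: little overlap
```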
We discuss in this chapter the process of learning Bayesian networks from data. The learning process is studied under different conditions, which relate to the nature of available data and the amount of prior knowledge we have on the Bayesian network.
Introduction
Consider Figure 17.1, which depicts a Bayesian network structure from the domain of medical diagnosis (we treated this network in Chapter 5). Consider also the data set depicted in this figure. Each row in this data set is called a case and represents a medical record for a particular patient. Note that some of the cases are incomplete, where “?” indicates the unavailability of corresponding data for that patient. The data set is therefore said to be incomplete due to these missing values; otherwise, it is called a complete data set.
A key objective of this chapter is to provide techniques for estimating the parameters of a network structure given both complete and incomplete data sets. The techniques we provide therefore complement those given in Chapter 5 for constructing Bayesian networks. In particular, we can now construct the network structure either from design information or by working with domain experts, as discussed in Chapter 5, and then use the techniques discussed in this chapter to estimate the CPTs of these structures from data. We also discuss techniques for learning the network structure itself, although our focus here is on complete data sets for reasons that we state later.
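As a minimal sketch of the complete-data case, the following code computes the standard counting-based maximum-likelihood estimate of one CPT, dividing family counts by parent counts; the cases and variable names are hypothetical, not those of Figure 17.1.

```python
# A minimal sketch, assuming a complete data set: each CPT parameter is
# estimated as count(x, u) / count(u), where u instantiates the parents.
from collections import Counter

# Hypothetical complete cases over variables C (condition) and T (test).
cases = [
    {"C": "yes", "T": "+ve"},
    {"C": "no",  "T": "-ve"},
    {"C": "yes", "T": "+ve"},
    {"C": "yes", "T": "-ve"},
    {"C": "no",  "T": "-ve"},
]

def estimate_cpt(cases, child, parents):
    """Return ML estimates of Pr(child | parents) from complete cases."""
    family_counts = Counter()
    parent_counts = Counter()
    for case in cases:
        u = tuple(case[p] for p in parents)
        family_counts[(case[child], u)] += 1
        parent_counts[u] += 1
    return {(x, u): n / parent_counts[u] for (x, u), n in family_counts.items()}

print(estimate_cpt(cases, "T", ["C"]))
# e.g., Pr(T=+ve | C=yes) = 2/3 and Pr(T=-ve | C=no) = 1
```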
We discuss in this chapter computational techniques for exploiting certain properties of network parameters, allowing one to perform inference efficiently in some situations where the network treewidth can be quite large.
Introduction
We discussed in Chapters 6–8 two paradigms for probabilistic inference based on elimination and conditioning, showing how they lead to algorithms whose time and space complexity are exponential in the network treewidth. These algorithms are often called structure-based since their performance is driven by the network structure and is independent of the specific values attained by network parameters. We also presented in Chapter 11 some CNF encodings of Bayesian networks, allowing us to reduce probabilistic inference to some well-known CNF tasks. The resulting CNFs were also independent of the specific values of network parameters and are therefore also structure-based.
However, the performance of inference algorithms can be enhanced considerably if one exploits the specific values of network parameters. The properties of network parameters that lend themselves to such exploitation are known as parametric or local structure. This type of structure typically manifests in networks involving logical constraints, context-specific independence, or local models of interaction, such as the noisy-or model discussed in Chapter 5.
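As a hedged sketch of the noisy-or model just mentioned, the following code computes the probability of an effect given its active causes from one suppressor parameter per cause plus an optional leak term; the cause names and probabilities are hypothetical. The point is that such a local model is specified by a handful of parameters rather than a full CPT, which is the kind of structure that can be exploited.

```python
# A minimal sketch of the standard noisy-or model: each active cause i fails
# to produce the effect independently with "suppressor" probability q[i], and
# a leak term allows the effect to occur with no active cause at all.
def noisy_or(q, active, leak=0.0):
    """Pr(effect = true | the causes in `active` are present)."""
    p_suppressed = 1.0 - leak
    for cause in active:
        p_suppressed *= q[cause]
    return 1.0 - p_suppressed

# Hypothetical suppressor probabilities for two causes of a symptom.
q = {"cold": 0.4, "flu": 0.1}
print(noisy_or(q, []))                 # 0.0: no cause, no leak
print(noisy_or(q, ["cold"]))           # 0.6
print(noisy_or(q, ["cold", "flu"]))    # 1 - 0.4*0.1 = 0.96
print(noisy_or(q, [], leak=0.05))      # 0.05: leak only
```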
In this chapter, we present a number of computational techniques for exploiting local structure that can be viewed as extensions of inference algorithms discussed in earlier chapters. We start in Section 13.2 with an overview of local structure and the impact it can have on the complexity of inference.
We introduce propositional logic in this chapter as a tool for representing and reasoning about events.
Introduction
The notion of an event is central to both logical and probabilistic reasoning. In the former, we are interested in reasoning about the truth of events (facts), while in the latter we are interested in reasoning about their probabilities (degrees of belief). In either case, one needs a language for expressing events before one can write statements that declare their truth or specify their probabilities. Propositional logic, which is also known as Boolean logic or Boolean algebra, provides such a language.
We start in Section 2.2 by discussing the syntax of propositional sentences, which we use for expressing events. We then follow in Section 2.3 by discussing the semantics of propositional logic, where we define properties of propositional sentences, such as consistency and validity, and relationships among them, such as implication, equivalence, and mutual exclusiveness. The semantics of propositional logic are used in Section 2.4 to formally expose its limitations in supporting plausible reasoning. This also provides a good starting point for Chapter 3, where we show how degrees of belief can deal with these limitations.
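As a minimal sketch of these semantic notions, the following code enumerates all truth assignments (worlds) over two propositional variables and uses them to test consistency, validity, and implication; the representation of sentences as Python functions over a world is purely illustrative.

```python
# A minimal sketch: enumerate all truth assignments (worlds) over a few
# propositional variables and use them to check consistency (satisfied by
# some world), validity (satisfied by every world), and implication.
from itertools import product

VARS = ["A", "B"]

def worlds():
    for values in product([True, False], repeat=len(VARS)):
        yield dict(zip(VARS, values))

# Sentences are represented here as Python functions over a world.
alpha = lambda w: w["A"] and w["B"]            # A and B
beta  = lambda w: w["A"] or w["B"]             # A or B
tauto = lambda w: w["A"] or not w["A"]         # A or not A

consistent = lambda s: any(s(w) for w in worlds())
valid      = lambda s: all(s(w) for w in worlds())
implies    = lambda s1, s2: all(s2(w) for w in worlds() if s1(w))

print(consistent(alpha))    # True: alpha has a satisfying world
print(valid(tauto))         # True: a tautology holds in every world
print(implies(alpha, beta)) # True: A and B implies A or B
```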
In Section 2.5, we discuss variables whose values go beyond the traditional true and false values of propositional logic. This is critical for our treatment of probabilistic reasoning in Chapter 3, which relies on the use of multivalued variables.
In recent years, the genomic sequencing of several model organisms has been completed. As of June 2006, complete genome sequences were available for 27 archaeal, 326 bacterial, and 21 eukaryotic organisms, and the sequencing of 316 bacterial, 24 archaeal, and 126 eukaryotic genomes was in progress [281]. In addition, the development of a variety of high-throughput methods, including the two-hybrid system, DNA microarrays, genomic SNP arrays, and protein chips, has generated large amounts of data suitable for the analysis of protein function. Although it is possible to determine the interactions between proteins and their functions accurately using biochemical/molecular experiments, such efforts are often very slow and costly and require extensive experimental validation. Therefore, the analysis of protein function in available databases offers an attractive prospect for less resource-intensive investigation.
Work with these sequenced genomes is hampered, however, by the fact that only 50–60% of their component genes have been annotated [281]. Several approaches have been developed to predict the functions of these unannotated proteins. The accurate prediction of protein function is of particular importance to an understanding of the critical cellular and biochemical processes in which they play a vital role. Methods that allow researchers to infer the functions of unannotated proteins using known functional annotations of proteins and the interaction patterns between them are needed.
Machine learning has been widely applied in the field of protein-protein interaction (PPI) networks and is particularly well suited to the prediction of protein functions. Methods have been developed to predict protein functions using a variety of information sources, including protein structure and sequence, protein domain, PPIs, genetic interactions, and the analysis of gene expression.
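As a hedged sketch of the simplest such idea, the "guilt by association" principle, the following code predicts the function of an unannotated protein as the most frequent annotation among its interaction partners; the protein names and annotations are purely illustrative and this is not any particular published method.

```python
# A minimal sketch of "guilt by association": predict the function of an
# unannotated protein as the most common annotation among its interaction
# partners.  Names and annotations below are purely illustrative.
from collections import Counter

# Hypothetical PPI neighborhoods and known functional annotations.
neighbors = {
    "P1": ["P2", "P3", "P4"],
    "P2": ["P1", "P3"],
}
annotations = {
    "P2": {"transport"},
    "P3": {"transport", "signaling"},
    "P4": {"transport"},
}

def predict_function(protein, k=1):
    """Return the k most frequent annotations among the protein's neighbors."""
    votes = Counter()
    for partner in neighbors.get(protein, []):
        votes.update(annotations.get(partner, set()))
    return [func for func, _ in votes.most_common(k)]

print(predict_function("P1"))   # ['transport']: 3 of 4 neighbor votes
```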
We consider in this chapter three models of graph decomposition: elimination orders, jointrees, and dtrees, which underlie the key inference algorithms we have discussed thus far. We present formal definitions of these models, provide polytime, width-preserving transformations between them, and show how the optimal construction of each of these models corresponds in a precise sense to the process of optimally triangulating a graph.
Introduction
We presented three inference algorithms in previous chapters whose complexity can be exponential only in the network treewidth: variable elimination, factor elimination (jointree), and recursive conditioning. Each one of these algorithms can be viewed as decomposing the Bayesian network in a systematic manner, allowing us to reduce a query with respect to some network into a query with respect to a smaller network. In particular, variable elimination removes variables one at a time from the network, while factor elimination removes factors one at a time, and recursive conditioning partitions the network into smaller pieces. We also saw how the decompositional choices made by these algorithms can be formalized using elimination orders, elimination trees (jointrees), and dtrees, respectively. In fact, the time and space complexity of each of these algorithms was characterized using the width of its corresponding decomposition model, which is lower-bounded by the treewidth.
We provide a more comprehensive treatment of decomposition models in this chapter, including polytime, width-preserving transformations between them. These transformations allow us to convert any method for constructing low-width models of one type into a method for constructing low-width models of the other types.
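As a minimal sketch of one of these decomposition models, the following code computes the width of a given elimination order on an undirected graph (such as the moral graph of a Bayesian network); the graph and the order are illustrative, and the convention used is that the width equals the size of the largest cluster minus one.

```python
# A minimal sketch: compute the width of an elimination order for an
# undirected graph.  When a variable is eliminated, its remaining neighbors
# are pairwise connected; with the convention that width is the largest
# cluster size minus one, it equals the largest neighbor set encountered.
def order_width(graph, order):
    """graph: dict mapping each node to a set of neighbors (a copy is modified)."""
    g = {v: set(ns) for v, ns in graph.items()}
    width = 0
    for v in order:
        neighbors = g[v]
        width = max(width, len(neighbors))
        for a in neighbors:               # connect v's neighbors into a clique
            g[a] |= neighbors - {a}
            g[a].discard(v)
        del g[v]
    return width

# Illustrative graph: a 4-cycle A-B-C-D-A, whose treewidth is 2.
cycle = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
print(order_width(cycle, ["A", "B", "C", "D"]))   # width 2 for this order
```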
We present in this chapter one of the simplest methods for general inference in Bayesian networks, which is based on the principle of variable elimination: A process by which we successively remove variables from a Bayesian network while maintaining its ability to answer queries of interest.
Introduction
We saw in Chapter 5 how a number of real-world problems can be solved by posing queries with respect to Bayesian networks. We also identified four types of queries: probability of evidence, prior and posterior marginals, most probable explanation (MPE), and maximum a posteriori hypothesis (MAP). We present in this chapter one of the simplest inference algorithms for answering these types of queries, which is based on the principle of variable elimination. Our interest here will be restricted to computing the probability of evidence and marginal distributions, leaving the discussion of MPE and MAP queries to Chapter 10.
We start in Section 6.2 by introducing the process of eliminating a variable. This process relies on some basic operations on a class of functions known as factors, which we discuss in Section 6.3. We then introduce the variable elimination algorithm in Section 6.4 and see how it can be used to compute prior marginals in Section 6.5. The performance of variable elimination will critically depend on the order in which we eliminate variables. We discuss this issue in Section 6.6, where we also provide some heuristics for choosing good elimination orders.
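As a minimal sketch of the factor operations just mentioned, the following code implements factor multiplication and summing out over binary variables, using an illustrative dictionary-based representation of factors, and uses them to compute a marginal in a two-node network.

```python
# A minimal sketch of the two factor operations behind variable elimination:
# multiplying factors and summing a variable out.  A factor is represented
# here as (variables, table), where the table maps instantiations to numbers.
from itertools import product

def multiply(f1, f2):
    vars1, t1 = f1
    vars2, t2 = f2
    vars_out = vars1 + [v for v in vars2 if v not in vars1]
    table = {}
    for inst in product([True, False], repeat=len(vars_out)):
        w = dict(zip(vars_out, inst))
        key1 = tuple(w[v] for v in vars1)
        key2 = tuple(w[v] for v in vars2)
        table[inst] = t1[key1] * t2[key2]
    return vars_out, table

def sum_out(var, factor):
    vars_in, t = factor
    i = vars_in.index(var)
    vars_out = vars_in[:i] + vars_in[i + 1:]
    table = {}
    for inst, value in t.items():
        key = inst[:i] + inst[i + 1:]
        table[key] = table.get(key, 0.0) + value
    return vars_out, table

# Pr(A) and Pr(B | A) for a two-node network A -> B; sum out A to get Pr(B).
fA  = (["A"], {(True,): 0.6, (False,): 0.4})
fBA = (["B", "A"], {(True, True): 0.9, (False, True): 0.1,
                    (True, False): 0.2, (False, False): 0.8})
print(sum_out("A", multiply(fA, fBA)))   # Pr(B=true) = 0.62
```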
The component proteins within protein-protein interaction (PPI) networks are associated in two types of groupings: protein complexes and functional modules. Protein complexes are assemblages of proteins that interact with each other at a given time and place, forming a single multimolecular machine. Functional modules consist of proteins that participate in a particular cellular process while binding to each other at various times and places. The detection of these groupings, known as modularity analysis, is an area of active research. In particular, the graphic representation of PPI networks has facilitated the discrimination of protein clusters through data-mining techniques.
The methods of data mining can be applied to identify various aspects of network organization. For example:
■ Proteins located at neighboring positions in a graph are generally considered to share functions (“guilt by association”). On this basis, the functions of a protein may be predicted by examining the proteins with which it interacts and the protein complexes to which it belongs.
■ Densely connected subgraphs in the network are likely to form protein complexes that function as single units in a particular biological process.
■ Investigation of network topological features can shed light on the biological system [29]. For example, networks may be scale-free, governed by the power law, or of various sizes.
A cluster is a set of objects that share some common characteristics. Clustering is the process of grouping data objects into sets (clusters); objects within a cluster demonstrate greater similarity than do objects in different clusters. In a PPI network, these sets will be either protein complexes or functional modules.
Modules (or clusters) in protein-protein interaction (PPI) networks can be identified by applying various clustering algorithms that use graph theory. Each of these methods converts the process of clustering a PPI dataset into a graph-theoretic analysis of the corresponding PPI network. Such clustering approaches take into consideration either the local topology or the global structure of the networks.
The graph-theoretic approaches to modularity analysis can be divided into two classes. The first class of approaches [24, 238, 272, 286] seeks to identify dense subgraphs by maximizing the density of each subgraph on the basis of local network topology. The second class of methods [94, 99, 138, 180, 250] seeks the best partition of a graph; based on the global structure of a network, these methods minimize the cost of partitioning or separating the graph. Both classes of approaches will be discussed in the first two sections of this chapter.
PPI networks are typically large, often having more than 6,000 nodes. In a graph of such large size, classical graph-theoretic algorithms become inefficient. A graph reduction-based approach [65], which enhances the efficiency of module detection in such large and complex interaction networks, will be explored in the third section of this chapter.
FINDING DENSE SUBGRAPHS
In this section, we will discuss those graph-theoretic approaches that seek to identify the densest subgraphs within a graph; specific methods vary in the means used to assess the density of the subgraphs. Six variations on this theme will be discussed in the following subsections.
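As a hedged point of reference, the following sketch computes one commonly used notion of subgraph density, the fraction of possible edges that are present; the individual methods discussed below may define and optimize density differently, and the toy graph is purely illustrative.

```python
# A minimal sketch of a common notion of subgraph density: the fraction of
# possible edges that are present, density = 2|E| / (|V| * (|V| - 1)).
def density(nodes, edges):
    """nodes: a set of proteins; edges: a set of frozenset pairs of proteins."""
    nodes = set(nodes)
    if len(nodes) < 2:
        return 0.0
    internal = sum(1 for e in edges if e <= nodes)   # edges with both ends inside
    possible = len(nodes) * (len(nodes) - 1) / 2
    return internal / possible

# Illustrative graph: a triangle {P1, P2, P3} plus a pendant node P4.
edges = {frozenset(e) for e in
         [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4")]}
print(density({"P1", "P2", "P3"}, edges))        # 1.0: fully connected subgraph
print(density({"P1", "P2", "P3", "P4"}, edges))  # 4/6, roughly 0.67
```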
We discuss in this chapter a class of approximate inference algorithms which are based on belief propagation. These algorithms provide a full spectrum of approximations, allowing one to trade off approximation quality with computational resources.
Introduction
The algorithm of belief propagation was first introduced as a specialized algorithm that applied only to networks having a polytree structure. This algorithm, which we treated in Section 7.5.4, was later applied to networks with arbitrary structure and found to produce high-quality approximations in certain cases. This observation triggered a line of investigations into the semantics of belief propagation, which had the effect of introducing a generalization of the algorithm that provides a full spectrum of approximations with belief propagation approximations at one end and exact results at the other.
We discuss belief propagation as applied to polytrees in Section 14.2 and then discuss its application to more general networks in Section 14.3. The semantics of belief propagation are exposed in Section 14.4, showing how it can be viewed as searching for an approximate distribution that satisfies some interesting properties. These semantics will then be the basis for developing generalized belief propagation in Sections 14.5–14.7. An alternative semantics for belief propagation will also be given in Section 14.8, together with a corresponding generalization. The difference between the two generalizations of belief propagation is not only in their semantics but also in the way they allow the user to trade off the approximation quality with the computational resources needed to produce them.
We consider in this chapter the computational complexity of probabilistic inference. We also provide some reductions of probabilistic inference to well known problems, allowing us to benefit from specialized algorithms that have been developed for these problems.
Introduction
In previous chapters, we discussed algorithms for answering three types of queries with respect to a Bayesian network that induces a distribution Pr(X). In particular, given some evidence e we discussed algorithms for computing:
The probability of evidence e, Pr(e) (see Chapters 6–8)
The MPE probability for evidence e, MPE_P(e) (see Chapter 10)
The MAP probability for variables Q and evidence e, MAP_P(Q, e) (see Chapter 10).
In this chapter, we consider the complexity of three decision problems that correspond to these queries. In particular, given a number p, we consider the following problems:
D-PR: Is Pr(e) > p?
D-MPE: Is there a network instantiation x such that Pr(x, e) > p?
D-MAP: Given variables Q ⊆ X, is there an instantiation q such that Pr(q, e) > p?
We also consider a fourth decision problem that includes D-PR as a special case:
D-MAR: Given variables Q ⊆ X and instantiation q, is Pr(q|e) > p?
Note here that when e is the trivial instantiation, D-MAR reduces to asking whether Pr(q) > p, which is identical to D-PR.
We provide a number of results on these decision problems in this chapter. In particular, we show in Sections 11.2–11.4 that D-MPE is NP-complete, D-PR and D-MAR are PP-complete, and D-MAP is NP^PP-complete.
We discuss in this chapter a class of approximate inference algorithms based on stochastic sampling: a process by which we repeatedly simulate situations according to their probability and then estimate the probabilities of events based on the frequency of their occurrence in the simulated situations.
Introduction
Consider the Bayesian network in Figure 15.1 and suppose that our goal is to estimate the probability of some event, say, wet grass. Stochastic sampling is a method for estimating such probabilities that works by measuring the frequency at which events materialize in a sequence of situations simulated according to their probability of occurrence. For example, if we simulate 100 situations and find out that the grass is wet in 30 of them, we estimate the probability of wet grass to be 3/10. As we see later, we can efficiently simulate situations according to their probability of occurrence by operating on the corresponding Bayesian network, a process that provides the basis for many of the sampling algorithms we consider in this chapter.
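As a minimal sketch of this simulation process, the following code samples each variable in topological order given its sampled parents (forward sampling) and estimates the probability of wet grass by its frequency; the small rain/sprinkler/wet-grass network and its parameters are hypothetical and not necessarily those of Figure 15.1.

```python
# A minimal sketch of forward sampling: each variable is sampled in
# topological order given its sampled parents, and a probability is
# estimated by the frequency of the event among the sampled situations.
import random

def sample_situation(rng):
    rain = rng.random() < 0.2
    sprinkler = rng.random() < (0.01 if rain else 0.4)
    p_wet = {(True, True): 0.99, (True, False): 0.9,
             (False, True): 0.9, (False, False): 0.0}[(rain, sprinkler)]
    wet_grass = rng.random() < p_wet
    return {"rain": rain, "sprinkler": sprinkler, "wet_grass": wet_grass}

rng = random.Random(0)
n = 10_000
hits = sum(sample_situation(rng)["wet_grass"] for _ in range(n))
print(hits / n)   # estimate of Pr(wet grass); improves as n grows
```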
The statements of sampling algorithms are remarkably simple compared to the methods for exact inference discussed in previous chapters, and their accuracy can be made arbitrarily high by increasing the number of sampled situations. However, the design of appropriate sampling methods may not be trivial as we may need to focus the sampling process on a set of situations that are of particular interest.