An adaptive fuzzy sliding control scheme is proposed to control a passive robotic manipulator. The adaptive fuzzy sliding controller is designed to eliminate the chattering and the need for prior knowledge of error bounds associated with conventional sliding control. The stability and convergence of the controller are proven theoretically and verified in simulation. A three-link passive manipulator model with two unactuated joints is derived for use in the simulations. Simulation results demonstrate that the proposed system is robust against structured and unstructured uncertainties.
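The paper's adaptive fuzzy controller is not reproduced in this abstract, but the chattering problem it targets can be sketched with a generic boundary-layer sliding-mode controller for a hypothetical one-degree-of-freedom plant whose acceleration equals the control input plus a bounded disturbance (all gains, the reference trajectory and the disturbance below are illustrative assumptions, not taken from the paper):

```python
# Generic sliding-mode sketch with a saturation-based boundary layer, the usual
# remedy for chattering; this is NOT the paper's adaptive fuzzy controller.
import math

lam, k, phi, dt = 2.0, 1.5, 0.05, 1e-3   # surface slope, switching gain, boundary-layer width, time step
x, xdot, t = 0.5, 0.0, 0.0               # initial state and time

def sat(z):
    """Saturated switching function: sign(z) outside [-1, 1], linear inside."""
    return max(-1.0, min(1.0, z))

for _ in range(int(10.0 / dt)):          # simulate 10 seconds with Euler integration
    xd, xd_dot, xd_ddot = math.sin(t), math.cos(t), -math.sin(t)   # reference trajectory x_d(t) = sin t
    e, edot = x - xd, xdot - xd_dot
    s = edot + lam * e                   # sliding surface s = e' + lam * e
    u = xd_ddot - lam * edot - k * sat(s / phi)   # boundary-layer sliding control law
    d = 0.8 * math.sin(3.0 * t)          # unknown bounded disturbance, |d| <= 0.8 < k
    xddot = u + d                        # plant: x'' = u + d
    x, xdot, t = x + dt * xdot, xdot + dt * xddot, t + dt

print(f"tracking error after 10 s: {x - math.sin(t):+.5f}")
```

Because the switching gain exceeds the disturbance bound, the trajectory is driven into the boundary layer and the tracking error stays small without the high-frequency switching that a pure sign function would produce.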
An auto-bonding robot (ABR), consisting of an adhesive-dispensing and auto-bonding mechanism, a pneumatic system and a control system, is presented in this paper. It is designed for bonding cover-glasses to space solar cells using adhesives. An adhesive dispensing method is proposed to control the thickness and position of the adhesive layer on the solar cells and to provide satisfactory bonding accuracy. A bubble-free bonding process is realized by the leaning mechanism of a pneumatic sucker. An experimental comparison of the manual and automatic bonding methods showed that, with the automatic process carried out in a non-vacuum environment, there are no fragments or air bubbles between the cover-glass and the space solar cell, and no adhesive overflow on the surface. The novel automatic bonding robot greatly improves the bonding quality and production rate of lightweight space solar cells.
For many biological creatures, sensory whiskers are an effective means of detecting and recognising nearby objects. The project described in this paper aims to demonstrate that whisker sensors can be a similarly effective form of robot sensing. Many mobile robots have used whiskers as simple switches to warn of an imminent collision. However, these devices cannot provide the detailed surface profile information required to recognise and accurately locate objects. Several research groups have built advanced whisker sensors that can determine the position of a contact along the length of the whisker. Although these whisker sensors are usually little more than a length of flexible spring material, they do require complex sensing and actuation mechanisms at the whisker root. In this project an array of eight whisker sensors is scanned over external objects by the motion of the robot. The resulting deflection of the whiskers is monitored by a potentiometer at each whisker root. By recording the deflection of the whiskers as they slide over external objects, sequences of surface points can be determined. Object recognition algorithms have been developed that allow the robot to recognise, grasp and retrieve a range of objects using the whisker data. In this paper the robot, WhiskerBOT, is described together with the object recognition and localisation algorithms. Results of practical experiments are also presented.
In this paper, the problem of precision reaching in robotic systems for scenarios with static and non-static objects is considered, and a solution based on a modular neural architecture is proposed and implemented. The goal of this solution is to combine robustness with the capability of mapping trajectories, using two biologically plausible neural network sub-modules: HypRBF and AVITE. The Hyper Radial Basis Function (HypRBF) neural network solves the inverse kinematics of redundant robotic systems, while the Adaptive Vector Integration to End-Point (AVITE) visuo-motor neural model quickly maps the difference vector between the current and desired positions in both spatial (visual) and motor (proprioceptive) coordinates. The anthropomorphic behaviour of the proposed architecture in reaching and tracking tasks in the presence of spatial perturbations has been validated on a real arm-head robotic platform.
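As a rough illustration of the radial-basis-function ingredient (plain Gaussian RBF regression learning the elbow-down inverse kinematics of a toy planar two-link arm; this is not the paper's HypRBF/AVITE architecture, and the link lengths, sampling ranges and query point are invented):

```python
# Toy Gaussian-RBF regression for 2-link planar inverse kinematics (illustrative only).
import numpy as np

L1, L2 = 1.0, 0.8
rng = np.random.default_rng(0)

def fk(theta):
    """Forward kinematics: tip position (x, y) for joint angles (theta1, theta2)."""
    t1, t2 = theta[..., 0], theta[..., 1]
    return np.stack([L1*np.cos(t1) + L2*np.cos(t1 + t2),
                     L1*np.sin(t1) + L2*np.sin(t1 + t2)], axis=-1)

# Training data: joint angles restricted to one (elbow-down) branch so the inverse map is single-valued.
theta = rng.uniform([0.0, 0.3], [np.pi / 2, 2.5], size=(400, 2))
X, Y = fk(theta), theta

centers = X[rng.choice(len(X), 60, replace=False)]   # RBF centres picked from the data
width = 0.4

def features(P):
    """Gaussian RBF features of tip positions P with respect to the fixed centres."""
    d2 = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

W, *_ = np.linalg.lstsq(features(X), Y, rcond=None)  # linear read-out weights

# Query: desired tip position -> predicted joint angles -> check by forward kinematics.
target = np.array([[1.2, 0.9]])
pred = features(target) @ W
print("predicted angles:", pred[0], " reached tip:", fk(pred)[0])
```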
Statistics is the science of data analysis. The data to be encountered in this book are derived from genomes. Genomes consist of long chains of DNA which are represented by sequences in the letters A, C, G or T. These abbreviate the four bases Adenine, Cytosine, Guanine and Thymine, which serve as fundamental building blocks in molecular biology.
What do statisticians do with their data? They build models of the process that generated the data and, in what is known as statistical inference, draw conclusions about this process. Genome sequences are particularly interesting data to draw conclusions from: they are the blueprint for life, and yet their function, structure, and evolution are poorly understood. Statistical models are fundamental for genomics, a point of view that was emphasized in [Durbin et al., 1998].
The inference tools we present in this chapter look different from those found in [Durbin et al., 1998], or most other texts on computational biology or mathematical statistics: ours are written in the language of abstract algebra. The algebraic language for statistics clarifies many of the ideas central to the analysis of discrete data, and, within the context of biological sequence analysis, unifies the main ingredients of many widely used algorithms.
Algebraic Statistics is a new field, less than a decade old, whose precise scope is still emerging. The term itself was coined by Giovanni Pistone, Eva Riccomagno and Henry Wynn, with the title of their book [Pistone et al., 2000].
The title of this book reflects who we are: a computational biologist and an algebraist who share a common interest in statistics. Our collaboration sprang from the desire to find a mathematical language for discussing biological sequence analysis, with the initial impetus being provided by the introductory workshop on Discrete and Computational Geometry at the Mathematical Sciences Research Institute (MSRI) held at Berkeley in August 2003. At that workshop we began exploring the similarities between tropical matrix multiplication and the Viterbi algorithm for hidden Markov models. Our discussions ultimately led to two articles [Pachter and Sturmfels, 2004a,b] which are explained and further developed in various chapters of this book.
In the fall of 2003 we held a graduate seminar on The Mathematics of Phylogenetic Trees. About half of the authors of the second part of this book participated in that seminar. It was based on topics from the books [Felsenstein, 2003, Semple and Steel, 2003] but we also discussed other projects, such as Michael Joswig's polytope propagation on graphs (now Chapter 6). That seminar got us up to speed on research topics in phylogenetics, and led us to participate in the conference on Phylogenetic Combinatorics which was held in July 2004 in Uppsala, Sweden. In Uppsala we were introduced to David Bryant and his statistical models for split systems (now Chapter 17).
Another milestone was the workshop on Computational Algebraic Statistics, held at the American Institute of Mathematics (AIM) in Palo Alto in December 2003.
Many of the algorithms used for biological sequence analysis are discrete algorithms, i.e., the key feature of the problems being solved is that some optimization needs to be performed on a finite set. Discrete algorithms are complementary to numerical algorithms, such as Expectation Maximization, Singular Value Decomposition and Interval Arithmetic, which make their appearance in later chapters. They are also distinct from algebraic algorithms, such as the Buchberger Algorithm, which is discussed in Section 3.1. In what follows we introduce discrete algorithms and mathematical concepts which are relevant for biological sequence analysis. The final section of this chapter offers an annotated list of the computer programs which are used throughout the book. The list ranges over all three themes (discrete, algebraic, numerical) and includes software tools which are useful for research in computational biology.
Some discrete algorithms arise naturally from algebraic statistical models, which are characterized by finitely many polynomials, each with finitely many terms. Inference methods for drawing conclusions about missing or hidden data depend on the combinatorial structure of the polynomials in the algebraic representation of the models. In fact, many widely used dynamic programming methods, such as the Needleman–Wunsch algorithm for sequence alignment, can be interpreted as evaluating polynomials, albeit with tropical arithmetic.
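As an assumed miniature example of this tropical viewpoint (shortest paths rather than alignment, but the same min-plus mechanism that underlies Needleman–Wunsch and Viterbi):

```python
# Tropical (min-plus) arithmetic: "plus" becomes min and "times" becomes +, so evaluating
# the walk polynomial of a graph tropically is exactly a shortest-path dynamic program.
INF = float('inf')

def trop_matmul(A, B):
    """Tropical matrix product."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

# Edge-weight matrix of a small directed graph (INF = no edge, 0 on the diagonal).
D = [[0,   2,   INF],
     [INF, 0,   3],
     [1,   INF, 0]]

# The (n-1)-st tropical power gives all shortest-path distances for nonnegative weights.
P = D
for _ in range(len(D) - 2):
    P = trop_matmul(P, D)
print(P)   # P[0][2] == 5: the cheapest route 0 -> 1 -> 2
```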
The combinatorial structure of a polynomial, or polynomial map, is encoded in its Newton polytope. Thus every algebraic statistical model has a Newton polytope, and it is the structure of this polytope which governs dynamic programming related to that model.
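A small assumed example in Python: if a polynomial is stored as a map from exponent vectors to coefficients, its Newton polytope is the convex hull of those exponent vectors, and standard computational-geometry software identifies which monomials are its vertices.

```python
# Newton polytope of a bivariate polynomial, computed as the convex hull of its exponent vectors.
import numpy as np
from scipy.spatial import ConvexHull

# f(x, y) = 3*x^4 + x^2*y + 5*x*y^3 + 2*x^2*y^3 + 7, stored as {exponent tuple: coefficient}
f = {(4, 0): 3, (2, 1): 1, (1, 3): 5, (2, 3): 2, (0, 0): 7}

exponents = np.array(list(f.keys()), dtype=float)
hull = ConvexHull(exponents)
vertices = exponents[hull.vertices].astype(int)
print(vertices)   # the exponent vectors that are vertices of the Newton polytope
```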
When statistical inference is conducted in a maximum likelihood (ML) framework, as discussed in Chapter 1, we are interested in the global maximum of the likelihood function over the parameter space. In practice we settle for a local optimization algorithm to numerically approximate the global solution, since explicit analytical solutions for the maximum likelihood estimates (MLEs) are typically difficult to obtain. See Chapter 3 or 18 for algebraic approaches to solving such ML problems. In this chapter we take a rigorous numerical approach to the ML problem for phylogenetic trees via interval methods. We accomplish this by first constructing an interval extension of the recursive formulation of the likelihood function of an evolutionary model on an unrooted tree. We then use an adaptation, to the phylogenetic context, of a widely applied global optimization algorithm based on interval analysis to rigorously enclose ML values as well as MLEs for branch lengths. The method is applied to enclose the most likely 2- and 3-taxon trees under the Jukes–Cantor model of DNA evolution. The method is general and can provide rigorous estimates when coupled with standard phylogenetic algorithms. Solutions obtained with such methods are equivalent to computer-aided proofs, unlike solutions obtained with conventional numerical methods.
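The following minimal sketch illustrates the interval-extension idea on the two-taxon Jukes–Cantor likelihood. It ignores directed (outward) rounding, so unlike the chapter's method it does not constitute a computer-aided proof, and the data n and k are invented:

```python
# Interval extension of the two-taxon Jukes-Cantor log-likelihood (sketch only:
# no outward rounding, so the bounds are rigorous only up to machine precision).
import math

class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)
    def scale(self, c):          # multiplication by a real scalar c
        a, b = c * self.lo, c * self.hi
        return Interval(min(a, b), max(a, b))
    def shift(self, c):          # addition of a real scalar c
        return Interval(self.lo + c, self.hi + c)
    def exp(self):               # exp is monotone increasing
        return Interval(math.exp(self.lo), math.exp(self.hi))
    def log(self):               # log is monotone increasing (requires lo > 0)
        return Interval(math.log(self.lo), math.log(self.hi))

def loglik(d, n, k):
    """Interval extension of the JC log-likelihood for two sequences of length n
    differing at k sites; d is an Interval of branch lengths. The constant factor
    (1/4)^n from the root distribution is omitted; it does not move the maximizer."""
    e = d.scale(-4.0 / 3.0).exp()            # e^{-4d/3}
    p_same = e.scale(0.75).shift(0.25)       # 1/4 + 3/4 e^{-4d/3}
    p_diff = e.scale(-0.25).shift(0.25)      # 1/4 - 1/4 e^{-4d/3}
    return p_same.log().scale(n - k) + p_diff.log().scale(k)

n, k = 100, 20                               # 20 mismatches in 100 sites (made-up data)
box = Interval(0.01, 2.0)
bounds = loglik(box, n, k)
print("log-likelihood over d in [0.01, 2.0]:", (bounds.lo, bounds.hi))

# Closed-form MLE for comparison: d_hat = -(3/4) ln(1 - 4p/3), p = k/n
p_hat = k / n
print("closed-form MLE:", -0.75 * math.log(1 - 4 * p_hat / 3))
```

Bisecting the branch-length box and discarding subboxes whose upper bound falls below a verified lower bound turns such enclosures into a rigorous global search, which is the essence of the interval algorithm described in this chapter.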
Statistical inference procedures that obtain MLEs through conventional numerical methods may suffer from several major sources of error. To fully appreciate these sources of error we need some understanding of the number screen, that is, the finite set of numbers a computer can actually represent.
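A tiny illustration (not taken from the chapter) of how the number screen intrudes on ordinary double-precision arithmetic:

```python
# Decimal constants are rounded to the nearest binary floating-point number on the screen,
# so exact comparisons and naive accumulation can silently go wrong.
import sys

print(0.1 + 0.2 == 0.3)            # False: both sides are rounded binary approximations
print(f"{0.1 + 0.2:.20f}")         # 0.30000000000000004441
print(sum([0.1] * 10) == 1.0)      # False: rounding errors accumulate over the sum
print(sys.float_info.epsilon)      # spacing of the screen near 1.0, about 2.22e-16
```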
Mutagenetic trees are a class of graphical models designed for accumulative evolutionary processes. The state spaces of these models form finite distributive lattices. Using this combinatorial structure, we determine the algebraic invariants of mutagenetic trees. We further discuss the geometry of mixture models. In particular, models resulting from mixing a single tree with an error model are shown to be identifiable.
Accumulative evolutionary processes
Some evolutionary processes can be described as the accumulation of non-reversible genetic changes. For example, the process of tumor development in several cancer types starts from the complete set of chromosomes and is characterized by the subsequent accumulation of chromosomal gains and losses, or by losses of heterozygosity [Vogelstein et al., 1988, Zang, 2001]. Mutagenetic trees, sometimes also called oncogenetic trees, have been applied to model tumor development in patients with different types of cancer, such as renal cancer [Desper et al., 1999, von Heydebreck et al., 2004], melanoma [Radmacher et al., 2001] and ovarian adenocarcinoma [Simon et al., 2000]. For glioblastoma and prostate cancer, tumor progression along the mutagenetic tree has been shown to be an independent cytogenetic marker of patient survival [Rahnenführer et al., 2005].
Amino acid substitutions in proteins may also be modeled as permanent under certain conditions, such as a very strong selective pressure. For example, the evolution of human immunodeficiency virus (HIV) under antiviral drug therapy exhibits this behavior.
One of the most frequently used techniques for determining the similarity between biological sequences is optimal sequence alignment. In the standard instance of the sequence alignment problem, we are given two sequences (usually DNA or protein sequences) that have evolved from a common ancestor via a series of mutations, insertions and deletions. The goal is to find the best alignment between the two sequences. The definition of “best” here depends on the choice of scoring scheme, and there is often disagreement about the correct choice. In parametric sequence alignment, this problem is circumvented by instead computing the optimal alignment as a function of variable scores. In this chapter, we address one such scheme, in which all matches are equally rewarded, all mismatches are equally penalized and all spaces are equally penalized. An efficient parametric sequence alignment algorithm is described in Chapter 7. Here we examine the structure of the set of different alignments, and in particular the number of different alignments of two given sequences that can be optimal. For a detailed treatment of the subject of sequence alignment, we refer the reader to [Gusfield, 1997].
Alignments and optimality
We first review some notation from Section 2.2. In this chapter, all alignments will be global alignments between two sequences σ1 and σ2 of the same length, denoted by n.
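To make the three-parameter scoring scheme concrete, here is a minimal sketch of the ordinary global-alignment dynamic program (this is the standard Needleman–Wunsch recursion, not the parametric algorithm of Chapter 7; the sequences and parameter values are invented):

```python
# Global alignment under the scheme of this chapter: every match scores +match,
# every mismatch -mismatch, every space -space.
def align(s, t, match, mismatch, space):
    m, n = len(s), len(t)
    # score[i][j] = best score of aligning s[:i] with t[:j]
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = -space * i
    for j in range(1, n + 1):
        score[0][j] = -space * j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = score[i-1][j-1] + (match if s[i-1] == t[j-1] else -mismatch)
            score[i][j] = max(diag, score[i-1][j] - space, score[i][j-1] - space)
    # traceback: recover one optimal alignment
    a, b, i, j = [], [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i-1][j-1] + (match if s[i-1] == t[j-1] else -mismatch):
            a.append(s[i-1]); b.append(t[j-1]); i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i-1][j] - space:
            a.append(s[i-1]); b.append('-'); i -= 1
        else:
            a.append('-'); b.append(t[j-1]); j -= 1
    return score[m][n], ''.join(reversed(a)), ''.join(reversed(b))

s, t = "ACCT", "AGGT"   # two sequences of the same length, as in this chapter
# Making mismatches more expensive flips the optimum from the ungapped alignment
# to one that avoids mismatches entirely, at the cost of more spaces.
for params in [dict(match=1, mismatch=1, space=1), dict(match=1, mismatch=3, space=1)]:
    print(params, align(s, t, **params))
```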
The philosophy of algebraic statistics is that statistical models are algebraic varieties. We encountered many such models in Chapter 1. The purpose of this chapter is to give an elementary introduction to the relevant algebraic concepts, with examples drawn from statistics and computational biology.
Algebraic varieties are zero sets of systems of polynomial equations in several unknowns. These geometric objects appear in many contexts. For example, in genetics, the familiar Hardy–Weinberg curve is an algebraic variety (see Figure 3.1). In statistics, the distributions corresponding to independent random variables form algebraic varieties, called Segre varieties, that are well known to mathematicians. There are many questions one can ask about a system of polynomial equations; for example whether the solution set is empty, nonempty but finite, or infinite. Gröbner bases can be used to answer these questions.
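For the first of these examples the algebra can be made explicit (the genotype labels \(p_{AA}, p_{Aa}, p_{aa}\) below are our own notation, introduced only for illustration). The Hardy–Weinberg curve is the image of the map

\[
\theta \;\longmapsto\; (p_{AA},\,p_{Aa},\,p_{aa}) \;=\; \bigl(\theta^{2},\;2\theta(1-\theta),\;(1-\theta)^{2}\bigr),
\]

and every point of this image satisfies the two polynomial relations

\[
p_{Aa}^{2} - 4\,p_{AA}\,p_{aa} \;=\; 4\theta^{2}(1-\theta)^{2} - 4\theta^{2}(1-\theta)^{2} \;=\; 0,
\qquad
p_{AA} + p_{Aa} + p_{aa} \;=\; \bigl(\theta + (1-\theta)\bigr)^{2} \;=\; 1,
\]

so the Hardy–Weinberg curve is the zero set of these two polynomials within the probability triangle.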
Algebraic varieties can be described in two different ways, either by equations or parametrically. Each of these representations is useful. We encountered this duality in the Hammersley–Clifford Theorem which says that a graphical model can be described by conditional independence statements or by a polynomial parameterization. Clearly, methods for switching between these two representations are desirable. We discuss such methods in Section 3.2.
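As a small preview of such a method (the snippet assumes SymPy, which is not among the software tools discussed in this book), a lexicographic Gröbner basis eliminates the parameter from the Hardy–Weinberg parameterization above and recovers its implicit equations:

```python
# From the parametric to the implicit representation by Groebner-basis elimination (sketch).
from sympy import symbols, groebner

t, p0, p1, p2 = symbols('t p0 p1 p2')
# Parametric description: (p0, p1, p2) = (t^2, 2t(1-t), (1-t)^2)
gens = [p0 - t**2, p1 - 2*t*(1 - t), p2 - (1 - t)**2]
# With the parameter t ordered first, the lex Groebner basis contains polynomials
# in p0, p1, p2 alone; these cut out the implicit variety.
G = groebner(gens, t, p0, p1, p2, order='lex')
implicit = [g for g in G.exprs if t not in g.free_symbols]
print(implicit)   # expect relations equivalent to p1**2 - 4*p0*p2 and p0 + p1 + p2 - 1
```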
The study of systems of polynomial equations is the main focus of a central area in mathematics called algebraic geometry. This is a rich, beautiful, and well-developed subject, at whose heart lies a deep connection between algebra and geometry.
In this chapter we present a probabilistic approach to the homology mapping problem. This is the problem of identifying regions among genomic sequences that diverged from the same region in a common ancestor. We explore this question as a combinatorial optimization problem, seeking the best assignment of labels to the nodes in a Markov random field. The general problem is formulated using toric models, for which it is unfortunately intractable to find an exact solution. However, for a relevant subclass of models, we find a (non-integer) linear programming formulation that gives us the exact integer solution in polynomial time in the size of the problem. It is encouraging that for a useful subclass of toric models, maximum a posteriori inference is tractable.
Genome mapping
Evolutionary divergence gives rise to different present-day genomes that are related by shared ancestry. Evolutionary events occur at varying rates, but also at different scales of genomic regions. Local mutation events (for instance, the point mutations, insertions and deletions discussed in Section 4.5) occur at the level of one or several base pairs. Large-scale mutations can occur at the level of single or multiple genes, chromosomes, or even an entire genome. Some of these mutation mechanisms, such as rearrangement and duplication, were briefly introduced in Section 4.1. As a result, regions in two different genomes may be tied to a single region in the ancestral genome, linked by a series of mutational events.
As discussed in Chapter 1, the EM algorithm is an iterative procedure used to obtain maximum likelihood estimates (MLEs) for the parameters of statistical models which are induced by a hidden variable construct, such as the hidden Markov model (HMM). The tree structure underlying the HMM allows us to organize the required computations efficiently, which leads to an efficient implementation of the EM algorithm for HMMs known as the Baum–Welch algorithm. For several examples of two-state HMMs with binary output we plot the likelihood function and relate the paths taken by the EM algorithm to the gradient of the likelihood function.
The EM algorithm for hidden Markov models
The hidden Markov model is obtained from the fully observed Markov model by marginalization; see Sections 1.4.2 and 1.4.3. We will use the same notation as there, so σ = σ1σ2 … σn ∈ Σn is a sequence of states and τ = τ1τ2 … τn ∈ (Σ′)n a sequence of output variables. We assume that we observe N sequences τ1, τ2, …, τN ∈ (Σ′)n, each of length n, but that the corresponding state sequences σ1, σ2, …, σN ∈ Σn are not observed (hidden).
In Section 1.4.2 it is assumed that there is a uniform distribution on the first state in each sequence, i.e., Prob(σ1 = r) = 1/l for each r ∈ Σ where l = |Σ|.
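The following is a compact sketch of a standard Baum–Welch iteration for this setting (a two-state HMM with binary output and a fixed uniform initial distribution); it is not the book's code, and the observed sequences and starting parameters are invented:

```python
# Standard Baum-Welch (EM) sketch for a two-state HMM with binary output.
# A[i][j]: transition probabilities, B[i][k]: emission probabilities,
# initial distribution fixed to the uniform distribution as in the text.
import numpy as np

def forward_backward(obs, A, B):
    n, l = len(obs), A.shape[0]
    alpha = np.zeros((n, l)); beta = np.zeros((n, l))
    alpha[0] = (1.0 / l) * B[:, obs[0]]           # uniform initial distribution
    for t in range(1, n):
        alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]
    beta[n-1] = 1.0
    for t in range(n - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t+1]] * beta[t+1])
    return alpha, beta, alpha[n-1].sum()          # last value is P(obs)

def baum_welch_step(data, A, B):
    l, m = B.shape
    A_num = np.zeros((l, l)); B_num = np.zeros((l, m))
    for obs in data:
        alpha, beta, p = forward_backward(obs, A, B)
        gamma = alpha * beta / p                  # posterior state probabilities
        for t in range(len(obs) - 1):             # expected transition counts
            A_num += np.outer(alpha[t], B[:, obs[t+1]] * beta[t+1]) * A / p
        for t, o in enumerate(obs):               # expected emission counts
            B_num[:, o] += gamma[t]
    return (A_num / A_num.sum(axis=1, keepdims=True),
            B_num / B_num.sum(axis=1, keepdims=True))

# Toy run: N = 3 observed binary sequences of length n = 6 (made-up data).
data = [np.array([0, 1, 1, 0, 1, 0]), np.array([1, 1, 1, 0, 0, 0]), np.array([0, 0, 1, 1, 1, 0])]
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.6, 0.4], [0.2, 0.8]])
for _ in range(20):                               # each EM step cannot decrease the likelihood
    A, B = baum_welch_step(data, A, B)
print(A, B, sep="\n")
```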