Drosophila melanogaster resistance against the parasitoid wasp Leptopilina boulardi is controlled by a single gene (Rlb) with two alleles, the resistant allele being dominant. Using strains bearing deletions, we previously demonstrated that the 55E2–E6; 55F3 region on chromosome 2R is involved in resistance. In this paper, we first narrowed down the Rlb-containing region by mapping, at the molecular level, the breakpoints of the Df(2R)Pc66, Df(2R)P34 and Df(2R)Pc4 deficiencies, using both chromosomal in situ hybridization and Southern analyses. The resistance gene was localized to a 100 kb fragment predicted to contain about 10 different genes. Male recombination experiments then identified two possible candidates for the Rlb gene. The potential involvement of one of these genes, edl/mae, is discussed.
The Drosophila nasuta subgroup of the immigrans species group is widely distributed throughout South-East Asia and consists of morphologically similar species with varying degrees of reproductive isolation. Here, I report nucleotide variability data for five X-linked and two mtDNA loci in eight taxa from the nasuta subgroup, with deeper sampling of D. albomicans and its sister species D. nasuta. Phylogenetic relationships among these species vary across genomic regions, and levels of genetic differentiation suggest that this species group diversified only about one million years ago. D. albomicans and D. nasuta share nucleotide polymorphisms and are distinguished by relatively few fixed differences. Patterns of genetic differentiation between this species pair are compatible with a simple isolation model with no gene flow. Nucleotide variability levels of species in the nasuta group are comparable to those of members of the melanogaster and pseudoobscura species groups, indicating effective population sizes on the order of several million. Population genetic analyses reveal that summaries of the frequency distribution of neutral polymorphisms in both D. albomicans and D. nasuta generally fit the expectations of the standard neutral model. D. albomicans is of particular interest for evolutionary studies because of its recently formed neo-sex chromosomes, and these phylogenetic and population genetic analyses suggest that it might be an ideal model for studying the very early stages of Y chromosome evolution.
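The "summaries of the frequency distribution of neutral polymorphisms" mentioned above are statistics such as nucleotide diversity (π), Watterson's θ and Tajima's D. The abstract does not say which summaries were used, so the choice of these three is an assumption. The minimal sketch below computes them for a set of aligned sequences. Under the standard neutral model θ estimates 4Neμ for autosomal loci (3Neμ for X-linked loci with an equal sex ratio), so an effective population size follows once a mutation rate μ is assumed. Values here are per locus; dividing by the number of sites gives per-site values. The function name `diversity_stats` is hypothetical.

```python
import math
from itertools import combinations


def diversity_stats(seqs):
    """Return (pi, theta_w, tajimas_d) for aligned sequences of equal length.

    pi      : mean number of pairwise differences (per locus, not per site)
    theta_w : Watterson's estimator S / a1, with S the number of segregating sites
    D       : Tajima's D, or None when S == 0 or fewer than four sequences are given
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)
    # A site is segregating if its column contains more than one distinct character.
    S = sum(len(set(col)) > 1 for col in zip(*seqs))
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = S / a1
    if S == 0 or n < 4:
        return pi, theta_w, None
    # Tajima's (1989) normalizing constants.
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (pi - theta_w) / math.sqrt(e1 * S + e2 * S * (S - 1))
    return pi, theta_w, d
```

A D close to zero is what the standard neutral model predicts. Strongly negative or positive values point to departures such as population growth or balancing selection.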
Prior to cranial neural tube closure, the neural folds adopt a biconvex morphology, which is thought to result from expansion of the underlying mesenchyme. Dorso-lateral hinge points (DLHPs) then form, allowing the dorsal tips of the neural folds to ‘flip around’; this brings the tips into apposition and facilitates subsequent fusion. Cranial closure is particularly prone to perturbation: exencephaly results in many mouse mutants and from a variety of teratogenic influences. This susceptibility may reflect mechanical tensions acting on the closing cranial neural folds. For example, the ventral flexures of the body axis at the mid- and forebrain levels mechanically oppose the formation of DLHPs. Several processes have been implicated in overcoming these mechanical tensions and thereby assisting cranial neural tube closure. These include contraction of actin microfilaments at the luminal surface of the neuroepithelium and apoptosis in the dorsal and dorsolateral neuroepithelium. Apoptosis may increase flexibility in the dorsal neural folds, enhancing DLHP formation. Neural crest cells (NCC) originate in the dorsal tips of the neuroepithelium and undergo an epithelial-to-mesenchymal transition that allows them to delaminate, exit the neuroepithelium and migrate extensively throughout the embryo to form numerous derivatives. We hypothesized that delamination of NCC from the neuroepithelium may enhance the mechanical flexibility of the dorsal tips of the neural folds, allowing the ‘flip around’ event to occur.
We have increased the density of genetic markers on Arabidopsis lyrata chromosomes AL6 and AL7, which correspond to A. thaliana chromosome IV. Our aims were to determine the chromosome rearrangements between the two species and to compare recombination fractions across the same intervals. We confirm the two previously inferred rearrangements (a reciprocal translocation and a large inversion, which we infer to be pericentric). By including markers around the centromere regions of A. thaliana chromosomes IV and V, we localize the AL6 centromere and localize the breakpoints of these rearrangements more precisely than before. One translocation breakpoint was close to the centromere, and the other coincided with one end of the inversion, suggesting that a single event caused both rearrangements. At the resolution of our mapping, apart from these rearrangements, all other markers are in the same order in A. lyrata and A. thaliana, so we could compare recombination rates in the two species. Values were slightly higher in A. thaliana; a minimum estimate for A. lyrata regions not close to a centromere is 4–5 centimorgans per megabase. The mapped region of AL7 includes the self-incompatibility loci (S-loci), a region predicted to have lower recombination than the rest of the genome. We mapped 17 markers in a 1·23 Mb region surrounding these loci and compared the approximately 600 kb closest to the S-loci with the surrounding region of approximately the same size. There were significantly fewer recombination events in the closer region than in the more distant one, supporting this prediction but showing that the low-recombination region is very limited in size.
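A recombination rate in centimorgans per megabase comes from converting an observed recombination fraction r into a map distance with a mapping function and dividing by the physical length of the interval. The abstract does not say which mapping function was used. The minimal sketch below therefore offers both Haldane's function (no interference) and Kosambi's function as labelled assumptions. The function names are hypothetical.

```python
import math


def map_distance_cm(r, function="haldane"):
    """Convert a recombination fraction r (0 <= r < 0.5) into a map distance in cM."""
    if not 0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    if function == "haldane":
        # Haldane: d = -1/2 ln(1 - 2r), in Morgans; assumes no crossover interference.
        morgans = -0.5 * math.log(1 - 2 * r)
    elif function == "kosambi":
        # Kosambi: d = 1/4 ln((1 + 2r) / (1 - 2r)), in Morgans; allows some interference.
        morgans = 0.25 * math.log((1 + 2 * r) / (1 - 2 * r))
    else:
        raise ValueError("unknown mapping function: " + function)
    return 100 * morgans


def cm_per_mb(r, physical_mb, function="haldane"):
    """Recombination rate for an interval of `physical_mb` megabases."""
    return map_distance_cm(r, function) / physical_mb
```

For example, r = 0.05 across a 1 Mb interval corresponds to about 5.27 cM/Mb under Haldane's function and about 5.02 cM/Mb under Kosambi's. For small r the two functions agree closely, which matters little at the scale of the 4–5 cM/Mb estimate quoted above.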
Unbiased or upper-limit estimates of the rate (U) of genomic mutation to mildly deleterious alleles are crucial in genetic and conservation studies and in human health care. However, the few available estimates are lower bounds of U. We present a fairly robust estimation approach that yields an upper limit of U and a nearly unbiased estimate of the per-generation fitness decline due to new deleterious mutations. Applying the approach to three species of the freshwater microcrustacean Daphnia, we found that the upper limit of U for egg survivorship is 0·73 (SD = 0·30) in 14 D. pulicaria populations. For the first four clutches, the per-generation decline in fecundity due to deleterious mutations ranged from 2·2% to 7·8% in 20 D. pulex populations and from 1·1% to 5·1% in 8 D. obtusa populations. These results indicate that mutation pressure is high in natural Daphnia populations. The approach investigated here offers a potentially quick and convenient way to characterize U and the per-generation effects of deleterious genomic mutations on fitness or its important components, such as fecundity.
The present work provides the first broad-scale screening of allozymes in the land snail Helix aspersa. By using all available information on the distribution of genetic variation among 102 previously investigated populations, we aim to strengthen our knowledge of the spread of the invasive aspersa subspecies in the Western Mediterranean. We propose a new approach based on a centre-based clustering procedure that groups populations according to geographical proximity and genetic similarity. Assuming a stepping-stone model of diffusion, we apply a partitioning algorithm that clusters only geographically contiguous populations. The algorithm used is k-means, which is among the leading methods developed for analysing large microarray datasets; its goal is to minimize the within-group variance. The spatial constraint is provided by a list of connections between localities deduced from a Delaunay network. After testing each optimal group for spatial structure in the genetic data, we compared the inferred genetic structure with partitions obtained from other published methods for defining homogeneous groups (the Monmonier and SAMOVA algorithms). Competing biogeographical scenarios inferred from the k-means procedure were then compared and discussed to shed more light on the colonization routes taken by the species.
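The idea of a k-means that "clusters only populations that are geographically contiguous" can be sketched as a Lloyd-style iteration with one extra rule. A population may move only to a group it borders in the spatial graph, and only if its former group stays non-empty and connected. The sketch below is hypothetical and is not the authors' implementation. It assumes the following:

- the adjacency lists are already given (for example, derived from a Delaunay triangulation of the localities);
- each population is summarized by a numeric vector of allele frequencies;
- the initial partition consists of k connected, non-empty groups.

All function names are illustrative.

```python
from collections import deque


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points, members):
    dim = len(points[members[0]])
    return [sum(points[i][d] for i in members) / len(members) for d in range(dim)]


def is_connected(members, adjacency):
    """Breadth-first search over the spatial graph, restricted to `members`."""
    members = set(members)
    start = next(iter(members))
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nb in adjacency[node]:
            if nb in members and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen == members


def within_group_ss(points, labels, k):
    """The k-means objective: total squared distance of populations to their group centroid."""
    total = 0.0
    for g in range(k):
        members = [i for i, lab in enumerate(labels) if lab == g]
        c = centroid(points, members)
        total += sum(sq_dist(points[i], c) for i in members)
    return total


def contiguous_kmeans(points, adjacency, labels, k, max_iter=100):
    """Lloyd-style k-means in which a population may only join a group it borders."""
    labels = list(labels)
    for _ in range(max_iter):
        centroids = [centroid(points, [i for i, lab in enumerate(labels) if lab == g])
                     for g in range(k)]
        moved = False
        for i in range(len(points)):
            src = labels[i]
            neighbour_groups = {labels[nb] for nb in adjacency[i]} - {src}
            if not neighbour_groups:
                continue  # interior population: every neighbour is already in its group
            dst = min(neighbour_groups, key=lambda g: sq_dist(points[i], centroids[g]))
            if sq_dist(points[i], centroids[dst]) < sq_dist(points[i], centroids[src]):
                rest = [j for j, lab in enumerate(labels) if lab == src and j != i]
                # Move only if the source group stays non-empty and spatially connected.
                if rest and is_connected(rest, adjacency):
                    labels[i] = dst
                    moved = True
        if not moved:
            break
    return labels
```

Each accepted move lowers the distance to a fixed centroid, and recomputing centroids cannot raise the objective, so the within-group sum of squares never increases and the loop terminates. The result is a local optimum that depends on the starting partition, as with ordinary k-means. The constraint does what the text describes: two genetically similar populations that are not connected through their group's own members are never merged.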
Strong sexual isolation exists between the closely related species Drosophila ananassae and D. pallidosa, but there is no obvious post-mating isolation: both sexes of the hybrids and their descendants appear to be fully viable and fertile. Strains exhibiting parthenogenesis have been derived from wild populations of both species. We intercrossed such strains and established isofemale lines after the second generation of parthenogenesis. These lines are clones carrying homozygous chromosomes that are interspecific recombinants. We established 266 such isogenic lines and determined their genetic constitution using chromosomal and molecular markers. Strong pseudo-linkage was observed between loci on the left arm of chromosome 2 and on the right arm of chromosome 3: the frequency of inheriting the two chromosome regions from the same species was significantly higher than expected. One possible cause of pseudo-linkage is female meiotic bias, whereby chromosomes of the same species origin tend to be transmitted to the same gamete. This possibility was ruled out, however, because backcross analysis showed that the two chromosome regions segregate independently in female hybrids. The remaining explanation is the elimination of low-fitness flies carrying the two chromosome regions from different species. Thus, genetic incompatibility was detected in a species pair for which no hybrid breakdown had previously been reported. The ‘interspecific mosaic genome’ lines reported here will be useful for future research to identify genes involved in speciation and phenotypic evolution.
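Pseudo-linkage, as defined above, compares the observed frequency of lines inheriting both regions from the same species with the frequency expected if the two regions were inherited independently. The expected frequency is computed from the marginal species frequencies of each region. The abstract does not name the test statistic. The minimal sketch below therefore assumes a Pearson chi-square test of independence on the 2 × 2 table of origins, with no continuity correction. The function name is hypothetical.

```python
import math


def pseudo_linkage_test(counts):
    """2 x 2 test of independence for the species origin of two chromosome regions.

    counts[a][b] = number of lines whose first region came from species a and whose
    second region came from species b (index 0 = D. ananassae, 1 = D. pallidosa).
    Every expected cell count must be positive.
    Returns (observed same-origin fraction, expected same-origin fraction, chi2, p).
    """
    n = sum(sum(row) for row in counts)
    rows = [sum(counts[a]) for a in range(2)]
    cols = [counts[0][b] + counts[1][b] for b in range(2)]
    observed_same = (counts[0][0] + counts[1][1]) / n
    # Under independence, P(same origin) = p1 * p2 + (1 - p1) * (1 - p2).
    expected_same = (rows[0] * cols[0] + rows[1] * cols[1]) / n ** 2
    chi2 = 0.0
    for a in range(2):
        for b in range(2):
            expected = rows[a] * cols[b] / n
            chi2 += (counts[a][b] - expected) ** 2 / expected
    # Chi-square with 1 degree of freedom: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return observed_same, expected_same, chi2, p_value
```

An observed same-origin fraction well above the expected one, together with a small p-value, is the pattern the text calls pseudo-linkage. The statistic alone cannot tell meiotic bias from differential viability; that is why the backcross analysis was needed.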
An interval quantitative trait locus (QTL) mapping method was developed for complex polygenic diseases (treated as binary traits) that show QTL by environment interactions (QEI), for outbred populations on a within-family basis. Within this context, the main objectives were to investigate the selection of genetic models and to compare liability or generalized interval mapping (GIM) with linear regression interval mapping (RIM). Two genetic models were used: one with main QTL and QEI effects (QEI model) and the other with only a main QTL effect (QTL model). Over 30 types of binary disease data, as well as six types of continuous data, were simulated and analysed by RIM and GIM. When table values were used for significance testing, RIM showed an increased false detection rate (FDR) for testing interactions, attributable to scale effects on the binary scale; GIM did not suffer from a high FDR for testing interactions. Empirical thresholds, which effectively impose higher thresholds on RIM for testing interactions, could correct this increased FDR, but they would have to be derived for each case because the FDR depends on the incidence on the binary scale. For QEI models, RIM still suffered from larger biases (15–100% over- or under-estimation of true values) and higher standard errors in QTL variance and location estimates than GIM. Hence GIM is recommended for disease QTL mapping with QEI. In the presence of QEI, the model including QEI has more power (a 20–80% increase) to detect the QTL when the average QTL effect is small (that is, when the model with only a main QTL effect is not very powerful). A top-down model selection is proposed in which a full test for QEI is conducted first and the model is then simplified. The methods and results will be applicable to human, plant and animal QTL mapping experiments.
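The "scale effects on the binary scale" that inflate RIM's false detection rate for interactions can be seen without any QTL mapping machinery. Under a liability-threshold model, a QTL and an environment that act purely additively on liability still produce a non-zero interaction contrast in incidence. That contrast is exactly what a linear regression on 0/1 data would pick up. The minimal sketch below assumes a probit liability model with a standard normal residual and 0/1 coding of genotype and environment. It illustrates the scale effect only; it does not implement GIM or RIM, and all function names and parameter values are hypothetical.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def incidence(mean_liability, threshold):
    """P(affected) when liability ~ N(mean_liability, 1) and disease occurs above `threshold`."""
    return 1.0 - normal_cdf(threshold - mean_liability)


def binary_scale_interaction(qtl_effect, env_effect, threshold):
    """Genotype-by-environment interaction contrast on the incidence (0/1) scale.

    Liability is purely additive (qtl_effect * genotype + env_effect * environment),
    so there is no interaction on the liability scale by construction.
    """
    p = {(g, e): incidence(qtl_effect * g + env_effect * e, threshold)
         for g in (0, 1) for e in (0, 1)}
    # (genotype effect in environment 1) - (genotype effect in environment 0)
    return (p[1, 1] - p[0, 1]) - (p[1, 0] - p[0, 0])
```

With `qtl_effect=0.5`, `env_effect=1.0` and `threshold=2.0` (a low-incidence disease), the contrast is about 0.11 even though no interaction exists on the liability scale. Changing the threshold changes its size and even its sign. This matches the observation that empirical thresholds for RIM would depend on incidence.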
In Chapter 3 of BSA we learned that a DP algorithm for pairwise sequence alignment has a probabilistic interpretation. Indeed, equivalent equations appear in the logarithmic form of the Viterbi algorithm for the hidden Markov model of a gapped sequence alignment. The hidden states of such a model, called a pair HMM, correspond to the alignment match, x-gap and y-gap positions. The pair HMM state diagram is topologically similar to the diagram of the finite state machine (Durbin et al. (1998), Fig. 4.1), but the pair HMM parameters have clear probabilistic meanings. The optimal finite state machine alignment found by standard DP is equivalent to the most probable path through the pair HMM determined by the Viterbi algorithm. Both global and local optimal DP alignment algorithms have Viterbi counterparts for suitably defined HMMs. Interestingly, the HMM has an advantage over the finite state machine: it can compute the full probability that sequences X and Y were generated by a given pair HMM, so a probabilistic measure can be introduced to help establish evolutionary relationships. This full probabilistic model also defines (i) the posterior distribution over all possible alignments given sequences X and Y and (ii) the posterior probability that a particular symbol x of sequence X is aligned to a given symbol y of sequence Y. However, real biological sequences cannot be considered exact realizations of probabilistic models. This explains the difficulties encountered by HMM-based alignment methods in similarity search (Durbin et al. (1998), Sect. 4.5), while simpler finite state machine methods perform sufficiently well.
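The contrast between the most probable path and the full probability comes down to a single change in the recursion. The Viterbi algorithm takes a maximum over predecessor states, whereas the forward algorithm sums over them. The minimal sketch below follows the global three-state pair HMM (states M, X, Y) with gap-open probability δ, gap-extension probability ε and end probability τ. In this model the begin state behaves like M, and X and Y cannot follow one another directly. The function name, the dictionary layout of the emission parameters `p` and `q`, and the use of plain probabilities are assumptions. Plain probabilities are fine for short sequences; real sequences need log space or scaling to avoid underflow.

```python
def pair_hmm_score(x, y, p, q, delta, epsilon, tau, combine):
    """Dynamic programming over the global pair HMM with states M, X, Y.

    combine=max gives the Viterbi probability of the single best alignment;
    combine=sum gives the forward (full) probability P(x, y) over all alignments.
    p[a][b]: emission probability of the aligned pair (a, b) in state M
    q[a]:    emission probability of residue a in the gap states X and Y
    """
    n, m = len(x), len(y)
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    X = [[0.0] * (m + 1) for _ in range(n + 1)]
    Y = [[0.0] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 1.0  # the begin state behaves like M
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                M[i][j] = p[x[i - 1]][y[j - 1]] * combine([
                    (1 - 2 * delta - tau) * M[i - 1][j - 1],
                    (1 - epsilon - tau) * X[i - 1][j - 1],
                    (1 - epsilon - tau) * Y[i - 1][j - 1],
                ])
            if i > 0:  # x_i aligned to a gap
                X[i][j] = q[x[i - 1]] * combine([delta * M[i - 1][j], epsilon * X[i - 1][j]])
            if j > 0:  # y_j aligned to a gap
                Y[i][j] = q[y[j - 1]] * combine([delta * M[i][j - 1], epsilon * Y[i][j - 1]])
    return tau * combine([M[n][m], X[n][m], Y[n][m]])
```

Calling the function with `combine=max` and with `combine=sum` on the same inputs returns the Viterbi and forward probabilities, respectively. The forward value is never smaller, and their ratio is the posterior probability of the best alignment.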
The reader will quickly discover that the organization of this book parallels that of Biological Sequence Analysis by Durbin et al. (1998). The first chapter of BSA contains an introduction to the fundamental notions of biological sequence analysis: sequence similarity, homology, sequence alignment, and the basic concepts of probabilistic modeling.
Finding these distinct concepts described back-to-back may seem surprising at first glance. However, let us recall several important bioinformatics questions. How can we construct a pairwise sequence alignment? How can we build an alignment of multiple sequences? How can we create a phylogenetic tree for several biological sequences? How can we predict an RNA secondary structure? None of these questions can be consistently addressed without the use of probabilistic methods. The mathematical complexity of these methods ranges from basic theorems and formulas to sophisticated architectures of hidden Markov models and stochastic grammars able to capture fine compositional characteristics of empirical biological sequences.
The explosive growth of biological sequence data created an excellent opportunity for the meaningful application of discrete probabilistic models. Perhaps without much exaggeration, the implications of this development could be compared with those of the revolutionary use of calculus and differential equations for solving problems of classical mechanics in the eighteenth century.
The problems in this introductory chapter concern fundamental concepts that play an important role in biological sequence analysis: maximum likelihood and maximum a posteriori (Bayesian) estimation of model parameters. These concepts are crucial for understanding statistical inference from experimental data and cannot be introduced without the notions of conditional, joint and marginal probabilities.
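For the most common case in sequence analysis, estimating residue frequencies from counts, the two estimators have closed forms. The maximum likelihood estimate of multinomial parameters is the vector of observed frequencies. With a Dirichlet prior with parameters α, the posterior is again Dirichlet with parameters n + α. The MAP estimate is then the mode of that posterior, (n_i + α_i − 1) / (N + A − K), where N = Σ n_i, A = Σ α_i and K is the number of categories. The posterior mean, (n_i + α_i) / (N + A), is the familiar "pseudocount" estimate. The minimal sketch below uses illustrative function names.

```python
def ml_estimate(counts):
    """Maximum likelihood estimate of multinomial parameters: the observed frequencies."""
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def map_estimate(counts, alpha):
    """MAP estimate under a Dirichlet(alpha) prior: the mode of Dirichlet(counts + alpha).

    Valid when counts[k] + alpha[k] >= 1 for every category k.
    """
    total = sum(counts[k] + alpha[k] - 1 for k in counts)
    return {k: (counts[k] + alpha[k] - 1) / total for k in counts}


def posterior_mean(counts, alpha):
    """Posterior mean under a Dirichlet(alpha) prior, i.e. alpha acting as pseudocounts."""
    total = sum(counts[k] + alpha[k] for k in counts)
    return {k: (counts[k] + alpha[k]) / total for k in counts}
```

With counts A = 3, C = 1, G = 0, T = 0 the ML estimate assigns probability zero to G and T. A Dirichlet prior with every α = 2 gives MAP estimates 0.5, 0.25, 0.125 and 0.125, which shows how the prior protects small samples from zero probabilities.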
Bioinformatics, an integral part of post-genomic biology, creates principles and ideas for computational analysis of biological sequences. These ideas facilitate the conversion of the flood of sequence data unleashed by the recent information explosion in biology into a continuous stream of discoveries. Not surprisingly, the new biology of the twenty-first century has attracted the interest of many talented university graduates with various backgrounds. Teaching bioinformatics to such a diverse audience presents a well-known challenge. The approach requiring students to advance their knowledge of computer programming and statistics prior to taking a comprehensive core course in bioinformatics has been accepted by many universities, including the Georgia Institute of Technology, Atlanta, USA.
In 1998, at the start of our graduate program, we selected the then recently published book Biological Sequence Analysis (BSA) by Richard Durbin, Anders Krogh, Sean R. Eddy, and Graeme Mitchison as a text for the core course in bioinformatics. Through the years, BSA, which describes the ideas of the major bioinformatic algorithms in a remarkably concise and consistent manner, has been widely adopted as a required text for bioinformatics courses at leading universities around the globe.
Many problems included in BSA as exercises for its readers have been repeatedly used for homework assignments and tests. However, detailed solutions to these problems have not been available, and the absence of such a resource was noticed by students and teachers alike. The goal of this book, Problems and Solutions in Biological Sequence Analysis, is to close this gap, extend the set of workable problems, and help its readers develop problem-solving skills that are vitally important for conducting successful research in the growing field of bioinformatics.
The theory described in Chapter 5 of BSA suggests that constructing the multiple alignment of several biological sequences should be part of the profile HMM training algorithm. Such an iterative expectation maximization method is intended to estimate the parameters of the profile HMM from unaligned sequences by constructing the multiple alignment in parallel with HMM parameter estimation. The resulting alignment can be obtained at the last step of the algorithm by optimally aligning each individual sequence to the newly built profile HMM. Nevertheless, because this impressive theoretical design encounters many practical difficulties, discussed in great detail in BSA, it has not yet been implemented in its pure form as an efficient tool for multiple sequence alignment.
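The final step described above, optimally aligning each sequence to the built profile HMM, is a Viterbi recursion over match (M), insert (I) and delete (D) states in log-odds form. The minimal sketch below makes several simplifying assumptions:

- transition probabilities are shared by all positions;
- insert states emit with the background distribution, so they contribute zero log-odds;
- I→D and D→I transitions are omitted;
- transitions into the end state reuse the corresponding *→M values.

It returns only the best score; a traceback over the same matrices would recover the alignment itself. The function name and parameter layout are hypothetical.

```python
import math

NEG_INF = float("-inf")


def profile_viterbi(seq, match_emit, background, trans):
    """Best log-odds score of `seq` against a profile HMM with L = len(match_emit) match states.

    match_emit[k][a]: emission probability of residue a in match state M_{k+1} (must be > 0)
    background[a]:    null-model probability q_a
    trans:            log transition probabilities keyed 'MM', 'MI', 'MD', 'IM', 'II', 'DM', 'DD'
    """
    n, L = len(seq), len(match_emit)
    VM = [[NEG_INF] * (L + 1) for _ in range(n + 1)]
    VI = [[NEG_INF] * (L + 1) for _ in range(n + 1)]
    VD = [[NEG_INF] * (L + 1) for _ in range(n + 1)]
    VM[0][0] = 0.0  # begin state treated as M_0
    for i in range(n + 1):
        for j in range(L + 1):
            if i > 0 and j > 0:
                a = seq[i - 1]
                VM[i][j] = math.log(match_emit[j - 1][a] / background[a]) + max(
                    VM[i - 1][j - 1] + trans["MM"],
                    VI[i - 1][j - 1] + trans["IM"],
                    VD[i - 1][j - 1] + trans["DM"],
                )
            if i > 0:  # residue x_i emitted by insert state I_j (log-odds 0)
                VI[i][j] = max(VM[i - 1][j] + trans["MI"], VI[i - 1][j] + trans["II"])
            if j > 0:  # silent delete state D_j skips match position j
                VD[i][j] = max(VM[i][j - 1] + trans["MD"], VD[i][j - 1] + trans["DD"])
    return max(VM[n][L] + trans["MM"], VI[n][L] + trans["IM"], VD[n][L] + trans["DM"])
```

In the EM scheme described above, parameter estimation would alternate with recursions like this one. The practical difficulties BSA discusses, such as local optima, are exactly why the scheme is hard to use directly for multiple alignment.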
One of the major difficulties on the road to a universal and efficient multiple sequence alignment algorithm is the lack of a gold standard that would distinguish a good alignment from a better one. Since both sequences and structures evolve, and ancestral sequences and structures can be reconstructed only by theoretical means, neither alignments nor phylogenies can be verified experimentally. Nevertheless, a formal assignment of an alignment score immediately leads to the notion of the best alignment for a given set of sequences; however, the implications of an optimal alignment defined in this way have to be treated with caution. There are several biologically motivated options for score assignment. For instance, the sum-of-pairs score is computationally convenient and frequently used, but it has well-known theoretical drawbacks (Durbin et al. (1998), p. 141).
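The sum-of-pairs (SP) score of a multiple alignment adds a pairwise score over every pair of sequences in every column. The minimal sketch below assumes a linear gap penalty and scores a gap aligned to a gap as zero, a common convention. The match, mismatch and gap values are illustrative defaults, not values from BSA.

```python
from itertools import combinations


def sp_score(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of equal-length aligned rows ('-' marks a gap)."""
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # gap-gap pairs contribute nothing
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total
```

The sketch also makes one often-cited drawback easy to check. Compare a column of N identical residues with a column where one residue differs. The score difference is (N − 1)(match − mismatch), so it grows with N. Intuitively, a single outlier should matter less, not more, as the number of sequences increases.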
Stochastic transformational grammars, particularly stochastic context-free grammars (SCFGs), have turned out to be effective modeling tools for RNA sequence analysis. Two biologically interesting problems are the prediction of RNA secondary structure and the construction of multiple alignments of RNA families. Non-stochastic algorithms for RNA secondary structure prediction were developed more than twenty years ago (by Nussinov et al. (1978) and by Zuker and Stiegler (1981)). Notably, the Nussinov algorithm can be directly rewritten in SCFG terms as a version of the Cocke–Younger–Kasami (CYK) algorithm. The SCFG interpretation provides insight into the probabilistic meaning of the parameters of the original Nussinov algorithm and also suggests statistical procedures for parameter estimation. A similar translation into SCFG terms is possible for the Zuker algorithm.
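The Nussinov algorithm mentioned above maximizes the number of base pairs with a recursion over subsequences i..j. The subsequence can drop its first or last base, pair its ends, or split into two independent pieces. Replacing the +1 per pair with log probabilities of grammar rules turns this recursion into CYK for an SCFG. The minimal sketch below counts Watson–Crick and GU pairs and imposes no minimum hairpin-loop length; both choices are simplifying assumptions. It returns only the maximum number of pairs, without the traceback.

```python
def nussinov(seq, pairs=frozenset({"AU", "UA", "GC", "CG", "GU", "UG"})):
    """Maximum number of nested base pairs in `seq` (Nussinov et al. 1978 recursion)."""
    n = len(seq)
    if n == 0:
        return 0
    # g[i][j] = maximum number of pairs in seq[i..j]; single bases score 0.
    g = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            best = max(g[i + 1][j], g[i][j - 1])   # i unpaired, or j unpaired
            inner = g[i + 1][j - 1] if i + 1 <= j - 1 else 0
            if seq[i] + seq[j] in pairs:
                best = max(best, inner + 1)        # i pairs with j
            for k in range(i + 1, j):              # bifurcation into i..k and k+1..j
                best = max(best, g[i][k] + g[k + 1][j])
            g[i][j] = best
    return g[0][n - 1]
```

For GGGAAAUCC the maximum is three pairs (two G–C pairs and one G–U pair enclosing the AAA loop). Every allowed pair must use one of the sequence's three C or U bases, so no structure can have more.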
Interestingly, the equivalence between the non-probabilistic dynamic programming sequence alignment algorithm and the Viterbi algorithm for a pair HMM is analogous to the equivalence between the non-probabilistic algorithm of RNA structure prediction and the CYK algorithm for an SCFG. There is also an analogy between the use of profile HMMs for aligning multiple DNA or protein sequences and the use of SCFG-based RNA structure profiles, called covariance models (CMs), for constructing structurally sound alignments of multiple RNAs. Furthermore, the parameters of covariance models can be derived by the inside–outside expectation maximization algorithm (compare this with the simultaneous estimation of profile HMM parameters and construction of a multiple sequence alignment).