This article derives amazingly accurate approximations to the state probabilities and waiting-time probabilities in the M/D/1 queue using a two-phase process with negative probabilities to approximate the deterministic service time. The approximations are in the form of explicit expressions involving geometric and exponential terms. The approximations extend to the finite-capacity M/D/1/N + 1 queue.
Motivated by hydrological applications, the exact distributions of R = X + Y, P = XY, and W = X/(X + Y) and the corresponding moment properties are derived when X and Y follow Block and Basu's bivariate exponential distribution. An application of the results is provided to drought data from Nebraska.
Let S denote the collection of all finite subsets of . We define an operation on S that makes S into a positive semigroup with set inclusion as the associated partial order. Positive semigroups are the natural home for probability distributions with exponential properties, such as the memoryless and constant rate properties. We show that there are no exponential distributions on S, but that S can be partitioned into subsemigroups, each of which supports a one-parameter family of exponential distributions. We then find the distribution on S that is closest to exponential, in a certain sense. This work might have applications to the problem of selecting a finite sample from a countably infinite population in the most random way.
In May of 2003 it was announced that Lee Rowen of the Institute for Systems Biology in Seattle, Washington was the winner of GeneSweep, an informal betting pool on the number of genes contained in the human genome. Rowen's guess of 25,947 won her half of the $1200 pool and a signed copy of James Watson's book, The Double Helix. GeneSweep had been created in 2000 by Ewan Birney of the European Bioinformatics Institute just as large pieces of the genome were being completed; because of the increasing amount of sequence becoming available, the cost of bets rose from $1 in 2000, to $5 in 2001, to $20 in 2002. One of the most surprising things about Rowen's winning guess was that it was almost certainly 3,000 genes off the mark – above the true number of genes! Researchers had placed wagers on figures as high as 300,000 genes, with only three sub-30,000 guesses. This number of genes put humans below the two plants that have been sequenced and barely above the worm, C. elegans.
Genes and proteins
Gene finding and sequences
Statistical hypothesis testing
Though the draft sequence of the human genome was published in 2001, nailing down exactly how many genes it contained turned out to be a tricky proposition.
On February 28, 2003 the Vietnam French Hospital of Hanoi, a private hospital with only 60 beds, called the World Health Organization (WHO) with a report of patients who had unusual influenza-like symptoms. Hospital officials had seen an avian influenza virus pass through the region a few years earlier and suspected a similar virus. The pathogen seemed highly contagious and highly virulent, so they asked that someone from the WHO be sent to investigate. Dr. Carlo Urbani, an Italian specialist in infectious diseases, responded.
Phylogenetic trees
The neighbor-joining algorithm
The Newick format for representing trees
Dr. Urbani quickly determined that the Vietnamese hospital was facing a new and unusual pathogen. The infections he observed were characterized by a fever, dry cough, shortness of breath, and progressively worsening respiratory problems. Death from respiratory failure occurred in a significant fraction of the infected patients. For the next several days, Dr. Urbani worked at the hospital documenting findings, collecting samples, and organizing patient quarantine. He was the first person to identify and describe the new disease, called Severe Acute Respiratory Syndrome, or SARS. In a matter of weeks, Dr. Urbani and five other healthcare professionals from the hospital would be dead.
By March 15 the WHO had already issued a global alert, calling SARS a “worldwide health threat.” They warned that possible cases had been identified in Canada, Indonesia, Philippines, Singapore, Thailand, and Vietnam.
The physicist Richard Feynman is credited with jump-starting the field of nanotechnology. In a talk at Caltech in December 1959, Feynman issued a famous challenge: he would pay $1000 to anyone who could write the entire Encyclopedia Britannica on the head of a pin. Feynman calculated that the head of a pin was approximately 1/16 of an inch across (about 1.6 × 10⁻³ meters), and that in order to fit all 42 million letters of the Encyclopedia one would have to make each letter 1.0 × 10⁻⁸ meters across. It took (only) 26 years before the prize was finally claimed by a graduate student at Stanford University.
How cells work
What is a genome
The computational future of biology
A roadmap to this book
Now, consider the problem of having to write out the entire set of instructions needed to build and operate a human, and consider having to do so in each of the trillions of cells in the body. The entire human genome is 3.5 billion “letters” long, and each cell is only 2 microns (2 × 10⁻⁶ meters) across. (Actually, two complete copies of the genome are present in each cell, so we have to fit a bit more than 7 billion letters.) However, all the organisms on earth overcome these packaging problems to live and prosper in a wide range of environments.
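To get a feel for the scale of this packaging problem, the figures in the paragraph above can be turned into a rough estimate. The sketch below is only a back-of-the-envelope calculation: the genome length, copy number, and cell size are taken from the text, while the spacing of about 0.34 nanometers per letter is an outside assumption (the approximate rise per base pair of ordinary double-helix DNA), and the function names are made up for illustration.

```python
# Back-of-the-envelope estimate of the genome packaging problem.
# Values marked "from the text" come from the paragraph above; the
# spacing per letter is an outside ASSUMPTION, not a figure from the text.

GENOME_LETTERS = 3.5e9       # letters in one copy of the genome (from the text)
COPIES_PER_CELL = 2          # two complete copies per cell (from the text)
CELL_DIAMETER_M = 2e-6       # 2 microns expressed in meters (from the text)
RISE_PER_LETTER_M = 0.34e-9  # ASSUMPTION: roughly 0.34 nm per base pair


def letters_per_cell(genome_letters: float, copies: int) -> float:
    """Total number of letters that must fit in one cell."""
    return genome_letters * copies


def stretched_length_m(letters: float, rise_m: float) -> float:
    """Length in meters if all letters were laid end to end."""
    return letters * rise_m


total_letters = letters_per_cell(GENOME_LETTERS, COPIES_PER_CELL)
length_m = stretched_length_m(total_letters, RISE_PER_LETTER_M)
fold = length_m / CELL_DIAMETER_M

print(f"letters per cell:            {total_letters:.1e}")
print(f"stretched-out length:        {length_m:.2f} m")
print(f"length relative to the cell: {fold:.1e} times the cell diameter")
```

Under these assumptions the letters in a single cell, laid end to end, would span a couple of meters, roughly a million times the width of the cell that holds them.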
In the same 1959 lecture Feynman also imagined being able to look inside a cell in order to read all of the instructions and history contained within a genome.
In the spring of 1979 the Centers for Disease Control in the United States received reports of an unknown disease that affected young men and produced a wide range of symptoms, including rare forms of cancer. In 1981 the disease was named Acquired Immune Deficiency Syndrome (AIDS). It was recognized that transmission of this disease was largely sexual, but it was not until 1983 that the infectious agent – Human Immunodeficiency Virus (HIV) – was “simultaneously” identified by labs in France and the US (the sordid story of this inter-continental competition has been the subject of multiple books). Since the first cases were identified, 20 million people have died from AIDS worldwide.
The neutral theory of evolution
Substitution rates
KA/KS: quantifying the amount of selection on a sequence
At present there is no known cure for this disease and no effective vaccine against HIV infection. Large parts of the world are now facing an AIDS epidemic, with some African nations counting more than 60% of their population among the affected. Although methods exist to keep the virus in check, the high cost of these treatments means that most infected individuals in the developing world will die from AIDS. Indeed, this disease has now surpassed malaria as the number one killer in Africa.
Various aspects of the AIDS epidemic have caught scientists by surprise, including its sudden appearance, mysterious origin, and the difficulty in finding a cure or vaccine.
Every human being has multiple species of bacteria living within them. Most of these bacteria, such as E. coli, are not harmful to us and are considered beneficial symbionts. The bacteria help us to digest certain foods or supply us with vitamins that we cannot make on our own, and we provide them with the nutrients and environment they need to survive. The benefits of having bacterial symbionts extend beyond the production of necessary chemicals. These bacteria actually prevent infection by other pathogenic bacteria simply by virtue of having already established their presence in our gut. In fact, after taking a course of antibiotics it is often recommended that people eat foods like yogurt to re-populate their stomach and intestines with symbiotic bacteria.
Genome rearrangements
Orthology and paralogy
Synteny blocks, inversions, and transpositions
Although there are many well-known examples of beneficial symbiotic relationships in nature, they tend to be interactions between free-living organisms such as bees and flowers or birds and rhinoceros (the appropriately named tick-bird eats ticks off the back of the rhino). As with our relationship with E. coli, however, there are many symbionts that are hidden from view because they live within their hosts. These symbionts often aid their hosts in digestion and benefit themselves by living in the stomach of a mobile organism. For instance, termites have specific protozoa (a type of single-celled eukaryote) living in their digestive tract to help with the digestion of wood.
In 1856, workers involved in limestone blasting operations near Düsseldorf, Germany, in the Neander Thal (Neander Valley) discovered a strange human skeleton. The skeleton had very unusual features, including a heavy browridge, a large nose, receding chin, and stocky build. Initially neglected, the importance of the finding was recognized only many years later by the Irish anatomist William King. The skeleton belonged to an ancient species of hominid biologically different from modern humans. King called the specimen Neanderthal Man: man of the Neander Valley. The skeleton was dated to about 44 thousand years ago.
Mutations and substitutions
Genetic distance
Statistical estimations: Kimura, Jukes-Cantor
Since then, many other skeletons of the same species, H. neanderthalensis, have been discovered in Europe. Popular imagination has been captured by the image of these cavemen; the name itself has become a symbol of prehistoric humans. It has been possible to reconstruct the lifestyle of Neanderthals in prehistoric Europe based on the tools that they used, but one fundamental question remained: are Neanderthals our ancestors? Are modern Europeans the offspring of these primitive hominids? This question has divided scientists for decades, and has only been settled recently by genetic analysis.
Many other fundamental questions about human origins have been answered by modern genetics, as well as by recent fossil discoveries. The search for the oldest hominid fossils has continually revealed evidence that humans originated in sub-Saharan Africa.
The Nobel Prize in Physiology or Medicine in 2004 went to Richard Axel of Columbia University and Linda Buck of the Fred Hutchinson Cancer Research Center for their elucidation of the olfactory system. The olfactory system is responsible for our sense of smell: it includes a large family of proteins called odorant receptors that in combination make it possible to recognize over 10,000 different odors. These odorant receptors are attached to the surface of cells in our nasal passage, detecting odorant molecules as they are inhaled and passing the information along to the brain.
Gene families
Hidden Markov models
Sequence segmentation
Multiple alignment
In order for odorant receptors (ORs) to both sense molecules outside of the cell and to signal the inside of the cell of their discoveries, these proteins must traverse the cell membrane. To do this, odorant receptors contain seven transmembrane domains: stretches of highly hydrophobic amino acids that interact with the fatty cell membrane. The seven transmembrane domains result in a highly heterogeneous protein sequence: alternating stretches of hydrophobic and hydrophilic amino acids that mark the function of receptor proteins. Axel and Buck's discovery led to the further description of similar receptors involved in the sense of taste and in the detection of pheromones, chemicals used in signaling between organisms.
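The alternation of hydrophobic and hydrophilic stretches described above can be made visible with a simple sliding-window average. The sketch below is an illustration, not the method used in the chapter: it scores each amino acid with the Kyte-Doolittle hydropathy scale and reports regions whose windowed mean is high. The window length of 19 and the threshold of 1.6 are conventional choices for this scale and are assumptions here, as are the function names and the toy sequence.

```python
# Sliding-window hydropathy scan (a minimal sketch).
# Scale: Kyte-Doolittle hydropathy values; positive means hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full-length window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_stretches(seq, window=19, threshold=1.6):
    """Merge overlapping windows whose mean exceeds the threshold.

    Returns a list of (start, end) residue indices, with end exclusive.
    """
    stretches = []
    for start, mean in enumerate(hydropathy_profile(seq, window)):
        if mean <= threshold:
            continue
        end = start + window
        if stretches and start <= stretches[-1][1]:
            # Overlaps or touches the previous stretch: extend it.
            stretches[-1] = (stretches[-1][0], end)
        else:
            stretches.append((start, end))
    return stretches


# Toy sequence (hypothetical): one hydrophobic block flanked by charged residues.
toy = "DEKRDEKRDE" + "LIV" * 7 + "DEKRDEKRDE"
print(hydrophobic_stretches(toy))
```

Run on a real receptor sequence, a scan of this kind would be expected to show several separate hydrophobic stretches, one for each membrane-spanning segment, although a fixed threshold is a crude detector and will blur the boundaries.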
Nothing in biology makes sense except in the light of evolution.
Theodosius Dobzhansky
Modern biology is undergoing an historical transformation, becoming – among other things – increasingly data driven. A combination of statistical, computational, and biological methods has become the norm in modern genomic research. Of course this is at odds with the standard organization of university curricula, which typically focus on only one of these three subjects. It is hard enough to provide a good synthesis of computer science and statistics, let alone to include molecular biology! Yet, the importance of the algorithms typical of this field can only be appreciated within their biological context, their results can only be interpreted within a statistical framework, and a basic knowledge of all three areas is a necessary condition for any research project.
We believe that users of software should know something about the algorithms behind the results that are presented, and software designers should know something about the problems that will be attacked with their tools. We also believe that scientific ideas need to be understood within their context, and are often best communicated to students by means of examples and case studies.
This book addresses just that need: providing a rigorous yet accessible introduction to this interdisciplinary field, one that can be read by both biologically and computationally minded students, and that is based on case studies.
As you step off a trans-oceanic flight into the midday bustle of an airport, your body may be telling you that it's time for bed. This is because our body's sense of time depends as much on an internal clock as it does on external cues. Our internal clock – known as the circadian clock – will eventually synchronize itself with the new day–night cycle, but not before we suffer through the mind-deadening effects of jet lag. Reestablishing a link between the external clock (the sun) and our internal clock is essential for human health. Disruption of circadian rhythms has been linked to mania in people with bipolar disorder, and various health problems manifest themselves more often during the morning (heart attacks) or at night (asthma attacks) depending on our internal clock.
Regulatory regions and sequence motifs
Motif finding algorithms
Combining expression and sequence data
The circadian clock is fundamental to many organisms. Bacteria, insects, fungi, mammals, and many other species maintain an internal clock in order to synchronize their metabolism, activity, and body temperature to the sun. In no other organism is the ability to keep time as important as it is in plants. Much more than in mobile species, plants depend on a steady day–night cycle for energy production: they are able to photosynthesize sunlight during the day to store energy, but must use up these stores at night.
In 1994, at the same time the genomic era was beginning, Walter Gehring and colleagues at the University of Basel carried out a Frankenstein experiment par excellence: they were able to turn on a gene called eyeless in various places on the body of the fruitfly, Drosophila melanogaster. The result was amazing – fruitflies that had whole eyes sprouting up all over their bodies. Scientists refer to genes such as eyeless as master regulatory genes (note that genes are often named after the problems they cause when mutated). These master regulatory genes produce proteins that control large cascades of other genes, like those needed to produce complex features such as eyes; eyeless controls one such cascade that contains more than 2000 other genes. Turning it on anywhere in the body activates the cascade and produces a fully formed, but non-functioning, eye.
Sequence similarity and homology
Global and local alignments
Statistical significance of alignments
BLAST and CLUSTAL
It turns out that all multicellular organisms use master regulatory genes, often for the same purpose in different species. Slightly different versions of the eyeless gene are used in humans, mice, sea squirts, squids, and, yes, tigers, to control eye formation. We call these different versions of the same gene homologs, to denote their shared ancestry from a common ancestor.
In 1995 a group of scientists led by Craig Venter, at The Institute for Genomic Research (TIGR) in Maryland, published a landmark paper in the journal Science. This paper reported the complete DNA sequence (the genome) of a free-living organism, the bacterium Haemophilus influenzae (or H. influenzae, for short). Up until that moment, only small viral genomes or small parts of other genomes had been sequenced. The first viral genome sequence (that of phage phiX174) was produced by Fred Sanger's group in 1977, followed a few years later by the sequence of human mitochondrial DNA by the same group. Sanger – working in Cambridge, UK – was awarded two Nobel prizes, the first one in 1958 for developing protein sequencing techniques and the second one in 1980 for developing DNA sequencing techniques. A bacterial sequence, however, is enormously larger than a viral one, making the H. influenzae paper a true milestone. Given the order of magnitude increase in genome size that was sequenced by the group at TIGR, the genomic era can be said to have started in 1995.
Genomes and genomic sequences
Probabilistic models of sequences
Statistical properties of sequences
Standard data formats and databases
A few months later the same group at TIGR published an analysis of the full genome of another bacterium, Mycoplasma genitalium – a microbe responsible for urethritis – and shortly thereafter the sequence of the first eukaryote, the fungus Saccharomyces cerevisiae (or S. cerevisiae, baker's yeast), was published by other groups.
In 1985 the world's most expensive bottle of wine was sold at auction for $160,000. This bottle of 1787 Chateau Lafite came from the cellar of Thomas Jefferson (third president of the United States) and was apparently purchased during his time as ambassador to France. While Jefferson's Bordeaux is undoubtedly an historic artifact (and probably undrinkable), the oldest bottle of wine dates to 5000 BC from the site of Hajji Feruz Tepe in Iran. This 9-liter clay pot did not contain any liquid when found, but still had dried residue from the wine it once held.
Gene expression data
Types of DNA microarrays
Data clustering and visualization
Expression during the cell cycle
The recipe for making wine and other fermented beverages has not changed much in the past 7000 years. A solution rich in sugars (usually fruit juice) is turned into the alcoholic nectar we consume by exploiting a remarkable organism: the yeast, Saccharomyces cerevisiae. This unicellular fungus extracts energy from the environment by fermenting sugars, a process which produces alcohol as a by-product. Because S. cerevisiae is found naturally on grapevines, wine making is as easy as crushing grapes and putting them into a tightly sealed container for a few months. During this time yeast transforms the sugars contained in grape juice into alcohol, and this is why your wine is so much less sweet (and more alcoholic) than the grape juice it started from.