String algorithms are a traditional area of study in computer science. In recent years their importance has grown dramatically with the huge increase of electronically stored text and of molecular sequence data (DNA or protein sequences) produced by various genome projects. This 1997 book is a general text on computer algorithms for string processing. In addition to pure computer science, the book contains extensive discussions on biological problems that are cast as string problems, and on methods developed to solve them. It emphasises the fundamental ideas and techniques central to today's applications. New approaches to this complex material simplify methods that up to now have been for the specialist alone. With over 400 exercises to reinforce the material and develop additional topics, the book is suitable as a text for graduate or advanced undergraduate students in computer science, computational biology, or bio-informatics. Its discussion of current algorithms and techniques also makes it a reference for professionals.
• A dual treatment of string algorithms in both computer science and molecular biology, treating both theory and applications
• Over 400 exercises to reinforce presented material and to develop additional topics
• Code available on-line for many of the presented algorithms
Part I. Exact String Matching: The Fundamental String Problem
1. Exact matching: fundamental preprocessing and first algorithms
2. Exact matching: classical comparison-based methods
3. Exact matching: a deeper look at classical methods
4. Semi-numerical string matching

Part II. Suffix Trees and their Uses
5. Introduction to suffix trees
6. Linear time construction of suffix trees
7. First applications of suffix trees
8. Constant time lowest common ancestor retrieval
9. More applications of suffix trees

Part III. Inexact Matching, Sequence Alignment and Dynamic Programming
10. The importance of (sub)sequence comparison in molecular biology
11. Core string edits, alignments and dynamic programming
12. Refining core string edits and alignments
13. Extending the core problems
14. Multiple string comparison: the Holy Grail
15. Sequence databases and their uses: the motherlode

Part IV. Currents, Cousins and Cameos
16. Maps, mapping, sequencing and superstrings
17. Strings and evolutionary trees
18. Three short topics
19. Models of genome-level mutations
'The readers of this book will be serious programmers, but of course anybody working in bio-computing will find the book of immense practical, scientific and commercial importance … you should get the book, whether you want to do some string processing, fundamental computing research, or want to impress a biotech firm.' Harold Thimbleby, The Times Higher Education Supplement
'… could well be used as the basis for a graduate-level course, particularly as it contains over 400 exercises to reinforce presented material and to develop further topics. It is recommended most highly.' P. Gibbons, Zentralblatt für Mathematik