Book contents
- Frontmatter
- Contents
- Preface
- I Exact String Matching: The Fundamental String Problem
- 1 Exact Matching: Fundamental Preprocessing and First Algorithms
- 2 Exact Matching: Classical Comparison-Based Methods
- 3 Exact Matching: A Deeper Look at Classical Methods
- 4 Seminumerical String Matching
- II Suffix Trees and Their Uses
- III Inexact Matching, Sequence Alignment, Dynamic Programming
- IV Currents, Cousins, and Cameos
- Epilogue – where next?
- Bibliography
- Glossary
- Index
4 - Seminumerical String Matching
from I - Exact String Matching: The Fundamental String Problem
Published online by Cambridge University Press: 23 June 2010
Summary
Arithmetic versus comparison-based methods
All of the exact matching methods in the first three chapters, as well as most of the methods that have yet to be discussed in this book, are examples of comparison-based methods. The main primitive operation in each of those methods is the comparison of two characters. There are, however, string matching methods based on bit operations or on arithmetic, rather than on character comparisons. These methods therefore have a very different flavor from the comparison-based approaches, even though one can sometimes see character comparisons hidden at the inner level of these “seminumerical” methods. We will discuss three examples of this approach: the Shift-And method and its extension to a program called agrep to handle inexact matching; the use of the Fast Fourier Transform in string matching; and the random fingerprint method of Karp and Rabin.
The Shift-And method
R. Baeza-Yates and G. Gonnet [35] devised a simple, bit-oriented method that solves the exact matching problem very efficiently for relatively small patterns (the length of a typical English word, for example). They call this method the Shift-Or method, but it seems more natural to call it Shift-And. Recall that the pattern P is of size n and the text T is of size m.
Definition Let M be an n by m + 1 binary-valued array, with index i running from 1 to n and index j running from 0 to m.
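The excerpt stops before saying what the entries of M mean, so the following is only a minimal sketch under a stated assumption: that M(i, j) is 1 exactly when the first i characters of P match the i characters of T ending at position j, which is the usual reading of the Shift-And method. Under that assumption, each column of M fits in one integer and can be computed from the previous column with a shift, an OR, and an AND. The names `shift_and`, `masks`, and `column` are hypothetical and are not taken from the chapter.

```python
def shift_and(pattern, text):
    """Return the 1-based end positions in text where pattern occurs.

    Sketch of the Shift-And idea: `column` packs one column of the
    assumed array M into an integer, with bit i-1 standing for row i.
    """
    n = len(pattern)
    if n == 0:
        return []

    # masks[c] has bit i-1 set when the i-th pattern character equals c.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    accept = 1 << (n - 1)   # bit for row n: the whole pattern matched
    column = 0              # assumed column 0 of M: all zeros
    ends = []
    for j, ch in enumerate(text, start=1):
        # Shift moves every matched prefix one row down, OR-ing in 1
        # starts a new candidate match at row 1, and the AND keeps only
        # the rows whose pattern character equals the text character.
        column = ((column << 1) | 1) & masks.get(ch, 0)
        if column & accept:
            ends.append(j)
    return ends
```

For example, `shift_and("abaac", "xabxabaaca")` reports the single occurrence ending at position 9. The per-character work is a constant number of word operations only while n fits in a machine word, which is consistent with the remark that the method suits relatively small patterns; Python integers are unbounded, so the sketch still runs for longer patterns, only more slowly.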
- Type: Chapter
- Information: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, pp. 70-86. Publisher: Cambridge University Press. Print publication year: 1997