Book contents
- Frontmatter
- Contents
- Preface
- I Exact String Matching: The Fundamental String Problem
- 1 Exact Matching: Fundamental Preprocessing and First Algorithms
- 2 Exact Matching: Classical Comparison-Based Methods
- 3 Exact Matching: A Deeper Look at Classical Methods
- 4 Seminumerical String Matching
- II Suffix Trees and Their Uses
- III Inexact Matching, Sequence Alignment, Dynamic Programming
- IV Currents, Cousins, and Cameos
- Epilogue – where next?
- Bibliography
- Glossary
- Index
3 - Exact Matching: A Deeper Look at Classical Methods
from I - Exact String Matching: The Fundamental String Problem
Published online by Cambridge University Press: 23 June 2010
Summary
A Boyer–Moore variant with a “simple” linear time bound
Apostolico and Giancarlo [26] suggested a variant of the Boyer–Moore algorithm that allows a fairly simple proof of linear worst-case running time. With this variant, no character of T will ever be compared after it is first matched with any character of P. It then follows immediately that the number of comparisons is at most 2m, where m is the length of T. Every comparison is either a match or a mismatch; there can be at most m mismatches, since each one results in a nonzero shift of P; and there can be at most m matches, since no character of T is compared again after it matches a character of P. We will also show that (in addition to the time for comparisons) the time taken for all the other work in this method is linear in m.
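The counting argument above can be checked experimentally. The following Python sketch is one possible way to organize such a variant, not the chapter's own presentation: the arrays `N` (for each position of P, the length of the longest suffix ending there that is also a suffix of P) and `M` (for each phase end in T, a lower bound on the length of the suffix of P known to match there) are assumptions introduced for illustration, as is the function name `ag_search`. For brevity the sketch computes `N` naively and shifts using only a simple bad-character rule, so it does not reproduce the full Boyer–Moore shifts; it is meant only to show that matched characters of T are never compared again.

```python
def ag_search(P, T):
    """Return (start positions of P in T, number of character comparisons)."""
    n, m = len(P), len(T)
    if n == 0 or n > m:
        return [], 0
    # N[i]: length of the longest suffix of P[:i+1] that is also a suffix of P.
    # Computed naively here (quadratic in n) to keep the sketch short.
    N = [0] * n
    for i in range(n):
        k = 0
        while k <= i and P[i - k] == P[n - 1 - k]:
            k += 1
        N[i] = k
    # Longest proper prefix of P that is also a suffix; used to shift after a hit.
    border = max((k for k in range(1, n) if N[k - 1] == k), default=0)
    last = {c: idx for idx, c in enumerate(P)}  # rightmost index of each character
    M = [0] * m  # 0 means "nothing known"; the character is then compared directly
    occ, comparisons = [], 0
    j = n - 1  # index in T aligned with the right end of P
    while j < m:
        h, i = j, n - 1
        while True:
            mh = M[h]
            if mh == 0:
                # No recorded match ends at h: do an explicit comparison.
                comparisons += 1
                if T[h] == P[i]:
                    if i == 0:
                        occ.append(h)
                        M[j] = n
                        shift = n - border
                        break
                    h -= 1
                    i -= 1
                else:
                    M[j] = j - h
                    shift = max(1, i - last.get(T[h], -1))
                    break
            elif mh < N[i] or (mh == N[i] and N[i] < i + 1):
                # The next mh characters are known to match: skip them.
                h -= mh
                i -= mh
            elif N[i] == i + 1:
                # The rest of P is known to match: occurrence without comparing.
                occ.append(j - n + 1)
                M[j] = j - h
                shift = n - border
                break
            else:
                # A mismatch is implied N[i] characters further left.
                k = N[i]
                M[j] = j - h
                shift = max(1, (i - k) - last.get(T[h - k], -1))
                break
        j += shift
    return occ, comparisons


occurrences, comparisons = ag_search("aba", "ababa")
print(occurrences, comparisons)  # [0, 2] 5
```

In the usage line, the second occurrence is confirmed with only two fresh comparisons, because the character at index 2 of T was already matched in the first phase and is skipped by consulting `M`. The total of 5 comparisons is within the 2m = 10 bound.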
Given the history of very difficult and partial analyses of the Boyer–Moore algorithm, it is quite amazing that a close variant of the algorithm allows a simple linear time bound. We present here a further improvement of the Apostolico–Giancarlo idea, resulting in an algorithm that simulates exactly the shifts of the Boyer–Moore algorithm. The method therefore has all the rapid shifting advantages of the Boyer–Moore method as well as a simple linear worst-case time analysis.
Key ideas
Our version of the Apostolico–Giancarlo algorithm simulates the Boyer–Moore algorithm, finding exactly the same mismatches that Boyer–Moore would find and making exactly the same shifts.
- Type: Chapter
- Information: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, pp. 35-69
- Publisher: Cambridge University Press
- Print publication year: 1997