
Fast String Matching in Stationary Ergodic Sources

  • John Shawe-Taylor

A connection is made between the theory of ergodicity and the expected complexity of string searching. In particular, a substring search algorithm is introduced which, when applied to searching in text that has been produced by an appropriate stationary ergodic source, has an expected running time of O((N/m + m) log m), for a text string of length N and search string of length m. Similar expected complexity results have been obtained before, but the analysis is performed in a significantly more general framework, which models with greater accuracy the statistics of many types of strings, including natural language. The analysis also sheds light on the performance of the Boyer-Moore algorithm and the Sunday algorithm when applied to natural language.
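
For orientation, the Sunday algorithm named in the abstract can be sketched as below. This is a minimal illustration of Sunday's "quick search" shift rule only; it is not the algorithm introduced in the paper, whose details the abstract does not give. The function name `sunday_search` and the choice to return all match positions are assumptions made for the example.

```python
def sunday_search(text, pattern):
    """Return the start index of every occurrence of pattern in text.

    Sketch of Sunday's quick-search idea (hypothetical helper, not the
    paper's algorithm): after each alignment, look at the text character
    just past the current window and shift so that it lines up with its
    rightmost occurrence in the pattern.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # shift[c] = m - (index of the rightmost occurrence of c in pattern).
    # Later occurrences overwrite earlier ones, so the rightmost one wins.
    shift = {c: m - i for i, c in enumerate(pattern)}

    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        nxt = pos + m  # index of the character just past the window
        if nxt >= n:
            break
        # Characters absent from the pattern allow the maximal shift m + 1.
        pos += shift.get(text[nxt], m + 1)
    return matches


print(sunday_search("abracadabra", "abra"))
```

When the character past the window does not occur in the pattern, the window jumps by m + 1 positions, which is why shift-based searches of this kind can inspect far fewer than N text characters on typical input.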


Combinatorics, Probability and Computing
  • ISSN: 0963-5483
  • EISSN: 1469-2163
  • URL: /core/journals/combinatorics-probability-and-computing