  • Combinatorics, Probability and Computing, Volume 5, Issue 4
  • December 1996, pp. 415-427

Fast String Matching in Stationary Ergodic Sources

  • John Shawe-Taylor (a1)
  • DOI: http://dx.doi.org/10.1017/S0963548300002169
  • Published online: 01 September 2008
Abstract

A connection is made between the theory of ergodicity and the expected complexity of string searching. In particular, a substring search algorithm is introduced which, when applied to searching in text that has been produced by an appropriate stationary ergodic source, has an expected running time of O((N/m + m) log m), for a text string of length N and search string of length m. Similar expected complexity results have been obtained before, but the analysis is performed in a significantly more general framework, which models with greater accuracy the statistics of many types of strings, including natural language. The analysis also sheds light on the performance of the Boyer-Moore algorithm and the Sunday algorithm when applied to natural language.
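
For readers unfamiliar with the Sunday algorithm mentioned above, the following is a minimal Python sketch of its commonly described "quick search" variant. It is not the algorithm introduced in this paper, whose details are not given in the abstract. The function name `sunday_search` and the dictionary-based shift table are illustrative choices, not taken from the source.

```python
def sunday_search(text, pattern):
    """Return the start indices of all occurrences of pattern in text.

    Illustrative sketch of Sunday's quick-search idea: after each
    alignment, look at the text character just past the current window
    and shift so that its rightmost occurrence in the pattern lines up
    with it (or skip past it entirely if it does not occur).
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # shift[c] = distance from the rightmost occurrence of c in the
    # pattern to one position past the pattern's end.
    shift = {c: m - i for i, c in enumerate(pattern)}

    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        nxt = pos + m  # index of the character just past the window
        if nxt >= n:
            break
        # Characters absent from the pattern allow a full skip of m + 1.
        pos += shift.get(text[nxt], m + 1)
    return matches


if __name__ == "__main__":
    print(sunday_search("abracadabra", "abra"))
```

When the character past the window rarely occurs in the pattern, most shifts are close to m + 1, which is the intuition behind the sublinear expected behaviour that analyses of this kind aim to quantify.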

Linked references

This list contains references from the content that can be linked to their source. For a full set of references and notes please see the PDF or HTML where available.

[1] D. E. Knuth, J. H. Morris and V. R. Pratt (1977) Fast pattern matching in strings. SIAM J. Comput. 6 323–350.

[2] G. V. Smit (1982) A comparison of three string matching algorithms. Software - Practice & Experience 12 57–66.

[3] D. M. Sunday (1990) A very fast substring search algorithm. Comm. ACM 33 132–142.

[4] R. S. Boyer and J. S. Moore (1977) A fast string searching algorithm. Comm. ACM 20 762–772.

[5] L. J. Guibas and A. M. Odlyzko (1980) A new proof of the linearity of the Boyer-Moore string searching algorithm. SIAM J. Comput. 9 672–682.

[6] R. A. Baeza-Yates (1989) String searching algorithms revisited. Proc. Workshop in Algorithms and Data Structures. Lecture Notes in Computer Science 382, pp. 75–96. Springer-Verlag.

[8] R. Schaback (1988) On the expected sublinearity of the Boyer-Moore algorithm. SIAM J. Comput. 17 648–658.

[9] A. C-C. Yao (1979) The complexity of pattern matching for a random string. SIAM J. Comput. 8 368–387.

[10] C. E. Shannon (1948) A mathematical theory of communication. Bell Syst. Tech. J. 27 379–423, 623–656.

[11] J. Y. Kim and J. S. Shawe-Taylor (1994) Fast expected string matching using an n-gram algorithm. Software - Practice & Experience 24 79–88.

[14] A. J. Thomasian (1960) An elementary proof of the AEP of information theory. Ann. Math. Statist. 31 452–456.

[15] J. Y. Kim and J. S. Shawe-Taylor (1992) An approximate string matching algorithm. Theor. Comput. Sci. 92 107–117.


Combinatorics, Probability and Computing
  • ISSN: 0963-5483
  • EISSN: 1469-2163