Fast String Matching in Stationary Ergodic Sources

Published online by Cambridge University Press:  12 September 2008

John Shawe-Taylor
Affiliation:
Department of Computer Science, Royal Holloway and Bedford New College, University of London, Egham, Surrey TW20 0EX, UK e-mail: john@dcs.rhbnc.ac.uk

Abstract

A connection is made between the theory of ergodicity and the expected complexity of string searching. In particular, a substring search algorithm is introduced which, when applied to text produced by an appropriate stationary ergodic source, has an expected running time of O((N/m + m) log m) for a text string of length N and a search string of length m. Similar expected complexity results have been obtained before, but the analysis here is performed in a significantly more general framework, one that models with greater accuracy the statistics of many types of strings, including natural language. The analysis also sheds light on the performance of the Boyer-Moore algorithm and the Sunday algorithm when applied to natural language.
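
For readers unfamiliar with the Sunday algorithm mentioned in the abstract, the following is a minimal sketch of its commonly described "Quick Search" form. It is not the algorithm introduced in the article, whose details are not given in the abstract. The function name sunday_search and its return convention are assumptions made for illustration only.

```python
def sunday_search(text, pattern):
    """Return the start index of every occurrence of pattern in text.

    Illustrative sketch of Sunday's Quick Search; not the article's algorithm.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # For each character in the pattern, record how far the window must move
    # so that the character's last occurrence lines up with the text
    # character immediately to the right of the current window.
    shift = {c: m - i for i, c in enumerate(pattern)}

    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        nxt = pos + m  # index of the character just past the window
        if nxt >= n:
            break
        # Characters absent from the pattern allow a jump past nxt entirely.
        pos += shift.get(text[nxt], m + 1)
    return matches
```

When the character just past the window does not occur in the pattern, the window advances by m + 1 positions, which is why shifts of this kind can skip large parts of the text when the pattern's characters are rare in it.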

Information

Type
Research Article
Copyright
Copyright © Cambridge University Press 1996
