Fast circular dictionary-matching algorithm

TANVER ATHAR; CARL BARTON; WIDMER BLAND; JIA GAO; COSTAS S. ILIOPOULOS; CHANG LIU; SOLON P. PISSIS

doi:10.1017/S0960129515000134

Published online by Cambridge University Press: 11 May 2015

Affiliations:
TANVER ATHAR: Department of Informatics, King's College London, London, UK
CARL BARTON: Department of Informatics, King's College London, London, UK
WIDMER BLAND: Department of Computing and Software, McMaster University, Hamilton, Canada
JIA GAO: Department of Informatics, King's College London, London, UK
COSTAS S. ILIOPOULOS: Department of Informatics, King's College London, London, UK
CHANG LIU: Department of Informatics, King's College London, London, UK
SOLON P. PISSIS: Department of Informatics, King's College London, London, UK

Abstract

Circular string matching is a problem that arises naturally in many contexts. It consists of finding all occurrences of the rotations of a pattern of length $m$ in a text of length $n$. There exist optimal worst- and average-case algorithms for circular string matching. Here, we present a suboptimal average-case algorithm for circular string matching requiring time $\mathcal{O}(n)$ and space $\mathcal{O}(m)$. The importance of our contribution is underlined by the fact that the proposed algorithm can be easily adapted to deal with circular dictionary matching. In particular, we show how the circular dictionary-matching problem can be solved in average-case time $\mathcal{O}(n + M)$ and space $\mathcal{O}(M)$, where $M$ is the total length of the dictionary patterns, assuming that the shortest pattern is sufficiently long. Moreover, the presented average-case algorithms and other worst-case approaches were also implemented. Experimental results, using real and synthetic data, demonstrate that the implementation of the presented algorithms can accelerate the computations by more than a factor of two compared to the corresponding implementation of other approaches.
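
To make the problem statement concrete, the following is a minimal Python sketch of a naive baseline for both problems. It is not the algorithm presented in the paper and does not achieve the stated $\mathcal{O}(n)$ or $\mathcal{O}(n + M)$ average-case bounds; it simply enumerates rotations and compares every text window against them. The function names `rotations`, `circular_match` and `circular_dictionary_match` are hypothetical and chosen only for illustration.

```python
from __future__ import annotations


def rotations(pattern: str) -> set[str]:
    """Return the set of distinct rotations of `pattern`.

    Every rotation of a string x of length m is a length-m substring
    of the doubled string xx, starting at one of the first m positions.
    """
    m = len(pattern)
    doubled = pattern + pattern
    return {doubled[i:i + m] for i in range(m)}


def circular_match(pattern: str, text: str) -> list[int]:
    """Naive circular string matching (illustrative baseline only).

    Returns every starting position i such that text[i:i+m] equals
    some rotation of `pattern`. Worst-case time is O(n*m).
    """
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    rots = rotations(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] in rots]


def circular_dictionary_match(patterns: list[str], text: str) -> list[tuple[int, int]]:
    """Naive circular dictionary matching (illustrative baseline only).

    Returns pairs (i, k) meaning that a rotation of patterns[k]
    occurs in `text` starting at position i.
    """
    # Map each rotation to the indices of the patterns that produce it.
    table: dict[str, set[int]] = {}
    for k, p in enumerate(patterns):
        for r in rotations(p):
            table.setdefault(r, set()).add(k)

    lengths = sorted({len(p) for p in patterns if p})
    hits: list[tuple[int, int]] = []
    for i in range(len(text)):
        for m in lengths:
            if i + m > len(text):
                break  # longer patterns cannot fit at this position
            for k in sorted(table.get(text[i:i + m], ())):
                hits.append((i, k))
    return hits


if __name__ == "__main__":
    print(circular_match("abc", "xbcaxcab"))
    print(circular_dictionary_match(["abc", "xa"], "xbcaxcab"))
```

The sketch stores all rotations explicitly, so its space usage grows quadratically with pattern length; it is useful only as a reference against which a faster implementation can be checked.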

Type: Paper
Information: Mathematical Structures in Computer Science, Volume 27, Special Issue 2: Special Issue: XIV ICTCS, February 2017, pp. 143–156

DOI: https://doi.org/10.1017/S0960129515000134
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright: © Cambridge University Press 2015

