
Proof-directed program transformation: A functional account of efficient regular expression matching

Published online by Cambridge University Press:  24 May 2021

Department of Computer Science, University of Copenhagen, Denmark

We show how to systematically derive an efficient regular expression (regex) matcher using a variety of program transformation techniques, but very little specialized formal language and automata theory. Starting from the standard specification of the set-theoretic semantics of regular expressions, we proceed via a continuation-based backtracking matcher, to a classical, table-driven state machine. All steps of the development are supported by self-contained (and machine-verified) equational correctness proofs.
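To make the intermediate stage mentioned in the abstract more concrete, the following is a minimal sketch of what a continuation-based backtracking matcher can look like. It is not taken from the article. The constructor names (`Empty`, `Eps`, `Chr`, `Alt`, `Seq`, `Star`), the functions `match` and `accepts`, and the choice of Python are all assumptions made for illustration. The matcher takes a continuation `k` that decides whether the unconsumed suffix is acceptable. The `Star` case additionally requires each iteration to consume input, which is one common way to keep the matcher terminating on expressions such as `Star(Eps())`.

```python
from dataclasses import dataclass
from typing import Callable, Union


# Regular expression syntax as a small algebraic data type (illustrative names).
@dataclass(frozen=True)
class Empty:
    """Matches no string at all."""


@dataclass(frozen=True)
class Eps:
    """Matches only the empty string."""


@dataclass(frozen=True)
class Chr:
    """Matches the single character c."""
    c: str


@dataclass(frozen=True)
class Alt:
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Seq:
    first: "Regex"
    second: "Regex"


@dataclass(frozen=True)
class Star:
    body: "Regex"


Regex = Union[Empty, Eps, Chr, Alt, Seq, Star]
Cont = Callable[[str], bool]  # a continuation inspects the remaining input


def match(r: Regex, s: str, k: Cont) -> bool:
    """Return True if some prefix of s matches r and k accepts the rest."""
    if isinstance(r, Empty):
        return False
    if isinstance(r, Eps):
        return k(s)
    if isinstance(r, Chr):
        return s[:1] == r.c and k(s[1:])
    if isinstance(r, Alt):
        # Backtracking: try the left branch, fall back to the right one.
        return match(r.left, s, k) or match(r.right, s, k)
    if isinstance(r, Seq):
        # The continuation for the first part runs the second part.
        return match(r.first, s, lambda rest: match(r.second, rest, k))
    if isinstance(r, Star):
        # Either stop iterating, or match the body once and loop again.
        # The length check forces progress (an assumption of this sketch).
        return k(s) or match(
            r.body, s, lambda rest: len(rest) < len(s) and match(r, rest, k)
        )
    raise TypeError(f"not a regular expression: {r!r}")


def accepts(r: Regex, s: str) -> bool:
    """Whole-string matching: the initial continuation demands empty input."""
    return match(r, s, lambda rest: rest == "")
```

For example, `accepts(Seq(Chr("a"), Star(Chr("b"))), "abbb")` succeeds, while the same expression rejects `"ba"`. A matcher of this shape can take exponential time on some inputs, which is the kind of inefficiency that a table-driven state machine avoids.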

Research Article
© The Author(s), 2021. Published by Cambridge University Press

