Regular-expression derivatives re-examined

SCOTT OWENS; JOHN REPPY; AARON TURON

doi:10.1017/S0956796808007090

Published online by Cambridge University Press: 01 March 2009

SCOTT OWENS: University of Cambridge (e-mail: Scott.Owens@cl.cam.ac.uk)
JOHN REPPY: University of Chicago (e-mail: jhr@cs.uchicago.edu)
AARON TURON: University of Chicago, Northeastern University (e-mail: turon@ccs.neu.edu)

Abstract

Regular-expression derivatives are an old, but elegant, technique for compiling regular expressions to deterministic finite-state machines. It easily supports extending the regular-expression operators with boolean operations, such as intersection and complement. Unfortunately, this technique has been lost in the sands of time and few computer scientists are aware of it. In this paper, we reexamine regular-expression derivatives and report on our experiences in the context of two different functional-language implementations. The basic implementation is simple and we show how to extend it to handle large character sets (e.g., Unicode). We also show that the derivatives approach leads to smaller state machines than the traditional algorithm given by McNaughton and Yamada.
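
To make the idea concrete, here is a minimal sketch of derivative-based matching, written in Python purely for illustration (it is not taken from the paper, and all class and function names below are hypothetical). The derivative of a regular expression r with respect to a character a describes the strings w such that aw is matched by r; a string matches when the expression left after taking derivatives for each of its characters accepts the empty string. The sketch also shows why intersection and complement come almost for free: their derivative rules are as simple as the one for alternation. It only matches strings directly; building a DFA would additionally require identifying equivalent derivatives so that the set of states stays finite, which is omitted here.

```python
from dataclasses import dataclass


class Re:
    """Base class for regular-expression syntax nodes (hypothetical names)."""


@dataclass(frozen=True)
class Empty(Re):   # matches no string at all
    pass


@dataclass(frozen=True)
class Eps(Re):     # matches only the empty string
    pass


@dataclass(frozen=True)
class Chr(Re):     # matches the single character c
    c: str


@dataclass(frozen=True)
class Cat(Re):     # concatenation: r followed by s
    r: Re
    s: Re


@dataclass(frozen=True)
class Alt(Re):     # alternation: r or s
    r: Re
    s: Re


@dataclass(frozen=True)
class And(Re):     # intersection: both r and s
    r: Re
    s: Re


@dataclass(frozen=True)
class Not(Re):     # complement: every string not matched by r
    r: Re


@dataclass(frozen=True)
class Star(Re):    # Kleene star: zero or more repetitions of r
    r: Re


def nullable(r: Re) -> bool:
    """Return True if r matches the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chr)):
        return False
    if isinstance(r, (Cat, And)):
        return nullable(r.r) and nullable(r.s)
    if isinstance(r, Alt):
        return nullable(r.r) or nullable(r.s)
    if isinstance(r, Not):
        return not nullable(r.r)
    raise TypeError(f"unknown node: {r!r}")


def deriv(r: Re, a: str) -> Re:
    """Return the derivative of r with respect to the character a."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chr):
        return Eps() if r.c == a else Empty()
    if isinstance(r, Cat):
        left = Cat(deriv(r.r, a), r.s)
        # If r.r can match the empty string, a may also be consumed by r.s.
        return Alt(left, deriv(r.s, a)) if nullable(r.r) else left
    if isinstance(r, Alt):
        return Alt(deriv(r.r, a), deriv(r.s, a))
    if isinstance(r, And):
        return And(deriv(r.r, a), deriv(r.s, a))
    if isinstance(r, Not):
        return Not(deriv(r.r, a))
    if isinstance(r, Star):
        return Cat(deriv(r.r, a), r)
    raise TypeError(f"unknown node: {r!r}")


def matches(r: Re, s: str) -> bool:
    """Match s against r by taking one derivative per character."""
    for a in s:
        r = deriv(r, a)
    return nullable(r)
```

For example, with `ab_star = Star(Cat(Chr("a"), Chr("b")))`, the call `matches(ab_star, "abab")` succeeds while `matches(Not(ab_star), "abab")` does not. Because this sketch performs no simplification, the intermediate expressions grow with the input length; a practical implementation would normalize them as it goes.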

Information

Type: Articles
Information: Journal of Functional Programming, Volume 19, Issue 2, March 2009, pp. 173–190

DOI: https://doi.org/10.1017/S0956796808007090
Copyright: Copyright © Cambridge University Press 2009

References

Aho, A. V., Hopcroft, J. E. & Ullman, J. D. (1974) The Design and Analysis of Computer Algorithms. Reading, MA: Addison Wesley.

Aho, A. V., Sethi, R. & Ullman, J. D. (1986) Compilers: Principles, Techniques, and Tools. Reading, MA: Addison Wesley.

Aho, A. V. & Ullman, J. D. (1972) The Theory of Parsing, Translation, and Compiling. Vol. 1. Englewood Cliffs, NJ: Prentice Hall.

Appel, A. W. (1998) Modern Compiler Implementation in ML. Cambridge: Cambridge University Press.

Appel, A. W., Mattson, J. S. & Tarditi, D. R. (1994, Oct.) A Lexical Analyzer Generator for Standard ML. Available at: http://smlnj.org/doc/ML-Lex/manual.html.

Baxter, I., Pidgeon, C. & Mehlich, M. (2004) DMS: Program transformations for practical scalable software evolution. In International Conference on Software Engineering.

Berry, G. (1999) The Esterel v5 Language Primer Version 5.21 Release 2.0. Available at: ftp://ftp-sop.inria.fr/meije/esterel/papers/primer.pdf.

Berry, G. & Sethi, R. (1986, Dec.) From regular expressions to deterministic automata. Theoret. Comp. Sci. 48 (1), 117–126.

Brzozowski, J. A. (1964) Derivatives of regular expressions. J. ACM 11 (4), 481–494.

English, J. (1999) How to Validate XML. Available at: http://www.flightlab.com/~joe/sgml/validate.html. (Accessed 24 November 2008).

Findler, R. B., Clements, J., Flanagan, C., Flatt, M., Krishnamurthi, S., Steckler, P. & Felleisen, M. (2002) DrScheme: A programming environment for Scheme. J. Funct. Prog. 12 (2), 159–182.

Fischer, C. N. & LeBlanc, R. J., Jr. (1988) Crafting a Compiler. Menlo Park, CA: Benjamin/Cummings.

McNaughton, R. & Yamada, H. (1960) Regular expressions and state graphs for automata. IEEE Trans. Elec. Comp. 9, 39–47.

Rabin, M. O. & Scott, D. (1959) Finite automata and their decision problems. IBM J. Res. Dev. 3 (2), 114–125.

Schmidt, M. (2002) Design and Implementation of a Validating XML Parser in Haskell. Master's thesis, Computer Science Department, University of Applied Sciences Wedel.

Sen, K. & Roşu, G. (2003) Generating optimal monitors for extended regular expressions. In Proceedings of Runtime Verification (RV'03). Boulder, Colorado. Electronic Notes in Theoretical Computer Science, vol. 89, no. 2, pp. 226–245. Elsevier Science.

Thompson, K. (1968) Regular expression search algorithm. Comm. ACM 11 (6), 419–422.

Unicode Consortium. (2003) The Unicode Standard, Version 4. Reading, MA: Addison-Wesley Professional.
