From UBGs to CFGs A practical corpus-driven approach

HANS-ULRICH KRIEGER

doi:10.1017/S1351324906004128

Published online by Cambridge University Press: 01 December 2007

HANS-ULRICH KRIEGER
Affiliation: German Research Center for Artificial Intelligence (DFKI), Stuhlsatzenhausweg 3, D-66123 Saarbrücken, Germany. E-mail: krieger@dfki.de

Abstract

We present a simple and intuitive unsound corpus-driven approximation method for turning unification-based grammars, such as HPSG, CLE, or PATR-II, into context-free grammars (CFGs). Our research is motivated by the idea that we can exploit (large-scale), hand-written unification grammars not only for the purpose of describing natural language and obtaining a syntactic structure (and perhaps a semantic form), but also to address several other very practical topics. Firstly, to speed up deep parsing by having a cheap recognition pre-filter (the approximated CFG). Secondly, to obtain an indirect stochastic parsing model for the unification grammar through a trained PCFG, obtained from the approximated CFG. This gives us an efficient disambiguation model for the unification-based grammar. Thirdly, to generate domain-specific subgrammars for application areas such as information extraction or question answering. And finally, to compile context-free language models which assist the acoustic model of a speech recognizer. The approximation method is unsound in that it does not generate a CFG whose language is a true superset of the language accepted by the original unification-based grammar. It is a corpus-driven method in that it relies on a corpus of parsed sentences and generates broader CFGs when given more input samples. Our open approach can be fine-tuned in different directions, allowing us to monotonically come close to the original parse trees by shifting more information into the context-free symbols. The approach has been fully implemented in Java.
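
The corpus-driven idea in the abstract can be illustrated with a minimal sketch. It is written in Python for brevity and is not the paper's Java implementation. It assumes each parsed sentence is already available as a tree whose node labels are atomic context-free symbols (in the real method these symbols would be derived from feature structures, and how much information they carry is the tuning knob the abstract mentions). The function names `extract_rules`, `approximate_cfg`, and `relative_frequency_pcfg` are hypothetical, and the relative-frequency estimate is only one simple way to obtain rule probabilities, not necessarily the training procedure used in the paper.

```python
from collections import Counter

# Assumed tree encoding (illustrative only):
#   phrase node:   (label, [child, child, ...])
#   preterminal:   (label, "word")

def extract_rules(tree, rules):
    """Record every local tree of `tree` as a CFG production in `rules`."""
    label, body = tree
    if isinstance(body, str):
        # Preterminal over a word: label -> word
        rules[(label, (body,))] += 1
        return
    # Phrase node: label -> sequence of child labels
    rules[(label, tuple(child[0] for child in body))] += 1
    for child in body:
        extract_rules(child, rules)

def approximate_cfg(corpus):
    """Collect rule counts over a corpus of parse trees."""
    rules = Counter()
    for tree in corpus:
        extract_rules(tree, rules)
    return rules

def relative_frequency_pcfg(rules):
    """Turn rule counts into probabilities normalised per left-hand side."""
    lhs_totals = Counter()
    for (lhs, _), count in rules.items():
        lhs_totals[lhs] += count
    return {rule: count / lhs_totals[rule[0]] for rule, count in rules.items()}

if __name__ == "__main__":
    t1 = ("S", [("NP", "Kim"), ("VP", [("V", "sleeps")])])
    t2 = ("S", [("NP", "Sandy"), ("VP", [("V", "sees"), ("NP", "Kim")])])
    for (lhs, rhs), p in sorted(relative_frequency_pcfg(approximate_cfg([t1, t2])).items()):
        print(f"{lhs} -> {' '.join(rhs)}  [{p:.2f}]")
```

Because rules are only ever added, a larger corpus yields a rule set that contains the one from any smaller sub-corpus, which mirrors the statement that more input samples produce broader CFGs. The resulting grammar covers only what the corpus exhibits, so it is not guaranteed to be a superset of the original grammar's language.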

Information

Type: Papers
Information: Natural Language Engineering, Volume 13, Issue 4, December 2007, pp. 317–351

DOI: https://doi.org/10.1017/S1351324906004128
Copyright: Copyright © Cambridge University Press 2007
