
From UBGs to CFGs: A practical corpus-driven approach

Published online by Cambridge University Press:  01 December 2007

HANS-ULRICH KRIEGER*
Affiliation:
German Research Center for Artificial Intelligence (DFKI), Stuhlsatzenhausweg 3, D-66123 Saarbrücken, Germany. E-mail: krieger@dfki.de

Abstract

We present a simple and intuitive unsound corpus-driven approximation method for turning unification-based grammars, such as HPSG, CLE, or PATR-II, into context-free grammars (CFGs). Our research is motivated by the idea that we can exploit large-scale, hand-written unification grammars not only to describe natural language and obtain a syntactic structure (and perhaps a semantic form), but also to address several very practical goals. First, to speed up deep parsing by using the approximated CFG as a cheap recognition pre-filter. Second, to obtain an indirect stochastic parsing model for the unification grammar through a PCFG trained on the approximated CFG, which gives us an efficient disambiguation model for the unification-based grammar. Third, to generate domain-specific subgrammars for application areas such as information extraction or question answering. Finally, to compile context-free language models that assist the acoustic model of a speech recognizer. The approximation method is unsound in that it does not generate a CFG whose language is a true superset of the language accepted by the original unification-based grammar. It is corpus-driven in that it relies on a corpus of parsed sentences and generates broader CFGs when given more input samples. Our open approach can be fine-tuned in different directions, allowing us to monotonically come close to the original parse trees by shifting more information into the context-free symbols. The approach has been fully implemented in Java.
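As a rough illustration of the corpus-driven idea, the sketch below reads context-free productions off a small set of parse trees and shows how moving a feature into the symbol names refines the grammar. It is not the paper's implementation (which is in Java): the `Node` type, the `keep` parameter, the feature name `num`, and the symbol format are all hypothetical choices made for this example.

```python
from typing import NamedTuple


class Node(NamedTuple):
    """A parse-tree node (hypothetical representation, not from the paper)."""
    label: str            # rule or lexical type name
    features: dict        # selected feature values, assumed atomic here
    children: tuple = ()  # empty for leaves


def symbol(node, keep):
    """Build a context-free symbol from the label plus the chosen features."""
    parts = [f"{f}={node.features[f]}" for f in keep if f in node.features]
    return node.label + ("[" + ",".join(parts) + "]" if parts else "")


def extract_rules(tree, keep=()):
    """Collect one production (lhs, rhs) per internal node of a parse tree."""
    rules = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        if node.children:
            rhs = tuple(symbol(child, keep) for child in node.children)
            rules.add((symbol(node, keep), rhs))
            stack.extend(node.children)
    return rules


def approximate(corpus, keep=()):
    """Union the productions of all trees; more trees can only add rules."""
    grammar = set()
    for tree in corpus:
        grammar |= extract_rules(tree, keep)
    return grammar


# Two toy parses that differ only in an agreement feature.
corpus = [
    Node("s", {}, (Node("np", {"num": "sg"}), Node("vp", {"num": "sg"}))),
    Node("s", {}, (Node("np", {"num": "pl"}), Node("vp", {"num": "pl"}))),
]

coarse = approximate(corpus)               # symbols carry labels only
fine = approximate(corpus, keep=("num",))  # symbols also carry "num"
```

Under these assumptions, `coarse` contains a single rule that also licenses number mismatches, while `fine` keeps the singular and plural cases apart. Because only rules observed in the corpus are emitted, the result need not cover the full language of the source grammar, which is the sense in which such a method is unsound.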

Type
Papers
Copyright
Copyright © Cambridge University Press 2007

