
PP-Attachment Disambiguation for Swedish: Combining Unsupervised and Supervised Training Data

Published online by Cambridge University Press:  22 December 2008

Dimitrios Kokkinakis
Affiliation:
Språkdata/Göteborg University, Box 200, SE-405 30, Sweden. E-mail: svedk@svenska.gu.se

Abstract

Structural ambiguity, particularly the attachment of prepositional phrases (PPs), is a serious type of global ambiguity in natural language. Disambiguation becomes crucial when a syntactic analyzer must choose between two or more equally grammatical parse trees for the same sentence. This paper investigates how attachment ambiguity can be resolved using Machine Learning (ML) techniques. ML rests on the assumption that performance in cognitive tasks depends on the similarity of new situations (testing) to stored representations of earlier experiences (training). A large amount of training data is therefore an important prerequisite for solving the problem. The paper reports a combination of unsupervised and restricted supervised acquisition of such data. Training is performed both on a subset of the Gothenburg Lexical Database (GLDB) and on instances from large corpora annotated with coarse-grained semantic information. Testing is performed on corpus instances using a range of algorithms and metrics. The language studied is written Swedish.
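The similarity-based view of learning described in the abstract can be illustrated with a minimal nearest-neighbour sketch. It assumes a common way of framing PP attachment: each ambiguous case is a (verb, noun1, preposition, noun2) quadruple, labelled "V" when the PP attaches to the verb and "N" when it attaches to the object noun. The training instances, the names `TRAIN`, `overlap_distance` and `classify`, and the plain overlap metric are all hypothetical choices made for illustration. This is not the author's method, algorithm set or data.

```python
from collections import Counter
from typing import List, Tuple

# A PP-attachment instance as a (verb, noun1, preposition, noun2) quadruple.
Quad = Tuple[str, str, str, str]

# Hypothetical toy training instances labelled "V" (attach the PP to the verb)
# or "N" (attach it to the object noun). Illustrative only, not data from the paper.
TRAIN: List[Tuple[Quad, str]] = [
    (("äta", "pizza", "med", "gaffel"), "V"),
    (("äta", "pizza", "med", "ost"), "N"),
    (("se", "mannen", "med", "kikare"), "V"),
    (("se", "mannen", "med", "hatt"), "N"),
    (("köpa", "bok", "om", "Sverige"), "N"),
]


def overlap_distance(a: Quad, b: Quad) -> int:
    """Count the feature positions where two quadruples differ (unweighted overlap)."""
    return sum(x != y for x, y in zip(a, b))


def classify(instance: Quad, memory: List[Tuple[Quad, str]], k: int = 1) -> str:
    """Label a new instance by majority vote over its k most similar stored examples.

    sorted() is stable, so equally distant examples keep their order in memory;
    a real system would need an explicit tie-breaking policy.
    """
    ranked = sorted(memory, key=lambda ex: overlap_distance(instance, ex[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # The closest stored case differs only in noun1, so its "V" label is reused.
    print(classify(("se", "kvinnan", "med", "kikare"), TRAIN))
```

Because every decision is copied from stored cases, coverage grows with the amount of training data, which is why the abstract treats large training sets as a prerequisite. Practical memory-based learners usually also weight features, for example by information gain, rather than counting all mismatches equally.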

Type
Research Article
Copyright
Copyright © Cambridge University Press 2000

