A statistical model for grammar mapping

Published online by Cambridge University Press:  20 February 2015

A. BASIRAT
Affiliation:
School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran; Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden. email: ali.basirat@lingfil.uu.se
H. FAILI
Affiliation:
School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran; School of Computer Science, Institute for Research in Fundamental Sciences (IPM), P. O. Box 19395-5746, Tehran, Iran. email: h.faili@ut.ac.ir
J. NIVRE
Affiliation:
Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden email: joakim.nivre@lingfil.uu.se

Abstract

The two main classes of grammars are (a) hand-crafted grammars, which are developed by language experts, and (b) data-driven grammars, which are extracted from annotated corpora. This paper introduces a statistical method for mapping the elementary structures of a data-driven grammar onto the elementary structures of a hand-crafted grammar in order to combine their advantages. The idea is employed in the context of Lexicalized Tree-Adjoining Grammars (LTAG) and tested on two LTAGs of English: the hand-crafted LTAG developed in the XTAG project, and the data-driven LTAG, which is automatically extracted from the Penn Treebank and used by the MICA parser. We propose a statistical model for mapping any elementary tree sequence of the MICA grammar onto a proper elementary tree sequence of the XTAG grammar. The model has been tested on three subsets of the WSJ corpus that have average lengths of 10, 16, and 18 words, respectively. The experimental results show that full-parse trees with average F1-scores of 72.49, 64.80, and 62.30 points could be built from 94.97%, 96.01%, and 90.25% of the XTAG elementary tree sequences assigned to the subsets, respectively. Moreover, by reducing the amount of syntactic lexical ambiguity of sentences, the proposed model significantly improves the efficiency of parsing in the XTAG system.
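The abstract does not spell out the form of the statistical model, so the following is only a rough illustration of what mapping one elementary tree sequence onto another can look like. It treats the task as sequence labelling and decodes with the Viterbi algorithm, which is an assumption made here for illustration and not necessarily the authors' formulation. The tree names (`M_np`, `X_NP`, and so on) and all probabilities are hypothetical toy values, not taken from the MICA or XTAG grammars.

```python
import math

def viterbi_map(source_seq, targets, start_p, trans_p, emit_p):
    """Return the most probable target-tree sequence for a source-tree sequence.

    Assumes a first-order model with strictly positive probabilities:
      start_p[t]     -- probability that a sequence starts with target tree t
      trans_p[p][t]  -- probability of target tree t following target tree p
      emit_p[t][s]   -- probability of seeing source tree s given target tree t
    """
    # best[i][t] = (log-probability of the best path ending in t at i, backpointer)
    best = [{
        t: (math.log(start_p[t]) + math.log(emit_p[t][source_seq[0]]), None)
        for t in targets
    }]
    for obs in source_seq[1:]:
        prev = best[-1]
        column = {}
        for t in targets:
            score, arg = max(
                (prev[p][0] + math.log(trans_p[p][t]), p) for p in targets
            )
            column[t] = (score + math.log(emit_p[t][obs]), arg)
        best.append(column)

    # Follow the backpointers from the best final state.
    path = [max(targets, key=lambda t: best[-1][t][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    return path[::-1]

# Hypothetical toy model: two target (hand-crafted) trees, two source (data-driven) trees.
TARGETS = ["X_NP", "X_TV"]
START_P = {"X_NP": 0.8, "X_TV": 0.2}
TRANS_P = {
    "X_NP": {"X_NP": 0.3, "X_TV": 0.7},
    "X_TV": {"X_NP": 0.9, "X_TV": 0.1},
}
EMIT_P = {
    "X_NP": {"M_np": 0.9, "M_tv": 0.1},
    "X_TV": {"M_np": 0.2, "M_tv": 0.8},
}

mapped = viterbi_map(["M_np", "M_tv", "M_np"], TARGETS, START_P, TRANS_P, EMIT_P)
print(mapped)
```

In a sketch like this, each source tree keeps only one target tree after decoding, which reflects how a mapping step could reduce the lexical ambiguity a downstream parser has to resolve.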

Information

Type: Articles
Copyright: © Cambridge University Press 2015
