
Data-driven deep-syntactic dependency parsing

Published online by Cambridge University Press:  18 August 2015

MIGUEL BALLESTEROS
Affiliation:
Pompeu Fabra University, Natural Language Processing Group, Roc Boronat 138, 08018 Barcelona, Spain e-mails: miguel.ballesteros@upf.edu, simon.mille@upf.edu, leo.wanner@upf.edu
BERND BOHNET
Affiliation:
Google Inc. London, 76 Buckingham Palace Road, London SW1W 9TQ, UK e-mail: bohnetb@gmail.com
SIMON MILLE
Affiliation:
Pompeu Fabra University, Natural Language Processing Group, Roc Boronat 138, 08018 Barcelona, Spain e-mails: miguel.ballesteros@upf.edu, simon.mille@upf.edu, leo.wanner@upf.edu
LEO WANNER
Affiliation:
Pompeu Fabra University, Natural Language Processing Group, Roc Boronat 138, 08018 Barcelona, Spain e-mails: miguel.ballesteros@upf.edu, simon.mille@upf.edu, leo.wanner@upf.edu; Catalan Institute for Research and Advanced Studies (ICREA), Lluis Companys, 23, 08010 Barcelona, Spain

Abstract

‘Deep-syntactic’ dependency structures, which capture the argumentative, attributive, and coordinative relations between the full words of a sentence, have great potential for a number of NLP applications. Their degree of abstraction lies between the output of a syntactic dependency parser (connected trees defined over all words of a sentence, with language-specific grammatical functions) and the output of a semantic parser (forests of trees defined over individual lexemes or phrasal chunks, with abstract semantic role labels that capture the frame structures of predicative elements and drop all attributive and coordinative dependencies). We propose a parser that produces deep-syntactic structures. The parser has been tested on Spanish, English, and Chinese.
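The intermediate abstraction level described above can be illustrated with a minimal sketch: a deep-syntactic structure can be thought of as the surface-syntactic tree with functional nodes (determiners, auxiliaries, governed conjunctions and prepositions) removed and their content-word dependents re-attached to the nearest content-word ancestor. The tag set, tree, and attachment choices below are hypothetical illustrations, not the authors' transducer or annotation.

```python
# Illustrative sketch (not the authors' implementation): deriving a
# deep-syntactic structure (DSyntS) from a surface-syntactic tree (SSyntS)
# by dropping functional nodes and re-attaching dependents to the nearest
# content-word ancestor. FUNCTIONAL and the toy tree are assumptions.

FUNCTIONAL = {"DT", "MD", "IN"}  # determiners, modal auxiliaries, conjunctions/prepositions

def to_dsynt(nodes, heads):
    """nodes: list of (form, pos); heads[i]: parent index of node i (-1 = root).
    Returns {content-node index: deep head index} over content words only."""
    def content_ancestor(i):
        j = heads[i]
        while j != -1 and nodes[j][1] in FUNCTIONAL:
            j = heads[j]  # skip functional nodes on the path to the root
        return j

    return {i: content_ancestor(i)
            for i, (_, pos) in enumerate(nodes) if pos not in FUNCTIONAL}

# Toy surface tree for "The producer thinks that the new song will be successful"
nodes = [("The", "DT"), ("producer", "NN"), ("thinks", "VB"), ("that", "IN"),
         ("the", "DT"), ("new", "JJ"), ("song", "NN"), ("will", "MD"),
         ("be", "VB"), ("successful", "JJ")]
heads = [1, 2, -1, 2, 6, 6, 7, 3, 7, 8]

print(to_dsynt(nodes, heads))
# -> {1: 2, 2: -1, 5: 6, 6: 2, 8: 2, 9: 8}
```

Note how "song" and "be", governed in the surface tree through the functional chain will → that, are re-attached directly to "thinks" in the deep structure, while "The", "that", and "will" disappear as nodes.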

Information

Type
Articles
Copyright
Copyright © Cambridge University Press 2015 
Fig. 1. SSyntSs, PropBank structure, and DRS of (1).

Fig. 2. DSyntSs of (1).

Fig. 3. SSyntS (top) and DSyntS (bottom) for the sentence The producer thinks that the new song will be successful soon.

Fig. 4. SSyntS and DSyntS for the sentence Almost 1.2 million jobs have been created by the state in that time.

Fig. 5. PropBank structure of the sentence Almost 1.2 million jobs have been created by the state in that time.

Fig. 6. Collapsed Stanford dependency structure of the sentence Almost 1.2 million jobs have been created by the state in that time.

Fig. 7. SSyntS and DSyntS of the sentence el profesor dice que se quejan mucho ‘the professor says that they complain a lot’.

Fig. 8. A node in Sss is a node in Sds.

Fig. 9. A relation in Sss corresponds to a relation in Sds.

Fig. 10. A fragment of the Sss tree corresponds to a single node in Sds.

Fig. 11. A relation with a dependent or governor node in Sss is a grammeme in Sds.

Fig. 12. A grammeme in Sss is a grammeme in Sds.

Fig. 13. A node in Sss is conflated with another node in Sds.

Fig. 14. A node in Sds has no correspondence in Sss.

Fig. 15. DSyntS tree reconstruction algorithm.

Fig. 16. A sentence in its surface representation showing two paths: [dep1] + [dep2] + [dep3] for node3 and [dep1] + [dep4] for node4. The nodes governor, node3, and node4 are kept in the deep structure; the other nodes (node1 and node2) are not. The system has to decide whether node3 or node4 is attached to the governor.

Fig. 17. Input (left) and output (right) of DSynt arc relabeling.

Fig. 18. Setup of a deep-syntactic parser.

Fig. 19. Sample PropBank entry.

Table 1. Quality of the automatic annotation of the PTB with the DSyntS layer

Fig. 20. Sample gold-standard and predicted DSyntSs: node erroneously removed from the DSyntS.

Fig. 21. Sample gold-standard and predicted DSyntSs: node erroneously left in the DSyntS.

Table 2. Straightforward SSyntS to DSyntS DepRel mappings (Spanish)

Table 3. Complex SSyntS to DSyntS mappings (Spanish); ‘Dep’ = ‘dependent’, ‘Gov’ = ‘governor’, ‘DepRel’ = ‘DSynt dependency relation’

Table 4. Performance of the SSyntS–DSyntS transducer and of the rule-based baseline over the Spanish gold-standard held-out test set

Table 5. Performance of the SSyntS–DSyntS transducer and of the rule-based baseline over the English gold-standard held-out test set

Table 6. Performance of the SSyntS–DSyntS transducer and of the rule-based baseline over the Chinese gold-standard held-out test set

Table 7. Performance of the SSyntS–DSyntS transducer over the Spanish development set

Table 8. Performance of Bohnet and Nivre's joint PoS-tagger+dependency parser trained on the Ancora-UPF treebank for Spanish, the PTB treebank for English, and the CTB treebank for Chinese

Table 9. Performance of the rule-based baseline and the SSyntS–DSyntS transducer over the Spanish predicted held-out test set

Table 10. Performance of the rule-based baseline and the SSyntS–DSyntS transducer over the English predicted held-out test set

Table 11. Performance of the rule-based baseline and the SSyntS–DSyntS transducer over the Chinese predicted held-out test set

Table 12. Performance of the rule-based baseline and the SSyntS–DSyntS transducer over the English surface gold-standard held-out test set and the manually annotated DSyntS test set

Table 13. Performance of the rule-based baseline and the SSyntS–DSyntS transducer over the English surface predicted held-out test set and the manually annotated DSyntS test set