A PDTB-styled end-to-end discourse parser

Published online by Cambridge University Press: 06 November 2012

ZIHENG LIN
Affiliation:
Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417
SAP Research, SAP Asia Pte Ltd, 30 Pasir Panjang Road, Singapore 117440
e-mail: linzihen@comp.nus.edu.sg
HWEE TOU NG
Affiliation:
Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417
e-mail: nght@comp.nus.edu.sg
MIN-YEN KAN
Affiliation:
Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417
e-mail: kanmy@comp.nus.edu.sg

Abstract

Since the release of the large discourse-level annotation of the Penn Discourse Treebank (PDTB), research has addressed certain subtasks of this annotation, such as disambiguating discourse connectives and classifying Explicit or Implicit relations. We see a need to construct a full parser on top of these subtasks and propose a way to evaluate such a parser. In this work, we have designed and developed an end-to-end discourse parser that parses free text in the PDTB style using a fully data-driven approach. The parser consists of multiple components joined in a sequential pipeline architecture: a connective classifier, an argument labeler, an explicit classifier, a non-explicit classifier, and an attribution span labeler. Our trained parser first identifies all discourse and non-discourse relations, locates and labels their arguments, and then classifies the sense of the relation between each pair of arguments. For the identified relations, the parser also determines the attribution spans, if any, associated with them. We introduce novel approaches to locate and label arguments and to identify attribution spans. We also significantly improve on the current state-of-the-art connective classifier. We propose and present a comprehensive evaluation from both component-wise and error-cascading perspectives, illustrating how each component performs in isolation as well as how the pipeline performs when errors are propagated forward. The parser achieves an overall system F1 score of 46.80 percent for partial matching when using gold-standard parses, and 38.18 percent with full automation.
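The sequential pipeline described in the abstract can be illustrated with a minimal Python sketch. It assumes that every component is a hypothetical toy rule rather than one of the trained classifiers: a two-word connective lexicon (`CONNECTIVE_SENSES`), a sentence-initial heuristic for choosing Arg1, a placeholder "Expansion" sense for non-explicit pairs, and a trailing ", X said" pattern for attribution. The sketch only shows the order of the five components and how each consumes the previous stage's output, which is why an upstream mistake cascades downstream.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# HYPOTHETICAL toy lexicon standing in for the trained connective and explicit classifiers.
CONNECTIVE_SENSES = {"because": "Contingency", "but": "Comparison"}


@dataclass
class Relation:
    kind: str                          # "Explicit" or "Non-Explicit"
    arg1: str
    arg2: str
    connective: Optional[str] = None
    sense: Optional[str] = None
    attributions: List[str] = field(default_factory=list)


def connective_classifier(sents: List[str]) -> List[Tuple[int, int]]:
    """Step 1 (toy): (sentence index, token index) of tokens found in the lexicon."""
    return [(i, j) for i, s in enumerate(sents)
            for j, tok in enumerate(s.rstrip(".").split())
            if tok.lower().strip(",") in CONNECTIVE_SENSES]


def argument_labeler(sents: List[str], hits: List[Tuple[int, int]]) -> List[Relation]:
    """Step 2 (toy): a sentence-initial connective takes Arg1 from the previous
    sentence; otherwise Arg1 and Arg2 are split inside the same sentence."""
    rels = []
    for i, j in hits:
        toks = sents[i].rstrip(".").split()
        conn = toks[j].lower().strip(",")
        arg1 = sents[i - 1].rstrip(".") if j == 0 and i > 0 else " ".join(toks[:j])
        rels.append(Relation("Explicit", arg1, " ".join(toks[j + 1:]), conn))
    return rels


def explicit_classifier(rels: List[Relation]) -> List[Relation]:
    """Step 3 (toy): sense of each Explicit relation by lexicon lookup."""
    for r in rels:
        r.sense = CONNECTIVE_SENSES[r.connective]
    return rels


def non_explicit_classifier(sents: List[str], hits: List[Tuple[int, int]]) -> List[Relation]:
    """Step 4 (toy): adjacent sentences not joined by a sentence-initial connective
    get a Non-Explicit relation with a placeholder sense."""
    linked = {i for i, j in hits if j == 0}
    return [Relation("Non-Explicit", sents[i - 1].rstrip("."), sents[i].rstrip("."),
                     sense="Expansion")  # placeholder; a trained model would predict this
            for i in range(1, len(sents)) if i not in linked]


def attribution_labeler(rels: List[Relation]) -> List[Relation]:
    """Step 5 (toy): a trailing ', X said' clause inside an argument is an attribution span."""
    for r in rels:
        for arg in (r.arg1, r.arg2):
            _, sep, tail = arg.rpartition(", ")
            if sep and tail.split()[-1:] == ["said"]:
                r.attributions.append(tail)
    return rels


def parse(sents: List[str]) -> List[Relation]:
    """Run the five components in sequence; later stages consume earlier outputs."""
    hits = connective_classifier(sents)
    explicit = explicit_classifier(argument_labeler(sents, hits))
    non_explicit = non_explicit_classifier(sents, hits)
    return attribution_labeler(explicit + non_explicit)


if __name__ == "__main__":
    doc = ["Sales rose because demand grew, analysts said.",
           "But profits fell.",
           "The firm hired staff."]
    for rel in parse(doc):
        print(rel.kind, rel.connective, rel.sense, rel.attributions)
```

Because `argument_labeler` and `non_explicit_classifier` both read the connective hits, a connective missed in step 1 changes the arguments, the relation type, and the sense assigned later, which is the error-cascading effect that a pipeline-level evaluation measures in addition to component-wise scores.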

Type: Articles
Copyright © Cambridge University Press 2012