
Discourse structure and language technology

B. Webber, M. Egg and V. Kordoni

An increasing number of researchers and practitioners in Natural Language Engineering face the prospect of having to work with entire texts rather than individual sentences. While it is clear that text must have useful structure, its nature may be less clear, which makes it more difficult to exploit in applications. This survey of work on discourse structure thus provides a primer on the bases on which discourse is structured, along with some of their formal properties. It then lays out the current state of the art in algorithms for recognizing these different structures and describes how these algorithms are currently being used in Language Technology applications. After identifying resources that should prove useful in improving algorithm performance across a range of languages, we conclude by speculating on future discourse structure-enabled technology.


Natural Language Engineering (ISSN 1351-3249; EISSN 1469-8110)