
Identifying signs of syntactic complexity for rule-based sentence simplification

Published online by Cambridge University Press:  31 October 2018

RICHARD EVANS
Affiliation:
Research Institute in Information and Language Processing, University of Wolverhampton, Wolverhampton, UK e-mail: R.J.Evans@wlv.ac.uk, C.Orasan@wlv.ac.uk
CONSTANTIN ORĂSAN
Affiliation:
Research Institute in Information and Language Processing, University of Wolverhampton, Wolverhampton, UK e-mail: R.J.Evans@wlv.ac.uk, C.Orasan@wlv.ac.uk

Abstract

This article presents a new method for automatically simplifying English sentences. The approach is designed to reduce the number of compound clauses and nominally bound relative clauses in input sentences. The article provides an overview of a corpus annotated with information about various explicit signs of syntactic complexity and describes the two major components of a sentence simplification method that exploits information on the signs occurring in the sentences of a text. The first component is a sign tagger, which automatically classifies signs in accordance with the annotation scheme used to annotate the corpus. The second component is an iterative rule-based sentence transformation tool. Exploiting the sign tagger in conjunction with other NLP components, the sentence transformation tool automatically rewrites long sentences containing compound clauses and nominally bound relative clauses as sequences of shorter single-clause sentences. Evaluation of the different components revealed acceptable performance in rewriting sentences containing compound clauses but lower accuracy when rewriting sentences containing nominally bound relative clauses. A detailed error analysis showed that the major sources of error include inaccurate sign tagging, the relatively limited coverage of the rules used to rewrite sentences, and an inability to discriminate between various subtypes of clause coordination. Despite these errors, the system performed well in comparison with two baselines. This finding was reinforced by automatic estimations of the readability of system output and by surveys of readers’ opinions about the accuracy, accessibility, and meaning of this output.
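
The two-component design described in the abstract (a sign tagger feeding an iterative rule-based rewriter) can be illustrated with a toy sketch. This is not the authors' system: the coordinator list, the `CLAUSE`/`OTHER` labels, the "coordinator after a comma" heuristic, and all function names below are hypothetical stand-ins chosen only to show the control flow of tagging signs and repeatedly splitting a sentence until no clause-level sign remains.

```python
import re

# Hypothetical sign inventory; the article's annotation scheme is much richer.
COORDINATORS = {"and", "but", "or"}


def tag_signs(tokens):
    """Toy stand-in for a sign tagger.

    Labels a coordinator 'CLAUSE' (assumed label) when it directly follows
    a comma, and 'OTHER' otherwise. Returns {token_index: label}.
    """
    tags = {}
    for i, tok in enumerate(tokens):
        if tok.lower() in COORDINATORS:
            after_comma = i > 0 and tokens[i - 1] == ","
            tags[i] = "CLAUSE" if after_comma else "OTHER"
    return tags


def split_once(tokens):
    """Apply one rewrite rule: split at the first clause-level sign.

    Returns two token lists, or None if no rule applies.
    """
    for i, label in sorted(tag_signs(tokens).items()):
        if label == "CLAUSE":
            # Drop both the comma (i - 1) and the coordinator (i).
            return [tokens[: i - 1], tokens[i + 1:]]
    return None


def render(tokens):
    """Turn a token list back into a capitalised, full-stop-terminated sentence."""
    while tokens and tokens[-1] in {".", ",", ";"}:
        tokens = tokens[:-1]
    text = re.sub(r"\s+([,;:])", r"\1", " ".join(tokens))
    return text[:1].upper() + text[1:] + "."


def simplify(sentence):
    """Iteratively rewrite a sentence until no rule applies to any part."""
    agenda = [re.findall(r"\w+|[^\w\s]", sentence)]
    done = []
    while agenda:
        current = agenda.pop(0)
        parts = split_once(current)
        if parts is None:
            done.append(current)
        else:
            agenda = parts + agenda  # re-examine both halves, in order
    return [render(tokens) for tokens in done]


if __name__ == "__main__":
    for s in simplify("The tagger labels each sign, and the rules rewrite "
                      "the sentence, but errors remain."):
        print(s)
```

In this sketch, a coordinator that does not follow a comma (as in "Cats and dogs sleep.") is labelled `OTHER` and left alone, which hints at why the accuracy of sign tagging matters to the rewriting step: a mislabelled sign would trigger a split in the wrong place or suppress a valid one.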

Information

Type: Article
Copyright: © Cambridge University Press 2018
