
Identifying signs of syntactic complexity for rule-based sentence simplification

  • RICHARD EVANS and CONSTANTIN ORĂSAN
Abstract

This article presents a new method to automatically simplify English sentences. The approach is designed to reduce the number of compound clauses and nominally bound relative clauses in input sentences. The article provides an overview of a corpus annotated with information about various explicit signs of syntactic complexity and describes the two major components of a sentence simplification method that works by exploiting information on the signs occurring in the sentences of a text. The first component is a sign tagger which automatically classifies signs in accordance with the annotation scheme used to annotate the corpus. The second component is an iterative rule-based sentence transformation tool. Exploiting the sign tagger in conjunction with other NLP components, the sentence transformation tool automatically rewrites long sentences containing compound clauses and nominally bound relative clauses as sequences of shorter single-clause sentences. Evaluation of the different components reveals acceptable performance in rewriting sentences containing compound clauses but less accuracy when rewriting sentences containing nominally bound relative clauses. A detailed error analysis revealed that the major sources of error include inaccurate sign tagging, the relatively limited coverage of the rules used to rewrite sentences, and an inability to discriminate between various subtypes of clause coordination. Despite this, the system performed well in comparison with two baselines. This finding was reinforced by automatic estimations of the readability of system output and by surveys of readers’ opinions about the accuracy, accessibility, and meaning of this output.
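The iterative rewriting idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the tag name `CLAUSE_COORD`, the `(word, tag)` token format, and the functions `split_at_first_sign` and `simplify` are all hypothetical, and the sketch assumes a sign tagger has already marked which conjunctions link two full clauses. It covers only compound clauses whose conjoins each have their own subject, and it does no recapitalisation or punctuation repair.

```python
from collections import deque

# Hypothetical label for a sign that coordinates two clauses; the real
# annotation scheme is not specified in the abstract.
CLAUSE_COORD = "CLAUSE_COORD"


def split_at_first_sign(tokens):
    """Split a tagged sentence at its first clause-coordinating sign.

    Returns a (left, right) pair of token lists, or None if no usable
    sign is found.
    """
    for i, (_, tag) in enumerate(tokens):
        if tag == CLAUSE_COORD:
            left, right = tokens[:i], tokens[i + 1:]
            if left and right:
                return left, right
    return None


def simplify(tokens):
    """Rewrite one tagged sentence as a list of shorter sentences.

    Each iteration removes one sign; the resulting halves are re-queued
    so that any remaining signs are handled in later iterations.
    """
    pending = deque([tokens])
    done = []
    while pending:
        sentence = pending.popleft()
        parts = split_at_first_sign(sentence)
        if parts is None:
            done.append(sentence)
        else:
            # Put both halves back at the front, preserving their order.
            pending.extendleft(reversed(parts))
    return [" ".join(word for word, _ in s) for s in done]
```

For example, a sentence in which "and" and "but" are both tagged as clause coordinators is split into three single-clause strings, while a sentence in which "and" merely joins two noun phrases (and so carries a different tag) is returned unchanged. The second case is why the accuracy of sign tagging matters to the rewriting step.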

Footnotes

*This work was supported by the European Commission under the Seventh (FP7-2007–2013) Framework Programme for Research and Technological Development [287607]. We gratefully acknowledge Emma Franklin, Zoë Harrison, and Laura Hasler for their contribution to the development of the datasets used in our research and Iustin Dornescu for his contribution to the development of the sign tagger. For their participation in the user surveys, we thank Martina Cotella, Francesca Della Moretta, Arianna Fabbri, and Victoria Yaneva. We gratefully acknowledge Larissa Sayuri Futino Castro dos Santos for assistance in collating our survey data.

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering