Anniversary article: Then and now: 25 years of progress in natural language engineering

John Tait and Yorick Wilks

Abstract

The paper reviews the state of the art of natural language engineering (NLE) around 1995, when this journal first appeared, and makes a critical comparison with the current state of the art in 2018, as we prepare the 25th volume. Specifically, the then state of the art in parsing, information extraction, chatbots and dialogue systems, speech processing, and machine translation is briefly reviewed. The emergence in the 1980s and 1990s of machine learning (ML) and statistical methods (SM) is noted. Important trends and areas of progress in the subsequent years are identified. In particular, the move away from whole-sentence parsing and towards the use of n-grams or skip-grams and/or chunking with part-of-speech tagging is noted, as is the increasing dominance of SM and ML. Some outstanding issues that merit further research are briefly pointed out, including metaphor processing and the ethical implications of NLE.

Corresponding author. Email: john@johntait.net

