Automatic generation of weather forecast texts using comprehensive probabilistic generation-space models

  • Anja Belz

Abstract

Two important recent trends in natural language generation are (i) probabilistic techniques and (ii) comprehensive approaches that move away from traditional strictly modular and sequential models. This paper reports experiments in which pCRU, a generation framework that combines probabilistic generation methodology with a comprehensive model of the generation space, was used to semi-automatically create five different versions of a weather forecast generator. The generators were evaluated in terms of output quality, development time and computational efficiency against (i) human forecasters, (ii) a traditional handcrafted pipelined NLG system and (iii) a HALOGEN-style statistical generator. The most striking result is that, despite acquiring all decision-making abilities automatically, the best pCRU generators produce outputs of high enough quality to be scored more highly by human judges than forecasts written by experts.

