Improved feature decay algorithms for statistical machine translation

Alberto Poncelas; Gideon Maillette de Buy Wenniger; Andy Way

doi:10.1017/S1351324920000467

Published online by Cambridge University Press: 22 September 2020

Alberto Poncelas*: ADAPT Centre, Dublin City University, Glasnevin, Dublin 9, Ireland
Gideon Maillette de Buy Wenniger: ADAPT Centre, Dublin City University, Glasnevin, Dublin 9, Ireland
Andy Way: ADAPT Centre, Dublin City University, Glasnevin, Dublin 9, Ireland
*Corresponding author. E-mail: alberto.poncelas@adaptcentre.ie

Abstract

In machine-learning applications, data selection is of crucial importance if good runtime performance is to be achieved. In a scenario where the test set is accessible when the model is being built, training instances can be selected so that they are the most relevant for the test set. Feature Decay Algorithms (FDA) are a data selection technique that has demonstrated excellent performance in a number of tasks. The method maximizes the diversity of the n-grams in the training set by devaluing those that have already been included. We focus on this method to investigate in more depth how to select better training data instances. We give an overview of FDA and propose improvements in terms of speed and quality. Using German-to-English parallel data, we first propose a novel approach that decreases the execution time of FDA when multiple computation units are available. In addition, we obtain improvements in translation quality by extending FDA with information from the parallel corpus that is generally ignored.
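The selection idea described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the function names `ngrams` and `fda_select`, the exponential decay factor of 0.5, the length normalization, and the maximum n-gram order of 3 are all assumptions made for illustration. Features are the n-grams of the test set, and each time a sentence is selected, the value of the features it contains is reduced so that later picks favor n-grams not yet covered.

```python
from collections import Counter


def ngrams(tokens, max_n=3):
    """Return the set of n-grams (as tuples) of order 1..max_n."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def fda_select(candidates, test_sentences, k, max_n=3, decay=0.5):
    """Greedily pick up to k candidates (hypothetical helper, not the paper's code).

    Assumption: a feature is worth decay ** (times already selected),
    and a sentence score is the sum of its feature values divided by
    its length in tokens.
    """
    # Features are the n-grams observed in the test set.
    features = set()
    for sentence in test_sentences:
        features |= ngrams(sentence.split(), max_n)

    counts = Counter()  # how often each feature occurs in the selected data
    pool = list(candidates)
    selected = []

    def score(sentence):
        tokens = sentence.split()
        if not tokens:
            return 0.0
        shared = ngrams(tokens, max_n) & features
        return sum(decay ** counts[f] for f in shared) / len(tokens)

    while pool and len(selected) < k:
        best = max(pool, key=score)  # ties go to the earliest candidate
        pool.remove(best)
        selected.append(best)
        # Devalue the features that the chosen sentence already covers.
        for f in ngrams(best.split(), max_n) & features:
            counts[f] += 1
    return selected


picked = fda_select(
    ["the cat sat", "the cat sat", "a dog barked"],
    ["the cat sat", "a dog barked"],
    k=2,
)
print(picked)
```

In this toy run the duplicate of the first sentence loses half of its score after the first pick, so the second pick goes to the sentence that covers new n-grams. The sketch rescores the whole pool on every iteration, which is quadratic in the pool size and only suitable for illustration.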

Keywords

Machine translation; Data selection; Statistical methods

Information

Type: Article
Information: Natural Language Engineering, Volume 28, Issue 1, January 2022, pp. 71–91

DOI: https://doi.org/10.1017/S1351324920000467
Copyright: © The Author(s), 2020. Published by Cambridge University Press
