Find the errors, get the better: Enhancing machine translation via word confidence estimation

  • NGOC-QUANG LUONG (a1), LAURENT BESACIER (a1) and BENJAMIN LECOUTEUX (a1)
Abstract

This paper presents two novel ideas for improving Machine Translation (MT) quality by applying word-level quality prediction in a second pass of decoding. The word scores estimated by a word confidence estimation system are used to reconsider the MT hypotheses and select a better candidate, rather than accepting the current sub-optimal one. In the first approach, the selection scope is limited to the MT N-best list, where our proposed re-ranking features are combined with those of the decoder for re-scoring. In the second, the search space is enlarged to the entire search graph, which stores many more hypotheses generated during the first pass of decoding. For all paths containing words of the N-best list, we propose an algorithm that strengthens or weakens them depending on the estimated word quality. In both methods, the highest-scoring candidate after the search becomes the final translation. The results show that both approaches improve MT quality over the one-pass baseline, and that search graph re-decoding achieves larger gains (in BLEU score) than N-best list re-ranking.
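
The two second-pass strategies described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hypothesis` class, the mean-confidence feature, the `weight`, `reward`, `penalty` and `threshold` parameters, and the edge-tuple graph format are all assumptions made for illustration. The graph is assumed to be acyclic with integer node ids such that every edge goes from a smaller id to a larger one.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Hypothesis:
    words: List[str]
    decoder_score: float  # first-pass model score (hypothetical, log domain)


def rerank_nbest(nbest: List[Hypothesis],
                 word_confidence: Dict[str, float],
                 weight: float = 1.0) -> Hypothesis:
    """Pick the N-best entry with the best decoder score plus confidence feature."""
    def combined(h: Hypothesis) -> float:
        # Hypothetical re-ranking feature: mean estimated word confidence.
        # Words without an estimate get a neutral 0.5.
        total = sum(word_confidence.get(w, 0.5) for w in h.words)
        return h.decoder_score + weight * total / max(len(h.words), 1)
    return max(nbest, key=combined)


Edge = Tuple[int, int, str, float]  # (source node, target node, word, score)


def redecode(edges: List[Edge],
             word_confidence: Dict[str, float],
             start: int, goal: int,
             reward: float = 1.0, penalty: float = 1.0,
             threshold: float = 0.5) -> Tuple[float, List[str]]:
    """Strengthen or weaken edges by word confidence, then return the best path."""
    best: Dict[int, Tuple[float, List[str]]] = {start: (0.0, [])}
    # Sorting by source id is a valid topological order under the stated
    # assumption that every edge goes from a smaller id to a larger one.
    for src, dst, word, score in sorted(edges, key=lambda e: e[0]):
        if src not in best:
            continue  # source node is unreachable from the start node
        conf = word_confidence.get(word)
        if conf is not None:  # only words with an estimate are updated
            score += reward if conf >= threshold else -penalty
        candidate = best[src][0] + score
        if dst not in best or candidate > best[dst][0]:
            best[dst] = (candidate, best[src][1] + [word])
    return best[goal]


confidence = {"a": 0.9, "bad": 0.1, "good": 0.9, "house": 0.9}

nbest = [Hypothesis(["a", "bad", "house"], -1.0),
         Hypothesis(["a", "good", "house"], -1.2)]
reranked = rerank_nbest(nbest, confidence)

graph = [(0, 1, "a", -0.1), (1, 2, "bad", -0.2),
         (1, 2, "good", -0.5), (2, 3, "house", -0.1)]
redecoded_score, redecoded_words = redecode(graph, confidence, start=0, goal=3)
```

In this toy example the first-pass scores alone favour the hypothesis containing "bad", while both second-pass variants select the one containing "good". In a real system the feature weights would be tuned rather than fixed by hand.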

Natural Language Engineering
  • ISSN: 1351-3249
  • EISSN: 1469-8110
  • URL: /core/journals/natural-language-engineering