
Estimating word-level quality of statistical machine translation output using monolingual information alone

  • Arda Tezcan, Véronique Hoste and Lieve Macken

Abstract

Various studies show that statistical machine translation (SMT) systems suffer from fluency errors, especially grammatical errors and errors related to idiomatic word choices. In this study, we investigate the effectiveness of using monolingual information contained in the machine-translated text to estimate the word-level quality of SMT output. We propose a recurrent neural network architecture that uses morpho-syntactic features and word embeddings as word representations within surface and syntactic n-grams. We test the proposed method on two language pairs and two tasks: detecting fluency errors and predicting overall post-editing effort. Our results show that this method is effective at capturing all types of fluency errors at once. Moreover, on the task of predicting post-editing effort, while relying solely on monolingual information, it achieves results on par with state-of-the-art quality estimation systems that use both bilingual and monolingual information.
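The core idea in the abstract, combining word embeddings with morpho-syntactic features into a single word representation and running a recurrent network over the sentence to label each token, can be illustrated with a minimal sketch. This is not the paper's architecture: all dimensions are illustrative, the parameters are random and untrained, the recurrence is a plain unidirectional GRU (Cho et al. 2014), and the per-token sigmoid output standing in for an OK/BAD fluency decision is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; the paper does not specify these here.
EMB, POS, HID = 8, 4, 6  # word embedding dim, morpho-syntactic feature dim, hidden dim

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step: update/reset gates computed from input x and previous state h."""
    z = 1.0 / (1.0 + np.exp(-(Wz @ x + Uz @ h)))  # update gate
    r = 1.0 / (1.0 + np.exp(-(Wr @ x + Ur @ h)))  # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))      # candidate state
    return (1.0 - z) * h + z * h_tilde

def tag_sentence(word_vecs, pos_vecs, params):
    """Return one score per token by running the recurrence over the sentence."""
    Wz, Uz, Wr, Ur, Wh, Uh, w_out = params
    h = np.zeros(HID)
    probs = []
    for w, p in zip(word_vecs, pos_vecs):
        # Word representation: embedding concatenated with morpho-syntactic features.
        x = np.concatenate([w, p])
        h = gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh)
        # Sigmoid score per token, read here as a stand-in for P(fluency error).
        probs.append(float(1.0 / (1.0 + np.exp(-(w_out @ h)))))
    return probs

# Random, untrained parameters purely to demonstrate the data flow.
Wz, Wr, Wh = (rng.standard_normal((HID, EMB + POS)) * 0.1 for _ in range(3))
Uz, Ur, Uh = (rng.standard_normal((HID, HID)) * 0.1 for _ in range(3))
w_out = rng.standard_normal(HID) * 0.1
params = (Wz, Uz, Wr, Ur, Wh, Uh, w_out)

# A dummy five-token "sentence" of random embeddings and feature vectors.
sent_words = [rng.standard_normal(EMB) for _ in range(5)]
sent_pos = [rng.standard_normal(POS) for _ in range(5)]
probs = tag_sentence(sent_words, sent_pos, params)
print(len(probs))
```

In a real system the representations would come from trained embedding tables and a tagger/parser, the network would be trained on annotated post-edited data, and context windows (surface and syntactic n-grams) would widen the input beyond the single token shown here.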

Corresponding author

Arda Tezcan. Email: arda.tezcan@ugent.be


