
Sentence embeddings in NLI with iterative refinement encoders

Published online by Cambridge University Press: 31 July 2019

Aarne Talman*
Affiliation: Department of Digital Humanities, University of Helsinki, Finland
Anssi Yli-Jyrä
Affiliation: Department of Digital Humanities, University of Helsinki, Finland
Jörg Tiedemann
Affiliation: Department of Digital Humanities, University of Helsinki, Finland
*Corresponding author. Email: aarne.talman@helsinki.fi

Abstract

Sentence-level representations are necessary for various natural language processing tasks. Recurrent neural networks have proven to be very effective in learning distributed representations and can be trained efficiently on natural language inference tasks. We build on top of one such model and propose a hierarchy of bidirectional LSTM and max pooling layers that implements an iterative refinement strategy and yields state-of-the-art results on the SciTail dataset as well as strong results for the Stanford Natural Language Inference (SNLI) and Multi-Genre Natural Language Inference (MultiNLI) datasets. We show that the sentence embeddings learned in this way can be utilized in a wide variety of transfer learning tasks, outperforming InferSent on 7 out of 10 and SkipThought on 8 out of 9 SentEval sentence embedding evaluation tasks. Furthermore, our model beats the InferSent model in 8 out of 10 recently published SentEval probing tasks designed to evaluate sentence embeddings’ ability to capture some of the important linguistic properties of sentences.
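
To make the encoder described in the abstract concrete, below is a minimal PyTorch sketch of a hierarchical BiLSTM max-pooling sentence encoder. The layer count, dimensions, and the exact way refinement states are passed between layers are illustrative assumptions based on the abstract, not the authors' published configuration:

import torch
import torch.nn as nn

class HBMPEncoder(nn.Module):
    """Hierarchical BiLSTM + max-pooling sentence encoder (illustrative sketch)."""

    def __init__(self, embed_dim=300, hidden_dim=600, num_layers=3):
        super().__init__()
        # One BiLSTM per refinement step; each re-reads the word embeddings.
        self.lstms = nn.ModuleList(
            [nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
             for _ in range(num_layers)]
        )

    def forward(self, embedded):
        # embedded: (batch, seq_len, embed_dim) pre-computed word embeddings
        pooled, state = [], None
        for lstm in self.lstms:
            # Iterative refinement (assumed scheme): initialize each BiLSTM
            # with the final hidden/cell states of the previous one.
            out, state = lstm(embedded, state)
            # Max pooling over the time dimension gives one fixed-size
            # vector per layer.
            pooled.append(out.max(dim=1).values)
        # The sentence embedding concatenates all max-pooled layer outputs.
        return torch.cat(pooled, dim=1)

Each BiLSTM re-reads the same word embeddings while being initialized with the final states of the previous layer, so later layers can refine what earlier layers extracted. For NLI, the premise and hypothesis would each be encoded this way and the two embeddings combined (e.g., by concatenation with their element-wise difference and product) before a classifier.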

Type: Article
Copyright: © Cambridge University Press 2019

References

Balazs, J., Marrese-Taylor, E., Loyola, P. and Matsuo, Y. (2017). Refining raw sentence representations for textual entailment recognition via attention. In Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP. Association for Computational Linguistics, pp. 51–55.
Bowman, S.R., Angeli, G., Potts, C. and Manning, C.D. (2015). A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, pp. 632–642.
Bowman, S.R., Gauthier, J., Rastogi, A., Gupta, R., Manning, C.D. and Potts, C. (2016). A fast unified model for parsing and sentence understanding. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pp. 1466–1477.
Chatzikyriakidis, S., Cooper, R., Dobnik, S. and Larsson, S. (2017). An overview of natural language inference data collection: The way forward? In Proceedings of the Computing Natural Language Inference Workshop.
Chen, Q., Ling, Z.-H. and Zhu, X. (2018). Enhancing sentence embedding with generalized pooling. In Proceedings of the 27th International Conference on Computational Linguistics. Association for Computational Linguistics, pp. 1815–1826.
Chen, Q., Zhu, X., Ling, Z.-H., Wei, S., Jiang, H. and Inkpen, D. (2017a). Enhanced LSTM for natural language inference. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pp. 1657–1668.
Chen, Q., Zhu, X., Ling, Z.-H., Wei, S., Jiang, H. and Inkpen, D. (2017b). Recurrent neural network-based sentence encoder with gated attention for natural language inference. In Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP. Association for Computational Linguistics, pp. 36–40.
Conneau, A. and Kiela, D. (2018). SentEval: An evaluation toolkit for universal sentence representations. In Proceedings of the 11th Language Resources and Evaluation Conference. European Language Resources Association, Miyazaki, Japan, pp. 1699–1704.
Conneau, A., Kiela, D., Schwenk, H., Barrault, L. and Bordes, A. (2017). Supervised learning of universal sentence representations from natural language inference data. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 670–680.
Conneau, A., Kruszewski, G., Lample, G., Barrault, L. and Baroni, M. (2018). What you can cram into a single vector: Probing sentence embeddings for linguistic properties. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, pp. 2126–2136.
Glockner, M., Shwartz, V. and Goldberg, Y. (2018). Breaking NLI systems with sentences that require simple lexical inferences. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, pp. 650–655.
Gururangan, S., Swayamdipta, S., Levy, O., Schwartz, R., Bowman, S. and Smith, N.A. (2018). Annotation artifacts in natural language inference data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Association for Computational Linguistics, pp. 107–112.
Hill, F., Cho, K. and Korhonen, A. (2016). Learning distributed representations of sentences from unlabelled data. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, pp. 1367–1377.
Khot, T., Sabharwal, A. and Clark, P. (2018). SciTail: A textual entailment dataset from science question answering. In AAAI Conference on Artificial Intelligence.
Kingma, D.P. and Ba, J. (2015). Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR).
Kiros, R., Zhu, Y., Salakhutdinov, R., Zemel, R.S., Urtasun, R., Torralba, A. and Fidler, S. (2015). Skip-thought vectors. In Advances in Neural Information Processing Systems 28. Curran Associates, Inc., pp. 3294–3302.
Maas, A.L., Hannun, A.Y. and Ng, A.Y. (2013). Rectifier nonlinearities improve neural network acoustic models. In International Conference on Machine Learning.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26. Curran Associates, Inc., pp. 3111–3119.
Mou, L., Men, R., Li, G., Xu, Y., Zhang, L., Yan, R. and Jin, Z. (2016). Natural language inference by tree-based convolution and heuristic matching. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Association for Computational Linguistics, pp. 130–136.
Nie, Y. and Bansal, M. (2017). Shortcut-stacked sentence encoders for multi-domain inference. In Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP. Association for Computational Linguistics, pp. 41–45.
Parikh, A.P., Täckström, O., Das, D. and Uszkoreit, J. (2016). A decomposable attention model for natural language inference. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 2249–2255.
Pennington, J., Socher, R. and Manning, C.D. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, pp. 1532–1543.
Poliak, A., Naradowsky, J., Haldar, A., Rudinger, R. and Van Durme, B. (2018). Hypothesis only baselines in natural language inference. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics. Association for Computational Linguistics, pp. 180–191.
Talman, A. and Chatzikyriakidis, S. (2019). Testing the generalization power of neural network models across NLI benchmarks. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP. arXiv:1810.09774.
Tay, Y., Tuan, L.A. and Hui, S.C. (2018). Compare, compress and propagate: Enhancing neural architectures with alignment factorization for natural language inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 1565–1575.
Vendrov, I., Kiros, R., Fidler, S. and Urtasun, R. (2016). Order-embeddings of images and language. In International Conference on Learning Representations (ICLR).
Vu, H. (2017). LCT-MALTA's submission to RepEval 2017 shared task. In Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP. Association for Computational Linguistics, pp. 56–60.
Williams, A., Nangia, N. and Bowman, S.R. (2018). A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, pp. 1112–1122.
Yoon, D., Lee, D. and Lee, S. (2018). Dynamic self-attention: Computing attention over words dynamically for sentence embedding. arXiv:1808.07383.
Young, P., Lai, A., Hodosh, M. and Hockenmaier, J. (2014). From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics 2, pp. 67–78.