References
Abdou, M., Kulmizev, A., Hershcovich, D., et al. (2021). Can language models encode perceptual structure without grounding? A case study in color. arXiv:2109.06129.
Abusch, D. (2020). Possible-worlds semantics for pictures. In Gutzmann, D., Matthewson, L., Meier, C., et al., eds., The Wiley Blackwell companion to semantics (pp. 1–31). Wiley Blackwell.
Abzianidze, L. (2016). Natural solution to FraCaS entailment problems. In Proceedings of *SEM (pp. 64–74). ACL.
Abzianidze, L., Bjerva, J., Evang, K., et al. (2017). The Parallel Meaning Bank: Towards a multilingual corpus of translations annotated with compositional meaning representations. In Proceedings of EACL (pp. 242–247). ACL.
Abzianidze, L., Zwarts, J., & Winter, Y. (2023). SpaceNLI: Evaluating the consistency of predicting inferences in space. In Proceedings of NALOMA (pp. 12–24). ACL.
Agirre, E., Cer, D., Diab, M., & Gonzalez-Agirre, A. (2012). SemEval-2012 task 6: A pilot on semantic textual similarity. In Proceedings of SemEval (pp. 385–393).
Akyürek, E., & Andreas, J. (2022). Compositionality as lexical symmetry. arXiv:2201.12926.
Andreas, J. (2019a). Good-enough compositional data augmentation. arXiv:1904.09545.
Antol, S., Agrawal, A., Lu, J., et al. (2015). VQA: Visual question answering. In Proceedings of IEEE/CVF ICCV (pp. 2425–2433).
Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv:1409.0473.
Banarescu, L., Bonial, C., Cai, S., et al. (2013). Abstract meaning representation for sembanking. In Proceedings of the 7th linguistic annotation workshop and interoperability with discourse (pp. 178–186). ACL.
Baroni, M. (2016). Grounding distributional semantics in the visual world. Language and Linguistics Compass, 10(1), 3–13.
Baroni, M. (2022). On the proper role of linguistically-oriented deep net analysis in linguistic theorizing. In Lappin, S., ed., Algebraic systems and the representation of linguistic knowledge (pp. 5–22). Taylor and Francis.
Baroni, M., Dinu, G., & Kruszewski, G. (2014). Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of ACL (pp. 238–247). ACL.
Baroni, M., & Zamparelli, R. (2010). Nouns are vectors, adjectives are matrices: Representing adjective–noun constructions in semantic space. In Proceedings of EMNLP (pp. 1183–1193).
Barsalou, L. W. (2008). Grounded cognition. Annual Review of Psychology, 59(1), 617–645.
Bender, E. M., & Koller, A. (2020). Climbing towards NLU: On meaning, form, and understanding in the age of data. In Proceedings of ACL (pp. 5185–5198). ACL.
Bernardi, R., Dinu, G., Marelli, M., & Baroni, M. (2013). A relatedness benchmark to test the role of determiners in compositional distributional semantics. In Proceedings of ACL (pp. 53–57). ACL.
Bernardi, R., & Pezzelle, S. (2021). Linguistic issues behind visual question answering. Language and Linguistics Compass, 15(6), e12417.
Bernardy, J.-P., & Chatzikyriakidis, S. (2021). Applied temporal analysis: A complete run of the FraCaS test suite. In Proceedings of IWCS (pp. 11–20). ACL.
Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135–146.
Boleda, G., Baroni, M., McNally, L., et al. (2013). Intensionality was only alleged: On adjective-noun composition in distributional semantics. In Proceedings of IWCS.
Botha, J., & Blunsom, P. (2014). Compositional morphology for word representations and language modelling. In Xing, E. P., & Jebara, T., eds., Proceedings of the 31st international conference on machine learning (Vol. 32, no. 2) (pp. 1899–1907). https://proceedings.mlr.press/v32/botha14.html.
Bowman, S. R., Angeli, G., Potts, C., & Manning, C. D. (2015). A large annotated corpus for learning natural language inference. In Proceedings of EMNLP (pp. 632–642).
Bowman, S. R., & Dahl, G. (2021). What will it take to fix benchmarking in natural language understanding? In Proceedings of NAACL. ACL.
Brown, T., Mann, B., Ryder, N., et al. (2020). Language models are few-shot learners. In Larochelle, H., Ranzato, M., Hadsell, R., et al., eds., Advances in neural information processing systems (Vol. 33, pp. 1877–1901). Curran Associates, Inc. https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf.
Burgess, C., & Lund, K. (1995). Hyperspace analogue to language (HAL): A general model of semantic memory. In Annual meeting of the Psychonomic Society.
Bylinina, L., & Tikhonov, A. (2022). The driving forces of polarity-sensitivity: Experiments with multilingual pre-trained neural language models. In Proceedings of COGSCI (Vol. 44).
Cer, D., Yang, Y., Kong, S.-Y., et al. (2018). Universal sentence encoder. arXiv:1803.11175.
Chaabouni, R., Dessì, R., & Kharitonov, E. (2021). Can Transformers jump around right in natural language? Assessing performance transfer from SCAN. In Proceedings of BlackboxNLP (pp. 136–148).
Chaabouni, R., Kharitonov, E., Bouchacourt, D., et al. (2020). Compositionality and generalization in emergent languages. arXiv:2004.09124.
Chan, S. C., Santoro, A., Lampinen, A. K., et al. (2022). Data distributional properties drive emergent few-shot learning in transformers. arXiv:2205.05055.
Chatzikyriakidis, S., Cooper, R., Dobnik, S., & Larsson, S. (2017). An overview of natural language inference data collection: The way forward? In Proceedings of the computing natural language inference workshop.
Chen, T., Jiang, Z., Poliak, A., et al. (2020). Uncertain natural language inference. In Proceedings of ACL. ACL.
Chen, Z. (2021). Attentive tree-structured network for monotonicity reasoning. In Proceedings of NALOMA (pp. 12–21). ACL.
Chen, Z., Gao, Q., & Moss, L. S. (2021). NeuralLog: Natural language inference with joint neural and logical reasoning. In Proceedings of *SEM. ACL.
Cho, K., Van Merriënboer, B., Gulcehre, C., et al. (2014). Learning phrase representations using RNN encoder–decoder for statistical machine translation. arXiv:1406.1078.
Chowdhery, A., Narang, S., Devlin, J., et al. (2022). PaLM: Scaling language modeling with Pathways. arXiv:2204.02311.
Clark, H. H. (1996). Using language. Cambridge University Press.
Clark, P., Tafjord, O., & Richardson, K. (2021). Transformers as soft reasoners over language. In Proceedings of IJCAI.
Clark, S., Coecke, B., & Sadrzadeh, M. (2008). A compositional distributional model of meaning. In Proceedings of the Second Quantum Interaction Symposium (QI-2008) (pp. 133–140).
Condoravdi, C., Crouch, D., de Paiva, V., et al. (2003). Entailment, intensionality and text understanding. In Proceedings of the HLT-NAACL 2003 workshop on text meaning (pp. 38–45).
Cooper, R., Crouch, D., Eijck, J. V., et al. (1996). FraCaS: A framework for computational semantics. Deliverable D16.
Coppock, E., & Champollion, L. (2022). Invitation to formal semantics. Manuscript, Boston University and New York University.
Dagan, I., Glickman, O., & Magnini, B. (2006). The Pascal recognising textual entailment challenge. In Proceedings of the Pascal challenges workshop on recognising textual entailment (pp. 177–190). Springer.
Dagan, I., Roth, D., Sammons, M., & Zanzotto, F. M. (2013). Recognizing textual entailment: Models and applications. Morgan & Claypool.
Dalvi, B., Jansen, P., Tafjord, O., et al. (2021). Explaining answers with entailment trees. In Proceedings of EMNLP (pp. 7358–7370). ACL.
Dankers, V., Bruni, E., & Hupkes, D. (2022). The paradox of the compositionality of natural language: A neural machine translation case study. In Proceedings of ACL (pp. 4154–4175). ACL.
Davis, F. (2022). On the limitations of data: Mismatches between neural models of language and humans (Unpublished doctoral dissertation). Cornell University.
de Marneffe, M.-C., Rafferty, A. N., & Manning, C. D. (2008). Finding contradictions in text. In Proceedings of ACL (pp. 1039–1047). ACL.
de Marneffe, M.-C., Simons, M., & Tonhauser, J. (2019). The CommitmentBank: Investigating projection in naturally occurring discourse. Proceedings of Sinn und Bedeutung, 23(2), 107–124.
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Burstein, J., Doran, C., & Solorio, T., eds., Proceedings of NAACL (pp. 4171–4186). ACL. https://aclanthology.org/N19-1423. https://doi.org/10.18653/v1/N19-1423.
Dima, C., de Kok, D., Witte, N., & Hinrichs, E. (2019). No word is an island: A transformation weighting model for semantic composition. Transactions of the Association for Computational Linguistics, 7, 437–451.
Drozdov, A., Schärli, N., Akyürek, E., et al. (2022). Compositional semantic parsing with large language models. arXiv:2209.15003.
Du, L., Ding, X., Xiong, K., Liu, T., & Qin, B. (2022). Enhancing pretrained language models with structured commonsense knowledge for textual inference. Knowledge-Based Systems, 109488.
Du, Y., Li, S., & Mordatch, I. (2020). Compositional visual generation with energy based models. In NeurIPS (Vol. 33, pp. 6637–6647). Curran Associates, Inc.
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179–211.
Ettinger, A. (2020). What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models. Transactions of the Association for Computational Linguistics, 8, 34–48.
Ettinger, A., Elgohary, A., Phillips, C., & Resnik, P. (2018). Assessing composition in sentence vector representations. In Proceedings of Coling (pp. 1790–1801). ACL.
Fitch, F. B. (1973). Natural deduction rules for English. Philosophical Studies, 24(2), 89–104.
Frank, S., Bugliarello, E., & Elliott, D. (2021). Vision-and-language or vision-for-language? On cross-modal influence in multimodal transformers. In Proceedings of EMNLP (pp. 9847–9857). Association for Computational Linguistics.
Gal, R., Alaluf, Y., Atzmon, Y., Patashnik, O., Bermano, A. H., Chechik, G., & Cohen-Or, D. (2022). An image is worth one word: Personalizing text-to-image generation using textual inversion. arXiv.
Gamallo, P. (2021). Compositional distributional semantics with syntactic dependencies and selectional preferences. Applied Sciences, 11(12), 1–13.
Gärdenfors, P. (2004). Conceptual spaces as a framework for knowledge representation. Mind and Matter, 2(2), 9–27.
Geiger, A., Cases, I., Karttunen, L., & Potts, C. (2018). Stress-testing neural models of natural language inference with multiply-quantified sentences. arXiv.
Geiger, A., Richardson, K., & Potts, C. (2020). Neural natural language inference models partially embed theories of lexical entailment and negation. In Proceedings of BlackboxNLP (pp. 163–173).
Giampiccolo, D., Magnini, B., Dagan, I., & Dolan, B. (2007). The third PASCAL recognizing textual entailment challenge. In Proceedings of the ACL-PASCAL workshop on textual entailment and paraphrasing. ACL.
Gleitman, L. (1990). The structural sources of verb meanings. Language Acquisition, 1(1), 3–55.
Gleitman, L. R., Cassidy, K., Nappa, R., Papafragou, A., & Trueswell, J. C. (2005). Hard words. Language Learning and Development, 1(1), 23–64.
Goyal, Y., Khot, T., Summers-Stay, D., Batra, D., & Parikh, D. (2017). Making the V in VQA matter: Elevating the role of image understanding in visual question answering. In Proceedings of IEEE/CVF CVPR (pp. 6904–6913).
Greenberg, G. (2013). Beyond resemblance. Philosophical Review, 122(2).
Greenberg, G. (2021). Semantics of pictorial space. Review of Philosophy and Psychology, 12(4), 847–887.
Grefenstette, E., Dinu, G., Zhang, Y.-Z., et al. (2013). Multi-step regression learning for compositional distributional semantics. arXiv:1301.6939.
Guevara, E. R. (2011). Computing semantic compositionality in distributional semantics. In Proceedings of the ninth international conference on computational semantics (pp. 135–144).
Gururangan, S., Swayamdipta, S., Levy, O., et al. (2018). Annotation artifacts in natural language inference data. In Proceedings of NAACL (pp. 107–112). ACL.
Guu, K., Lee, K., Tung, Z., et al. (2020). REALM: Retrieval-augmented language model pre-training. arXiv.
Hacquard, V., & Lidz, J. (2022). On the acquisition of attitude verbs. Annual Review of Linguistics, 8, 193–212.
Harnad, S. (1990). The symbol grounding problem. Physica D: Nonlinear Phenomena, 42(1–3), 335–346.
Harris, R. A. (1993). The linguistics wars. Oxford University Press on Demand.
Hartmann, M., de Lhoneux, M., Hershcovich, D., et al. (2021). A multilingual benchmark for probing negation-awareness with minimal pairs. In Proceedings of CONLL (pp. 244–257). ACL.
Hawthorne, C., Jaegle, A., Cangea, C., et al. (2022). General-purpose, long-context autoregressive modeling with Perceiver AR. arXiv:2202.07765.
He, Q., Wang, H., & Zhang, Y. (2020). Enhancing generalization in natural language inference by syntax. In Findings of EMNLP. ACL.
Hessel, J., & Lee, L. (2020). Does my multimodal model learn cross-modal interactions? It’s harder to tell than you might think! In Proceedings of EMNLP (pp. 861–877).
Hill, F., Cho, K., & Korhonen, A. (2016). Learning distributed representations of sentences from unlabelled data. In Proceedings of NAACL (pp. 1367–1377).
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Hofmann, V., Pierrehumbert, J. B., & Schütze, H. (2021). Superbizarre is not superb: Derivational morphology improves BERT’s interpretation of complex words. arXiv:2101.00403.
Hong, R., Liu, D., Mo, X., et al. (2019). Learning to compose and reason with language tree structures for visual grounding. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Hossain, M. M., Kovatchev, V., Dutta, P., et al. (2020). An analysis of natural language inference benchmarks through the lens of negation. In Proceedings of EMNLP (pp. 9106–9118). ACL.
Hu, H., Chen, Q., & Moss, L. (2019). Natural language inference with monotonicity. In Proceedings of IWCS (pp. 8–15). ACL.
Hudson, D. A., & Manning, C. D. (2018). Compositional attention networks for machine reasoning. In International Conference on Learning Representations.
Hudson, D. A., & Manning, C. D. (2019). GQA: A new dataset for real-world visual reasoning and compositional question answering. In Proceedings of IEEE/CVF CVPR (pp. 6700–6709).
Hupkes, D., Dankers, V., Mul, M., & Bruni, E. (2020). Compositionality decomposed: How do neural networks generalise? Journal of Artificial Intelligence Research, 67, 757–795.
Hupkes, D., Giulianelli, M., Dankers, V., et al. (2022). State-of-the-art generalisation research in NLP: A taxonomy and review.
Hupkes, D., Veldhoen, S., & Zuidema, W. (2018). Visualisation and “diagnostic classifiers” reveal how recurrent and recursive neural networks process hierarchical structure. Journal of Artificial Intelligence Research, 61(1), 907–926.
Icard, T. F. (2012). Inclusion and exclusion in natural language. Studia Logica, 100(4), 705–725.
Icard, T. F., & Moss, L. S. (2014). Recent progress on monotonicity. LILT, 9.
Irsoy, O., & Cardie, C. (2014). Deep recursive neural networks for compositionality in language. NeurIPS, 27, 1–9.
Jeretic, P., Warstadt, A., Bhooshan, S., & Williams, A. (2020). Are natural language inference models IMPPRESsive? Learning IMPlicature and PRESupposition. In Proceedings of ACL (pp. 8690–8705). ACL.
Jiang, N., & de Marneffe, M.-C. (2019). Evaluating BERT for natural language inference: A case study on the CommitmentBank. In Proceedings of EMNLP–IJCNLP (pp. 6086–6091). ACL.
Jinman, Z., Zhong, S., Zhang, X., & Liang, Y. (2020). PBoS: Probabilistic bag-of-subwords for generalizing word embedding. arXiv:2010.10813.
Johnson, J., Hariharan, B., Van der Maaten, L., et al. (2017). CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. In Proceedings of IEEE/CVF CVPR (pp. 2901–2910).
Jumelet, J., Denic, M., Szymanik, J., et al. (2021). Language models use monotonicity to assess NPI licensing. In Findings of ACL–IJCNLP (pp. 4958–4969). ACL.
Kalouli, A.-L., Hu, H., Webb, A. F., et al. (2023). Curing the SICK and other NLI maladies. Computational Linguistics, 49(1), 199–243.
Kalouli, A.-L., Real, L., & de Paiva, V. (2017). Textual inference: Getting logic from humans. In Proceedings of IWCS.
Kartsaklis, D., Sadrzadeh, M., & Pulman, S. (2013). Separating disambiguation from composition in distributional semantics. In Proceedings of CONLL (pp. 114–123). ACL.
Kassner, N., & Schütze, H. (2020). Negated and misprimed probes for pretrained language models: Birds can talk, but cannot fly. In Proceedings of ACL (pp. 7811–7818). ACL.
Khan, S., Naseer, M., Hayat, M., et al. (2021). Transformers in vision: A survey. ACM Computing Surveys.
Kiela, D., Bulat, L., & Clark, S. (2015). Grounding semantics in olfactory perception. In Proceedings of ACL (pp. 231–236).
Kim, N., & Linzen, T. (2020). COGS: A compositional generalization challenge based on semantic interpretation. In Proceedings of EMNLP.
Kim, N., & Schuster, S. (2023). Entity tracking in language models. In Proceedings of ACL (pp. 3835–3855). ACL.
Kirby, S., Cornish, H., & Smith, K. (2008). Cumulative cultural evolution in the laboratory: An experimental approach to the origins of structure in human language. Proceedings of the National Academy of Sciences, 105(31), 10681–10686.
Kirby, S., Tamariz, M., Cornish, H., & Smith, K. (2015). Compression and communication in the cultural evolution of linguistic structure. Cognition, 141, 87–102.
Kober, T., Bijl de Vroe, S., & Steedman, M. (2019). Temporal and aspectual entailment. In Proceedings of IWCS (pp. 103–119). ACL.
Kracht, M. (2011). Interpreted languages and compositionality (Vol. 89). Springer Science & Business Media.
Kratzer, A., & Heim, I. (1998). Semantics in generative grammar (Vol. 1185). Blackwell Oxford.
Krishna, R., Zhu, Y., Groth, O., et al. (2017). Visual genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision, 123(1), 32–73.
Kudo, T., & Richardson, J. (2018). SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing.
Lai, A., & Hockenmaier, J. (2014). Illinois-LH: A denotational and distributional approach to semantics. In Proceedings of SemEval (pp. 329–334). ACL.
Lake, B., & Baroni, M. (2018). Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. In ICML.
Lakoff, G. (1970). Linguistics and natural logic. Synthese, 22(1), 151–271.
Landau, B., & Gleitman, L. R. (1985). Language and experience: Evidence from the blind child. Harvard University Press.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211–240.
Lazaridou, A., Marelli, M., Zamparelli, R., & Baroni, M. (2013). Compositional-ly derived representations of morphologically complex words in distributional semantics. In Proceedings of ACL (pp. 1517–1526).
Le, P., & Zuidema, W. (2015). Compositional distributional semantics with long short term memory. arXiv:1503.02510.
Levy, O., & Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. NeurIPS, 27, 1–9.
Lewis, D. (1970). General semantics. Synthese, 22(1/2), 18–67.
Li, B. Z., Nye, M., & Andreas, J. (2021). Implicit representations of meaning in neural language models. arXiv:2106.00737.
Li, F., Zhang, H., Zhang, Y.-F., et al. (2022). Vision-language intelligence: Tasks, representation learning, and large models. arXiv:2203.01922.
Li, L. H., Yatskar, M., Yin, D., et al. (2019). VisualBERT: A simple and performant baseline for vision and language. arXiv:1908.03557.
Lin, T.-Y., Maire, M., Belongie, S., et al. (2014). Microsoft COCO: Common objects in context. In European conference on computer vision (pp. 740–755).
Lin, Z., Feng, M., dos Santos, C. N., et al. (2017). A structured self-attentive sentence embedding. In ICLR.
Linzen, T., & Baroni, M. (2021). Syntactic structure from deep learning. Annual Review of Linguistics, 7, 195–212.
Liu, A., Wu, Z., Michael, J., et al. (2023). We’re afraid language models aren’t modeling ambiguity. In Bouamor, H., Pino, J., & Bali, K., eds., Proceedings of EMNLP (pp. 790–807). ACL. https://aclanthology.org/2023.emnlp-main.51. https://doi.org/10.18653/v1/2023.emnlp-main.51.
Liu, Y., Ott, M., Goyal, N., et al. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv:1907.11692.
Lu, J., Batra, D., Parikh, D., & Lee, S. (2019). ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. NeurIPS.
Lu, J., Goswami, V., Rohrbach, M., et al. (2020). 12-in-1: Multi-task vision and language representation learning. In Proceedings of IEEE/CVF CVPR (pp. 10437–10446).
Luong, M.-T., Socher, R., & Manning, C. D. (2013). Better word representations with recursive neural networks for morphology. In Proceedings of CONLL.
MacCartney, B., & Manning, C. D. (2007). Natural logic for textual inference. In Proceedings of the ACL-PASCAL workshop on textual entailment and paraphrasing (pp. 193–200). ACL.
MacCartney, B., & Manning, C. D. (2009). An extended model of natural logic. In Proceedings of IWCS (pp. 140–156). ACL.
Mao, J., Huang, J., Toshev, A., et al. (2016). Generation and comprehension of unambiguous object descriptions. In Proceedings of IEEE/CVF CVPR (pp. 11–20).
Marelli, M., Menini, S., Baroni, M., et al. (2014). A SICK cure for the evaluation of compositional distributional semantic models. In Proceedings of LREC (pp. 216–223).
Margolis, E., & Laurence, S., eds. (1999). Concepts: Core readings. MIT Press.
McCoy, R. T., Linzen, T., Dunbar, E., & Smolensky, P. (2019). RNNs implicitly implement tensor-product representations. In ICLR.
McCoy, R. T., Pavlick, E., & Linzen, T. (2019). Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference. In Proceedings of ACL (pp. 3428–3448). ACL.
Merrill, W., Warstadt, A., & Linzen, T. (2022). Entailment semantics can be extracted from an ideal language model. arXiv.
Merullo, J., Eickhoff, C., & Pavlick, E. (2023). A mechanism for solving relational tasks in transformer language models.
Meteyard, L., Cuadrado, S. R., Bahrami, B., & Vigliocco, G. (2012). Coming of age: A review of embodiment and the neuroscience of semantics. Cortex, 48(7), 788–804.
Mickus, T., Bernard, T., & Paperno, D. (2020). What meaning–form correlation has to compose with: A study of MFC on artificial and natural language. In Proceedings of COLING (pp. 3737–3749). International Committee on Computational Linguistics.
Mickus, T., Paperno, D., & Constant, M. (2022). How to dissect a Muppet: The structure of transformer embedding spaces. TACL, 10, 981–996.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv:1301.3781.
Mitchell, J., & Lapata, M. (2010). Composition in distributional models of semantics. Cognitive Science, 34(8), 1388–1429.
Montague, R. (1970). English as a formal language. In Linguaggi nella società e nella tecnica (pp. 188–221). Edizioni di Comunità.
Montague, R. (1973). The proper treatment of quantification in ordinary English. In Approaches to natural language (pp. 221–242). Springer.
Moss, L. S. (2010). Natural logic and semantics. In Logic, language and meaning (pp. 84–93). Springer.
Moss, L. S. (2015). Natural logic. In The handbook of contemporary semantic theory (pp. 559–592).
Murzi, J., & Steinberger, F. (2017). Inferentialism. A Companion to the Philosophy of Language, 1, 197–224.
Naik, A., Ravichander, A., Sadeh, N., et al. (2018). Stress test evaluation for natural language inference. In Proceedings of COLING (pp. 2340–2353). ACL.
Nangia, N., & Bowman, S. (2018). ListOps: A diagnostic dataset for latent tree learning. In Proceedings of NAACL: Student research workshop. ACL.
Nie, Y., Zhou, X., & Bansal, M. (2020). What can we learn from collective human opinions on natural language inference data? In Proceedings of EMNLP (pp. 9131–9143). ACL.
Nivre, J., de Marneffe, M.-C., Ginter, F., et al. (2020). Universal Dependencies v2: An evergrowing multilingual treebank collection. In Proceedings of LREC. ELRA.
Nye, M., Solar-Lezama, A., Tenenbaum, J., & Lake, B. M. (2020). Learning compositional rules via neural program synthesis. NeurIPS, 33, 10832–10842.
Olsson, C., Elhage, N., Nanda, N., et al. (2022). In-context learning and induction heads. arXiv:2209.11895.
Ouyang, L., Wu, J., Jiang, X., et al. (2022). Training language models to follow instructions with human feedback. NeurIPS, 35, 27730–27744.
Paperno, D. (2022). On learning interpreted languages with recurrent models. Computational Linguistics, 48(2), 471–482.
Paperno, D., & Baroni, M. (2016). When the whole is less than the sum of its parts: How composition affects PMI values in distributional semantic vectors. Computational Linguistics, 42(2), 345–350.
Paperno, D., Kruszewski, G., Lazaridou, A., et al. (2016). The LAMBADA dataset: Word prediction requiring a broad discourse context. In Proceedings of ACL.
Paperno, D., Pham, N. T., & Baroni, M. (2014). A practical and linguistically-motivated approach to compositional distributional semantics. In Proceedings of ACL (pp. 90–99). ACL.
Parcalabescu, L., Cafagna, M., Muradjan, L., et al. (2022). VALSE: A task-independent benchmark for vision and language models centered on linguistic phenomena. In Proceedings of ACL.
Parcalabescu, L., & Frank, A. (2022). MM-SHAP: A performance-agnostic metric for measuring multimodal contributions in vision and language models & tasks. arXiv:2212.08158.
Parikh, P. (2001). The use of language. CSLI Publications.
Parrish, A., Schuster, S., Warstadt, A., et al. (2021). NOPE: A corpus of naturally-occurring presuppositions in English. In Proceedings of CONLL (pp. 349–366). ACL.
Patel, A., Li, B., Rasooli, M. S., et al. (2022). Bidirectional language models are also few-shot learners. arXiv:2209.14500.
Pavlick, E., & Kwiatkowski, T. (2019). Inherent disagreements in human textual inferences. TACL, 7, 677–694.
Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. In Proceedings of EMNLP (pp. 1532–1543).
Pérez, J., Barceló, P., & Marinkovic, J. (2021). Attention is Turing complete. Journal of Machine Learning Research, 22(1), 3463–3497.
Pezzelle, S. (2023). Dealing with semantic underspecification in multimodal NLP. In Proceedings of ACL (pp. 12098–12112). ACL.
Pezzelle, S., Takmaz, E., & Fernández, R. (2021). Word representation learning in multimodal pre-trained transformers: An intrinsic evaluation. TACL, 9, 1563–1579.
Piantadosi, S. T., & Hill, F. (2022). Meaning without reference in large language models. arXiv.
Pinter, Y., Guthrie, R., & Eisenstein, J. (2017). Mimicking word embeddings using subword RNNs. arXiv:1707.06961.
Poliak, A. (2020). A survey on recognizing textual entailment as an NLP evaluation. In Proceedings of the first workshop on evaluation and comparison of NLP systems (pp. 92–109). ACL.
Poliak, A., Haldar, A., Rudinger, R., et al. (2018). Collecting diverse natural language inference problems for sentence representation evaluation. In Proceedings of EMNLP (pp. 67–81). ACL.
Poliak, A., Naradowsky, J., Haldar, A., et al. (2018). Hypothesis only baselines in natural language inference. In Proceedings of *SEM (pp. 180–191). ACL.
Potts, C. (2020). Is it possible for language models to achieve language understanding? (Medium post).
Prokhorov, V., Pilehvar, M. T., Kartsaklis, D., et al. (2019). Unseen word representation by aligning heterogeneous lexical semantic spaces. In Proceedings of the AAAI conference on artificial intelligence (Vol. 33, pp. 6900–6907).
Pullum, G. K., & Huddleston, R. (2002). Negation. In The Cambridge grammar of the English language (pp. 785–850). Cambridge University Press.
Radford, A., Kim, J. W., Hallacy, C., et al. (2021). Learning transferable visual models from natural language supervision. In ICML (pp. 8748–8763).
Radford, A., Wu, J., Child, R., et al. (2019). Language models are unsupervised multitask learners.
Rajaee, S., Yaghoobzadeh, Y., & Pilehvar, M. T. (2022). Looking at the overlooked: An analysis on the word-overlap bias in natural language inference. In Proceedings of EMNLP (pp. 10605–10616). ACL.
Rajpurkar, P., Zhang, J., Lopyrev, K., & Liang, P. (2016). SQuAD: 100,000+ questions for machine comprehension of text. arXiv:1606.05250.
Ramesh, A., Dhariwal, P., Nichol, A., et al. (2022). Hierarchical text-conditional image generation with clip latents. arXiv.
Rassin, R., Ravfogel, S., & Goldberg, Y. (2022). DALLE-2 is seeing double: Flaws in word-to-concept mapping in text2image models. arXiv:2210.10606.
Ravichander, A., Naik, A., Rose, C., & Hovy, E. (2019). EQUATE: A benchmark evaluation framework for quantitative reasoning in natural language inference. In Proceedings of CONLL (pp. 349–361). ACL.
Ribeiro, M. T., Wu, T., Guestrin, C., & Singh, S. (2020). Beyond accuracy: Behavioral testing of NLP models with CheckList. In Proceedings of ACL (pp. 4902–4912). ACL.
Richardson, K., Hu, H., Moss, L. S., & Sabharwal, A. (2020). Probing natural language inference models through semantic fragments. In AAAI.
Ritter, S., Long, C., Paperno, D., et al. (2015). Leveraging preposition ambiguity to assess compositional distributional models of semantics. In Proceedings of *SEM.
Rogers, A., Kovaleva, O., & Rumshisky, A. (2020). A primer in BERTology: What we know about how BERT works. TACL, 8, 842–866.
Rogers, A., & Rumshisky, A. (2020). A guide to the dataset explosion in QA, NLI, and commonsense reasoning. In Proceedings of COLING: Tutorial abstracts (pp. 27–32). International Committee for Computational Linguistics.
Rombach, R., Blattmann, A., Lorenz, D., et al. (2021). High-resolution image synthesis with latent diffusion models.
Ross, A., & Pavlick, E. (2019). How well do NLI models capture verb veridicality? In Proceedings of EMNLP–IJCNLP (pp. 2230–2240). ACL.
Ruiz, N., Li, Y., Jampani, V., et al. (2022). DreamBooth: Fine-tuning text-to-image diffusion models for subject-driven generation. arXiv.
Ryzhova, D., Kyuseva, M., & Paperno, D. (2016). Typology of adjectives benchmark for compositional distributional models. In Proceedings of LREC (pp. 1253–1257).
Saha, S., Ghosh, S., Srivastava, S., & Bansal, M. (2020). PRover: Proof generation for interpretable reasoning over rules. In Proceedings of EMNLP (pp. 122–136). ACL.
Saha, S., Nie, Y., & Bansal, M. (2020). ConjNLI: Natural language inference over conjunctive sentences. In Proceedings of EMNLP. ACL.
Saharia, C., Chan, W., Saxena, S., et al. (2022). Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems, 35, 36479–36494.
Schlag, I., Smolensky, P., Fernandez, R., et al. (2019). Enhancing the transformer with explicit relational encoding for math problem solving. arXiv:1910.06611.
Schlenker, P. (2018). What is super semantics? Philosophical Perspectives, 32(1), 365–453.
Schroeder-Heister, P. (2018). Proof-theoretic semantics. In The Stanford encyclopedia of philosophy (Spring 2018 ed.). Metaphysics Research Lab, Stanford University.
Schuster, S., Chen, Y., & Degen, J. (2020). Harnessing the linguistic signal to predict scalar inferences. In Proceedings of ACL (pp. 5387–5403). ACL.
Sennrich, R., Haddow, B., & Birch, A. (2015). Neural machine translation of rare words with subword units. arXiv:1508.07909.
Smolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46(1–2), 159–216.
Socher, R., Huval, B., Manning, C. D., & Ng, A. Y. (2012). Semantic compositionality through recursive matrix-vector spaces. In Proceedings of EMNLP–CONLL (pp. 1201–1211).
Socher, R., Perelygin, A., Wu, J., et al. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of EMNLP (pp. 1631–1642).
Sommers, F. (1982). The logic of natural language. Oxford University Press.
Song, X., Salcianu, A., Song, Y., et al. (2021). Fast WordPiece tokenization. In Proceedings of EMNLP. ACL.
Soricut, R., & Och, F. J. (2015). Unsupervised morphology induction using word embeddings. In Proceedings of NAACL (pp. 1627–1637).
Soulos, P., McCoy, R. T., Linzen, T., & Smolensky, P. (2020). Discovering the compositional structure of vector representations with role learning networks. In Proceedings of BlackboxNLP (pp. 238–254). ACL.
Srivastava, A., Rastogi, A., Rao, A., et al. (2022). Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. arXiv:2206.04615.
Storks, S., Gao, Q., & Chai, J. Y. (2019). Recent advances in natural language inference: A survey of benchmarks, resources, and approaches. arXiv.
Suhr, A., Lewis, M., Yeh, J., & Artzi, Y. (2017). A corpus of natural language for visual reasoning. In Proceedings of ACL (pp. 217–223).
Tafjord, O., Dalvi, B., & Clark, P. (2021). ProofWriter: Generating implications, proofs, and abductive statements over natural language. In Findings of ACL–IJCNLP (pp. 3621–3634). ACL.
Tan, H., & Bansal, M. (2019). LXMERT: Learning cross-modality encoder representations from transformers. arXiv:1908.07490.
Tan, H., & Bansal, M. (2020). Vokenization: Improving language understanding with contextualized, visual-grounded supervision. arXiv:2010.06775.
Thrush, T., Jiang, R., Bartolo, M., et al. (2022). Winoground: Probing vision and language models for visio-linguistic compositionality. In Proceedings of IEEE/CVF CVPR.
Tikhonov, A., Bylinina, L., & Paperno, D. (2023). Leverage points in modality shifts: Comparing language-only and multimodal word representations. In Proceedings of *SEM (pp. 11–17). ACL.
Tokmakov, P., Wang, Y.-X., & Hebert, M. (2019). Learning compositional representations for few-shot recognition. In Proceedings of IEEE/CVF ICCV (pp. 6372–6381).
Touvron, H., Martin, L., Stone, K., et al. (2023). Llama 2: Open foundation and fine-tuned chat models. arXiv:2307.09288.
Truong, T., Baldwin, T., Cohn, T., & Verspoor, K. (2022). Improving negation detection with negation-focused pre-training. In Proceedings of NAACL (pp. 4188–4193). ACL.
Truong, T. H., Otmakhova, Y., Baldwin, T., et al. (2022). Not another negation benchmark: The NaN–NLI test suite for sub-clausal negation. In Proceedings of AACL–IJCNLP (pp. 883–894). ACL.
Tsuchiya, M. (2018). Performance impact caused by hidden bias of training data for recognizing textual entailment. In Proceedings of LREC. ELRA.
Turing, A. M. (2009). Computing machinery and intelligence. In Parsing the Turing test (pp. 23–65). Springer.
Van Benthem, J. (1986). Natural logic. In Essays in logical semantics (pp. 109–119). Springer Netherlands.
Van Benthem, J. (2008). A brief history of natural logic. In Logic, Navya-Nyāya and applications, homage to Bimal Krishna Matilal. College Publications.
Vashishtha, S., Poliak, A., Lal, Y. K., et al. (2020). Temporal reasoning in natural language inference. In Findings of EMNLP (pp. 4070–4078). ACL.
Vaswani, A., Shazeer, N., Parmar, N., et al. (2017). Attention is all you need. In NeurIPS.
Vedantam, R., Bengio, S., Murphy, K., et al. (2017). Context-aware captions from context-agnostic supervision. In Proceedings of IEEE/CVF CVPR (pp. 251–260).
Verga, P., Sun, H., Soares, L. B., & Cohen, W. W. (2020). Facts as experts: Adaptable and interpretable neural memory over symbolic knowledge. arXiv.
Vulić, I., Baker, S., Ponti, E. M., et al. (2020). Multi-simlex: A large-scale evaluation of multilingual and crosslingual lexical semantic similarity. Computational Linguistics, 46(4), 847–897.
Wang, A., Pruksachatkun, Y., Nangia, N., et al. (2019). SuperGLUE: A stickier benchmark for general-purpose language understanding systems. In NeurIPS (Vol. 32). Curran Associates, Inc.
Wang, A., Singh, A., Michael, J., et al. (2019). GLUE: A multi-task benchmark and analysis platform for natural language understanding. In ICLR.
Warstadt, A., & Bowman, S. R. (2022). What artificial neural networks can tell us about human language acquisition. arXiv:2208.07998.
Weiss, G., Goldberg, Y., & Yahav, E. (2018). On the practical computational power of finite precision RNNs for language recognition. In Proceedings of ACL (pp. 740–745).
White, A. S., Rastogi, P., Duh, K., & Van Durme, B. (2017). Inference is everything: Recasting semantic resources into a unified evaluation framework. In Proceedings of IJCNLP (pp. 996–1005). Asian Federation of Natural Language Processing.
Williams, A., Nangia, N., & Bowman, S. (2018). A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of NAACL (pp. 1112–1122). ACL.
Yanaka, H., Mineshima, K., Bekki, D., & Inui, K. (2020). Do neural models learn systematicity of monotonicity inference in natural language? In Proceedings of ACL (pp. 6105–6117). ACL.
Yanaka, H., Mineshima, K., Bekki, D., et al. (2019a). Can neural networks understand monotonicity reasoning? In Proceedings of BlackboxNLP (pp. 31–40).
Yanaka, H., Mineshima, K., Bekki, D., et al. (2019b). HELP: A dataset for identifying shortcomings of neural models in monotonicity reasoning. In Proceedings of *SEM.
Yang, Z., Dai, Z., Yang, Y., et al. (2019). XLNet: Generalized autoregressive pretraining for language understanding. In NeurIPS (Vol. 32). Curran Associates, Inc.
Yi, K., Gan, C., Li, Y., et al. (2019). CLEVRER: Collision events for video representation and reasoning. arXiv:1910.01442.
Yuksekgonul, M., Bianchi, F., Kalluri, P., et al. (2022). When and why vision-language models behave like bags-of-words, and what to do about it? arXiv.
Yun, T., Bhalla, U., Pavlick, E., & Sun, C. (2022). Do vision-language pretrained models learn primitive concepts? arXiv:2203.17271.
Zaenen, A., Karttunen, L., & Crouch, R. (2005). Local textual inference: Can it be defined or circumscribed? In Proceedings of the ACL workshop on empirical modeling of semantic equivalence and entailment. ACL.
Zhang, C., Van Durme, B., Li, Z., & Stengel-Eskin, E. (2022). Visual commonsense in pretrained unimodal and multimodal models. arXiv:2205.01850.
Zhang, C., Yang, Z., He, X., & Deng, L. (2020). Multimodal intelligence: Representation learning, information fusion, and applications. IEEE Journal of Selected Topics in Signal Processing, 14(3), 478–493.
Zhou, D., Schärli, N., Hou, L., et al. (2022). Least-to-most prompting enables complex reasoning in large language models. arXiv:2205.10625.
Zhou, Y., Liu, C., & Pan, Y. (2016). Modelling sentence pairs with tree-structured attentive encoder. In Proceedings of COLING (pp. 2912–2922). The COLING 2016 Organizing Committee.