
Leading voices: dialogue semantics, cognitive science and the polyphonic structure of multimodal interaction

Published online by Cambridge University Press: 05 December 2022

Andy Lücking*
Laboratoire de Linguistique Formelle (LLF), Université Paris Cité, CNRS – UMR 7110, Paris, France; Text Technology Lab, Goethe University Frankfurt, Frankfurt am Main, Germany
Jonathan Ginzburg
Laboratoire de Linguistique Formelle (LLF), Université Paris Cité, CNRS – UMR 7110, Paris, France
*Corresponding author.


The neurocognition of multimodal interaction – the embedded, embodied, predictive processing of vocal and non-vocal communicative behaviour – has developed into an important subfield of cognitive science. It leaves a glaring lacuna, however, namely the dearth of a precise investigation of the meanings of the verbal and non-verbal communication signals that constitute multimodal interaction. Cognitively construable dialogue semantics provides a detailed and context-aware notion of meaning, and thereby contributes content-based identity conditions needed for distinguishing syntactically or form-based defined multimodal constituents. We exemplify this by means of two novel empirical examples: dissociated uses of negative polarity utterances and head shaking, and attentional clarification requests addressing speaker/hearer roles. On this view, interlocutors are described as co-active agents, thereby motivating a replacement of sequential turn organisation as a basic organising principle with notions of leading and accompanying voices. The Multimodal Serialisation Hypothesis is formulated: multimodal natural language processing is driven in part by a notion of vertical relevance – relevance of utterances occurring simultaneously – which we suggest supervenes on sequential (‘horizontal’) relevance – relevance of utterances succeeding each other temporally.

© The Author(s), 2022. Published by Cambridge University Press


