
Digital accents, homogeneity-by-design, and the evolving social science of written language

Published online by Cambridge University Press: 13 June 2025

AJ Alvero*
Center for Data Science for Enterprise and Society, Cornell University, Ithaca, NY, USA

Quentin Sedlacek
Department of Teaching & Learning, Annette Caldwell Simmons School of Education & Human Development, Southern Methodist University, Dallas, TX, USA

Maricela León
Department of Teaching & Learning, Annette Caldwell Simmons School of Education & Human Development, Southern Methodist University, Dallas, TX, USA

Courtney Peña
Stanford University School of Medicine, Stanford, CA, USA

*Corresponding author: AJ Alvero; Email: ajalvero@cornell.edu

Abstract

Human language is increasingly written rather than just spoken, primarily due to the proliferation of digital technology in modern life. This trend has enabled the creation of generative artificial intelligence (AI) trained on corpora containing trillions of words extracted from text on the internet. However, current language theory inadequately addresses digital text communication’s unique characteristics and constraints. This paper systematically analyzes and synthesizes existing literature to map the theoretical landscape of digitized language. The evidence demonstrates that, parallel to spoken language, features of written communication are frequently correlated with the socially constructed demographic identities of writers, a phenomenon we refer to as “digital accents.” This conceptualization raises complex ontological questions about the nature of digital text and its relationship to social identity. The same line of questioning, in conjunction with recent research, shows how generative AI systematically fails to capture the breadth of expression observed in human writing, an outcome we call “homogeneity-by-design.” By approaching text-based language from this theoretical framework while acknowledging its inherent limitations, social scientists studying language can strengthen their critical analysis of AI systems and contribute meaningful insights to their development and improvement.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press.

Figure 1. In your classroom, how acceptable is it for students to use African American English or other “non-standardized” dialects or varieties to [do each of the following tasks] during the writing process?


Figure 2. In your classroom, how acceptable is it for students to mix languages to [do each of the following tasks] during the writing process (e.g., to use Spanish when English is the language of instruction or vice versa)?