Introduction
First-time users of Large Language Models (LLMs) come away feeling wonder or discomfort, rarely indifference. Wonder comes from feeling that there is something sentient on the other end of the conversation. We know this to be false, of course. Next-token predictors, stochastic parrots,Footnote 1 statistical models, fancy autocompletes;Footnote 2 there are many ways to explain away the magic. Still, the feeling lingers, at least until it becomes routine. That the Turing Test was conquered with hardly a comment is testimony to how accustomed we have become to being understood by the same computers that, until recently, were controlled through arcane methods like code commands or symbol manipulation.Footnote 3 Chatbots now read and write in natural language and do a good job of it.
Discomfort arises from the realization that, despite their humanlike attributes, these things are not human at all. In fact, they are alien minds. We are amazed when they surpass us in the breadth of their knowledge, their speed, their multilingualism; that is also what makes their failures dumbfounding. How can something that so capably translates Homeric Greek crash when asked to count the number of r’s in “strawberry”? Or which is larger, 0.9 or 0.11? In fairness, these errors have been ironed out. Other, equally perplexing, mistakes remain.Footnote 4
The “uncanny valley” effect that LLMs induce makes them harder to categorize. Should we anthropomorphize them? Some people who were enraged when GPT-4o was deprecated certainly thought so.Footnote 5 Should we regard them as glorified office tools? Regardless, it is clear that they are creating social, cultural, and economic shockwaves of global magnitude. In work meetings, government communiqués, media articles, and coffeehouse conversations, the words “artificial intelligence” (AI) are uttered with excitement and no small measure of trepidation. For many artists and musicians, generative AI is the thief of creativity; in board meetings, a possible bonanza of productivity. Workers of all stripes fear it will replace them, parents that it will take their place as a trusted source of knowledge, environmentalists that it will strain our overburdened ecosystem to a breaking point in its hunger for power. Our reckoning with AI is in its infancy, and already it is set to transform society more profoundly than any recent technological revolution.
Amid the social conversation on AI, academia has been especially vocal. Although associated with Silicon Valley, generative AI is the brainchild of universities, so it is only natural that they should have a say in its evolution. That, however, is not the sole reason for their outspoken stance. Rather, it is because generative AI threatens academic scholarship and research. While some scholars and university administrators have lauded the creativity of these tools and recognized their potential for opening up new avenues of discovery, the response from most quarters of academia has been critical and defensive. The humanities, which have undergone a crisis in recent years resulting not only in decreased enrollment but also, and perhaps more importantly, in an eroding sense of relevance, feel especially vulnerable. This Element strives to answer an urgent question: How will generative AI impact scholarship in the humanities, and primarily historical research?
Anxiety regarding the technology generally falls into three categories: in the immediate term, the fear that generative AI labs will “train” their models on existing scholarship and allow universities and research institutions to outsource projects to LLMs, reducing the need for human expertise and commoditizing knowledge work; in the medium term (three to five years from now), the nagging worry that chatbots and agents will eliminate human creativity altogether, ushering in a dystopian plutocracy where AI robber barons rule an impoverished underclass or, no less menacingly, a dictatorial panopticon that silences any stirring of thoughtcrime; and in the long term (five years and beyond), that AI will prove impossible to control, with runaway superintelligences displacing humanity as the dominant species. It is easy to scoff at these scenarios, branding as “doomers” and Luddites those who believe the meager chatbot capable of such chaos. Nevertheless, even for its most avid proponents, AI holds immense disruptive potential, whose consequences, they admit, will force a renegotiation of long-held social contracts.
This Element does not address medium- and long-term worries, justified though they might be. To remain manageable in size and reasonably grounded in the present, these concerns must be left to other authors. It is thus solely concerned with short-term challenges to academic praxis. Rather than propose countermeasures, it seeks to harness the technology to upskill humanities scholars, primarily historians, for a new age of creativity. As this Element will show, LLMs are poised to expand the capabilities and research horizons of scholarship, empowering historians, literary experts, philosophers, musicologists, and others to embark on projects that have, in the past, required teams working in unison to achieve or that have previously been entirely unthinkable.
This is, then, an optimistic text. It sees historical scholarship first and foremost as a beneficiary of generative AI. It is adamant that the humanities, particularly history, are uniquely positioned to respond to the challenge of generative AI, while cognizant of its formidable scope. After all, the present generation of tools is called large language models, and language is the purview of the humanities. Humanities scholars excel at recognizing biases, identifying subtext, and unmasking narrative strategies, making them ideal for evaluating AI’s strengths and weaknesses. It falls to the humanities to rise to the challenge and have a decisive voice in shaping the future of mankind’s interaction with AI. More than theoretical musings about a tech apocalypse, this is the way to influence the outcome in the medium and long range.
Artificial Intelligence and Historical Scholarship
Early adopters of generative AI in academia often face pressure to justify its usefulness and morality to skeptics who regard it as a corrosive force chipping away at their beloved discipline. Other colleagues are less critical but hesitant to use the technology, fearful of transgressing some invisible rule in an ever-evolving regulatory landscape. Others still want to use AI but do not know where to start.
These are valid concerns, and this Element will not make light of any of them. It intends to offer a path into the technology from the perspective of a historian who respects the field and wants to see it flourish. To be sure, AI can be wielded irresponsibly, but this need not discourage scholars from harnessing its capabilities to advance their disciplines.Footnote 6 Applied correctly, it promises to enrich historical scholarship in much the same way as computerized archives and databases, digitized primary source repositories, and online research tools have done. In other words, it can give individual scholars the power once reserved for well-funded institutions to conduct broad comparative research, uncover trends across lengthy corpora of sources, extract novel insights – in short, to produce innovative synthesis.
Historians need to become familiar with these tools and their underlying technology to do this effectively. They must recognize the pitfalls and biases of generative AI, but even more importantly, they must understand its capabilities and methodologies to form a mental frame of reference that will serve them in their quest to apply it to their research. The Element will examine these questions in detail, using this discussion as a foundation for practical use cases: How should historians use LLMs to analyze primary sources, build knowledge maps, and uncover networks of connections? And, importantly, how can they do all this safely and ethically?
This Element will therefore address ethical concerns: hallucinations/confabulations, attribution, originality and plagiarism, data privacy, intellectual property, and skill erosion. While some categories will inevitably need recalibration, the Element sees the technology – at least as we understand it today – as one tool in the historian’s toolkit, not a stand-in for professional capability and accountability.
A Note on Terminology and Bibliography
When LLMs emerged, many media outlets featured a diagram visualizing the relationship between the different fields as a set of concentric circles. The outer circle was simply titled “AI,” the one contained within it “Machine Learning (ML),” followed by “Deep Learning,” “Generative AI,” and, finally, “LLMs.”Footnote 7 We will not dwell on these distinctions, although they are important categories of computer science. Our focus here is on the influence of LLMs, and more specifically the commercial offerings from the main AI labs, on the historical discipline. This last point is not coincidental. Companies like Google, OpenAI, and Microsoft have diversified their products and launched targeted promotions, hoping to attract customers in academia. Their models have thus dominated the academic market in the West, relegating open-source models to the fringe. In addition, general models often outperform fine-tuned specialized ones.Footnote 8 Unless operating in very specific contexts, students, staff, and faculty will likely be working with Gemini, ChatGPT, and comparable platforms.
For this reason, the terms AI, LLM, model, and chatbot are used almost interchangeably to refer to this class of products and services. There are subtle differences in meaning, with AI as the overarching field, LLMs as the underlying technology, and chatbots as the user interface. Likened to a car, AI would be the broad category of automobiles, LLMs the internal combustion (or electrical) engine, and chatbots the steering wheel, gauges, and everything else we see on our dashboard. This analogy will not be applied very rigidly, however, trusting the reader to understand the intent.
Another point concerns the use of literature. Academic publications, to whose company this Element aspires to belong, usually depend on peer-reviewed articles and monographs. The time it takes an article to pass review means that those appearing today were probably completed in 2023 or early 2024 at the latest. In AI terms, this is a veritable eternity. While I relied on peer-reviewed literature when possible, I also made ample use of prepublication articles that appear on arXiv. Many will indeed go on to be published in illustrious venues, but for the purposes of the Element, this mattered less. While I vetted the literature, its role as corroborating evidence for a cumulative chain of arguments leading to an inevitable conclusion is secondary. This Element is equal parts exploratory and first-person empirical, which means literature provides a theoretical background for applied ideas. More importantly, much of what happens in AI happens on platforms like X, Reddit, podcasts, and even legacy media. For this reason, many cited sources are “popular” rather than “academic,” although the boundaries between the two spheres are growing more permeable.
Third, the degree to which I made use of AI in the composition of this Element warrants clarification. The idea to write arose naturally from lectures I gave and courses I taught, where many of the issues addressed here already surfaced to some extent. Preliminary discussions with AI before deciding on the structure of the Element provided a general sense of direction, although ultimately, I charted the argumentative path myself. AI’s most decisive contribution was in locating relevant literature. I set up agents for bibliography retrieval in Claude Code and GPT Codex through the Cursor IDE and used Perplexity’s Comet browser to triage and summarize findings. My text editor of choice was Lex.page, which meant that I could avail myself of an AI companion to identify non sequiturs, clumsy phrasings, and unwanted repetitions. Lex also allowed me to fine-tune its suggestions through the “knowledge bases” functionality, where articles, blogs, and websites can be uploaded to steer its comments in a specific direction. When I ran into dead ends, I often prompted Lex (usually running Claude Opus 4) to suggest a way forward. While overall helpful, these suggestions were always too verbose and ornate to consider seriously. The resulting prose is therefore my own, as are any of the errors in fact and judgment that remain.
Finally, a comment on positionality. This Element was written using ChatGPT Pro, Claude Max, Gemini Pro, and Perplexity Pro subscriptions. Other paid subscriptions, like Cursor.ai and Wisprflow.ai, also contributed to the research and writing process, and all were covered by funds made available through my position as a tenured researcher at an Israeli university. The same institution, Bar-Ilan University, also made data repositories, journal subscriptions, and other resources available through its library system. All of this is to say that I wrote the Element from a position of extreme privilege not shared by many researchers, let alone students, in other corners of the world. That said, the Element does not assume this level of access for its recommendations to be useful. For most use cases described in the following pages, one $20/month subscription to an AI service is entirely sufficient. I recognize that this, too, might be a cumbersome demand for many, highlighting AI’s potential to further exacerbate inequality. As open-source models continue to gain capabilities, access may become democratized. Regardless, this will remain a social issue deserving serious consideration. With English, Mandarin, and other widely spoken languages dominating datasets, scholars who conduct research in under-resourced languages face comparable disadvantages. As is the case with many disruptive technologies, generative AI brings the uneven geographic distribution of resources into sharper relief.
Structure of the Element
This Element has four short sections. Section 1, “New Tools, Old Questions,” places the new technology against the backdrop of the historical discipline. It argues that, while AI will automate traditional scholarly tasks like archival research, literature reviews, and even writing, historians must actively engage with these tools rather than resist them.
Section 2, “Refining Historical Inquiry Through Dialogue,” fleshes out the artisan/apprentice paradigm, emphasizing context curation. It demonstrates these principles through examples of academic tasks, concluding that successful AI collaboration depends more on developing context management skills than on technical proficiency.
Section 3, “Contextual Search and Conceptual Exploration,” provides three case studies for using LLMs and video models in historical research: semantic search, network visualizations, and historical footage analysis. It details workflows that can be broadly generalized to fit a range of scenarios.
Finally, Section 4, “Governance, Ethics, and Futures,” examines how generative AI challenges traditional academic practices around attribution, intellectual property, and data security. In response, it proposes a “triggered transparency” system modeled on Freedom of Information legislation. It also shifts the attribution discussion from originality to methodological transparency.
The wonder and discomfort that mark our first encounters with AI signal an important threshold: We are witnessing something deserving of serious intellectual engagement. The longer we remain on the sidelines, the less influence we will have over its development and integration into scholarly practice.
1 New Tools, Old Questions
Mapping the Territory of AI in Historical Practice
Generative AI will change the historical profession. The change will happen gradually, then all at once, as it does with every revolution. Historians already use ChatGPT to rephrase messy paragraphs or convey elusive ideas. Soon, they will deploy it to do literature reviews or to check if an idea is new or just a paraphrase of something they had read but forgotten. Next, AI agents will go on intellectual quests on our behalf, returning from their explorations holding something familiar yet alien. We will ask ourselves, “Is this ours?” And we will answer, “Yes! We sent it on its way, we gave it prompts, we provided context, we taught it to think like us.” But increasingly, it will be, in the words of Ethan Mollick, like “working with a wizard.”Footnote 9 Reproducibility and interpretability are not the chief preoccupations of practitioners of the magical arts.
The result of this change is difficult to foresee. It is likely that the classical output of humanities scholars, historians included, will cease to be the monograph or the peer-reviewed article. Deeper still, the nature of historical thought will shift. Writing on “Will the Humanities Survive Artificial Intelligence?,” historian of science D. Graham Burnett had this to say: “Now everything must change. That kind of knowledge production has, in effect, been automated. As a result, the ‘scientistic’ humanities – the production of fact-based knowledge about humanistic things – are rapidly being absorbed by the very sciences that created the A.I. systems now doing the work. We’ll go to them for the ‘answers.’”Footnote 10
This Element will not be an attempt to answer Burnett, with whose insights I mostly agree, nor will it meditate at length on the role of humanities scholars in a changing world.Footnote 11 Its purpose is to give historians tools to think with about the coming wave.Footnote 12 One might ask: If the nature of historical scholarship is destined to undergo deep transformation, what is the point of writing a book that teaches historians to use AI to write and conduct research? Will AI not produce work that is much richer than we could ever dream of producing?
The answer is that the course AI takes is far from determined. The form it will eventually assume will evolve through social conversation. Indeed, it will don many shapes and work its way into every facet of our lives. Given the high stakes, critique of AI is not only justified but imperative. For the technology to evolve in ways that are beneficial not only to the tech moguls driving it or businesses adopting it in their relentless pursuit of efficiency, the humanities should have their say.Footnote 13 To advocate for such an AI, humanities scholars must understand these tools’ capabilities. Only through familiarity and proficiency can informed opinion emerge.
The historical perspective is uniquely suitable to opine on the adoption of technology across time and the changes endured by the disciplines that harness it. To understand how AI changes society, the contribution of historians is pivotal. In this sense, the Element sees the role of historians as twofold: Noticing what AI does to society and suggesting alternatives when it fails to live up to its potential, which it will undoubtedly do.Footnote 14 This is a cumbersome task, and not one that necessarily aligns with their immediate goals as working academics.
Historians, even those with an ideological bent, will not discard their specialties to become social engineers. A case needs to be made for a tangible benefit to adopting the technology. This Element contends that the benefit is clear – AI can help historians with research avenues and projects that were previously unmanageable. Unlike past technological booms, the learning curve is moderate. Knowing the technology does not mean becoming computer scientists. Indeed, historians may feel more comfortable with AI than STEM researchers do because the currency of LLMs is language.
A recent MIT Sloan study showed that prompting accuracy can influence the result as much as the model.Footnote 15 Soon after LLMs arrived, prompting was hailed as the skill to master.Footnote 16 Enthusiasm cooled when reasoning models came along, as commentators assumed they would improve at understanding us without the need to elaborate on every aspect of our desired outcome. In some respects, this was true, although the launch of GPT-5 showed they work best with the right context and clear expectations. The public’s tepid reaction to GPT-5’s launch prompted OpenAI’s CEO, Sam Altman, to promise that GPT-6 will offer a friendlier, less robotic, interaction style.Footnote 17 Wherever we find ourselves on this pendulum of efficiency vs. warmth, it is clear that clarity, awareness of bias and subtext, and the charged nature of language will continue to be valuable components of communicating with AI. Not coincidentally, they are also sensibilities with which humanities scholars are inculcated.
A reasonable question to pose here concerns the amount of time historians-in-training should devote to achieve AI proficiency. The preponderance of tools and methodologies can feel overwhelming, even for seasoned users. Many who wish to acquaint themselves come away discouraged. This is why the burden of responsibility for building robust AI literacy rests, first and foremost, with the academic programs themselves. Just as academic skills modules are mandatory in undergraduate and graduate course loads, an AI literacy course must be part of basic academic training. A constantly updated semester-long course will provide a preliminary foundation, although having dedicated institutional staff for more advanced use cases is beneficial. Support on the department level can be taken up by either faculty or staff and does not require highly specialized training. For most problems, having a slightly more knowledgeable person to turn to should suffice. Primarily self-reliant students should consider every task an opportunity to experiment with AI. Whether they are trying to find articles, proofread text, or fact-check a certain claim, enlisting AI should be a priority. Gaining confidence is easier with the help of one’s peers, so even without formal resources, a modest text group dedicated to sharing tools and experiences, asking questions, and working through challenges can be immensely helpful.
This will not be a technical Element; it will not go into details about neural networks, attention heads, gradient descent, and vector embeddings. Nor will it chart the history of LLMs from the musings of Alan Turing and the famed 1956 Dartmouth Summer Research Project, where the term AI was coined, to everything that happened since.Footnote 18 It will attempt to balance generality – guidelines that will hold true as technology changes – and specificity, providing recommendations for historians wanting to use LLMs now. As readers will surely recognize, some of the models named here are not the newest. The final quarter of 2025 witnessed the releases of GPT-5.2, Gemini 3, Opus 4.5, and others, a trend expected to continue apace in 2026. It is an inescapable attribute of the field, and as a result, some of this Element will age poorly. Specificity is its first casualty, eroding with every new model release and UI redesign. This has not discouraged me from writing it and should not discourage you from reading it. Knowledge in the AI age is becoming commoditized; agency and taste are not. By applying their agency and taste to the knowledge they find in this Element and similar sources, historians can make it actionable. Moreover, the best practices charted by this Element are model agnostic. Thinking carefully about context, applying rigorous fact-checking, and considering ethical ramifications are topics that will remain evergreen, even as AI capabilities evolve.
What’s Really at Stake
Of the many famous quotes from British mathematician and philosopher Alfred North Whitehead (d. 1947), perhaps the most recognized is this one:
It is a profoundly erroneous truism, repeated by all copybooks and by eminent people when they are making speeches, that we should cultivate the habit of thinking of what we are doing. The precise opposite is the case. Civilization advances by extending the number of important operations which we can perform without thinking about them. Operations of thought are like cavalry charges in a battle – they are strictly limited in number, they require fresh horses, and must only be made at decisive moments.Footnote 19
Despite being over a century old, this short and evocative statement encapsulates the challenge of AI to scholarly inquiry and knowledge production. Much of what historians consider the core of their practice – reading, making connections, extracting insights, synthesizing, writing, even indulging in the odd em-dash – will be subsumed by AI, manifesting as LLMs, agents, and whatever comes next. If we take the pronouncements of tech pundits at face value, this is not really a question of automation but of sharing the stage with nonhuman subjectivities whose claim to knowledge production, as opposed to data generation, cannot be dismissed.Footnote 20
Recognition of this development’s seriousness has been slow to reach humanities departments. While social critique has grown more vociferous,Footnote 21 even hysterical,Footnote 22 historians carry on leisurely debating whether using LLMs to automate literature search, polish prose, and brainstorm should be considered “cheating.” Academia is used to doing things at its own, often glacial, pace. Faculty meetings that adjourn by deciding to table a tough policy decision to the next semester, or budget committees reconvening for the nth time to weigh the pros and cons of purchasing a subscription to an outdated tool, are familiar features of academic life. They are unsustainable in an age where AI models train the next generation of LLMs, becoming stronger and faster with every refresh cycle.Footnote 23
The question of whether AI is capable of producing knowledge is hotly contested, mainly because the definition of knowledge is elusive and context dependent. A 2024 survey asked this and related questions of 100 researchers working with AI.Footnote 24 Asked whether AI can produce knowledge, philosophers generally said no, while computer scientists were evenly divided between acceptance, rejection, and indecision. When asked if AI will ever reach true knowledge, a third of philosophers responded negatively. Recalcitrance is rooted in the tendency of philosophers to endow humans with “semantic grounding,”Footnote 25 whereas LLMs are relegated to the lesser epistemological rung of “syntactic engines.”Footnote 26 Semantic grounding usually entails some variation on embodied experience involving sensory perception and a causal connection to the world.Footnote 27 LLMs, it is argued, lack these characteristics and therefore cannot produce true knowledge.
A common quip among computer scientists is that AI (or, more recently, artificial general intelligence (AGI))Footnote 28 is whatever we can do that computers cannot; once achieved, it simply becomes “Machine Learning.”Footnote 29 This resembles the “God of the gaps” idea against which Scottish evangelist Henry Drummond argued in 1893Footnote 30 (the phrase itself came from Charles Alfred Coulson’s 1955 book, Science and Christian Belief).Footnote 31 Walter Benjamin’s The Work of Art in the Age of Mechanical Reproduction harbors a similar sentiment by arguing that critique of technology is essentially a bitter elitist reaction to the demystification of cherished monopolies:
Thus, the distinction between author and public is about to lose its basic character. The difference becomes merely functional; it may vary from case to case. At any moment the reader is ready to turn into a writer. As expert, which he had to become willy-nilly in an extremely specialized work process, even if only in some minor respect, the reader gains access to authorship.Footnote 32
We might be making a similar blunder. Because of the humanities’ (justified) focus on the human, we struggle to see what machines are becoming. Large Language Models are stochastic engines whose text-bound training denies them what we have conveniently placed out of their reach, namely embodied and causal knowledge. Should we assume that progress in “embodied AI,” otherwise known as robotics, will not achieve what text-based models failed to? Time will tell. Regardless, it is important for the academic community to show humility and brace for the possibility that many of our methods are about to be automated. We should also be prepared to ask what it means to write history when the production of historical knowledge is no longer the sole preserve of humans.
Professional Stakes
Historians have traditionally controlled access to historical knowledge stored in archives – physical or digitized. Searching through these archives for valuable information required access and fluency in professional terminology. This is no longer true. Open-access digitized databases predate AI, but when the two are combined, they open up source material to nonspecialized investigation. AI enables semantic search, which means that we are no longer limited to keywords and can now search for ideas and themes in the text.
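The mechanics behind semantic search can be illustrated with a deliberately simplified sketch. In a real system, an embedding model converts each passage and each query into a vector of hundreds of dimensions; the three-dimensional vectors below are invented stand-ins, used only to show how ranking by cosine similarity can retrieve a passage about tax revolts for a query about rural unrest, even when the two share almost no vocabulary. The passages and numbers are hypothetical.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Hand-made toy vectors standing in for model-generated embeddings.
# A real embedding model would return high-dimensional vectors that
# encode meaning; three dimensions are used here only for legibility.
passages = {
    "Tax revolts spread through the countryside.": [0.80, 0.20, 0.30],
    "The peasants rose against the grain levies.": [0.90, 0.10, 0.20],
    "The cathedral's nave was rebuilt in stone.":  [0.10, 0.90, 0.20],
}

# Toy embedding for the query "rural unrest over taxation" -- note that
# it need not share keywords with the passage it should retrieve.
query_vec = [0.80, 0.20, 0.30]

# Rank passages by semantic closeness to the query, not keyword overlap.
ranked = sorted(passages, key=lambda p: cosine(passages[p], query_vec),
                reverse=True)
print(ranked[0])  # "Tax revolts spread through the countryside."
```

The design choice that matters here is the similarity metric: cosine similarity compares the direction of two vectors rather than their magnitude, which is why a short query and a long passage can still be judged close in meaning.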
AI knocks down other barriers, too. To a significant degree, language becomes less of a hindrance. Large Language Models are excellent translators, and while fluency in highly specialized linguistic domains, like jurisprudential Latin or medical Akkadian, is not entirely obviated, many historical sources become more accessible.Footnote 33 Another capability is handling large corpora. Large Language Models breeze through hundreds of pages in the time it takes a human to work their way through a single paragraph. Context windows have grown larger, and models are better at retaining and retrieving what they read. Most off-the-shelf models can now use tools for search, data compilation, summarization, and presentation. Already, AI has reduced the need to spend long hours in archives and libraries, sorting through primary sources and scholarly literature, a trend that will likely accelerate. More importantly, it is destabilizing the assumption that professional historical training is needed to access source material, query it effectively, and extract insights that are relevant and actionable.Footnote 34
The value of popular history notwithstanding, it seems likely that, with the importance of information gathering diminished, historical work will move to prioritize evaluation. Archival, paleographical, and codicological prowess is highly coveted by historians. Once it is no longer part of the toolkit, much of what we currently consider historical work disappears. Pragmatization is always melancholy, but here it is accompanied by an additional, more sinister dimension – model bias. By outsourcing archival work to opaque models, we forfeit control over countless small decisions that determine what we see and evaluate. This, too, is a matter of serious concern. In sum, the professional questions historians must answer are changing, and a conversation about these developments is overdue.
Disciplinary Stakes
The shifting focus of historical work creates opportunities for scholars who are capable users of generative AI. By the same token, it disadvantages those who do not wield the tools proficiently. Inevitably, we end up with a fragmented discipline. The impact on the field is uncertain, but some challenges are already coming into view. Without clear guidelines, authors will face a review process that is often suspicious, if not outright hostile, to AI-assisted scholarship. Historians who adopt these tools without understanding their limitations risk undermining both their own research and their effectiveness as peer reviewers and evaluators. The speed at which scholarly works, grant proposals, lesson plans, and other textual artifacts of academic work can now be produced calls into question the entire trajectory of the historian’s academic career. These are three examples out of a host of issues that will arise as AI becomes entrenched in academic work.
Decisions are urgently needed. They extend not only to the obvious matter of restructuring attribution and reconsidering originality but also to the ways in which we train the next generation of historians. Soon, LLMs will produce adequate articles, then excellent ones. Once achieved, the process can be sped up such that the most obscure questions can receive book-length treatments within minutes, if we so wish. Meanwhile, AI literacy is being introduced in piecemeal fashion into the curricula of history programs, which are preoccupied with outdated ideas about authorship and originality. Put differently, the institutional response seeks to harmonize AI with existing academic practices without realizing it makes many of them redundant and hollow.
The problem, then, is less about solving disclosure for ChatGPT-assisted compositions and more about fostering an environment that takes the technology as a starting point for new modes of historical production. While there will probably be demand for AI-generated text in some circumstances, the discipline must find new and compelling ways of mediating historical questions that go beyond dry academic texts. Could we envisage agentic reconstructions of crowd movements during the Nika riots in Constantinople or, nine centuries later, world models creating simulations of a walk down (now) Istanbul’s Kapalıçarşı Grand Bazaar? Might these be accepted outcomes of academic work?Footnote 35 There is no reason to discount such examples as less scholarly, rigorous, or cognitively demanding. For this to happen, platforms for showcasing the work need to be developed and, importantly, incentivized through peer review and accreditation.
Intellectual Stakes
Perhaps the most worrisome aspect of our increasingly AI-infused environment is cognitive offloading.Footnote 36 In the context of the present discussion, the term means an overreliance on AI-generated responses that adversely affects critical thinking. Cognitive offloading is decried as a problem primarily among students, but its effects can be injurious to scholars who have become dependent on LLMs for a variety of cognitive tasks. Not all tasks are equal in this regard. Some are technical and require very little mental effort, while others are primarily reasoning tasks that take a toll on the user’s deductive faculties when continuously delegated to language models.
The challenge lies in telling the two apart, especially in the dynamic setting of an exploratory conversation with a chatbot. Once certain paths are introduced by an LLM, they linger in the user’s consciousness even if they are not adopted. Asking ChatGPT to produce a thematic skeleton for a composition puts one’s thoughts in order but also dictates a trajectory for the argument that would not have been taken otherwise. The purpose here is not to espouse a purist perspective on work with AI; it is to argue for the importance of recognizing when we are swayed by ideas that are not our own, not to reject them outright but to examine them and see if they are truly justified or simply a choice of convenience.
Responses reflect the data upon which the models generating them were trained. Consequently, they contain a mixture of all the social pathologies one would expect of information harvested from the farthest reaches of the internet. Models trained on racial hatred, religious fundamentalism, and other noxious ideologies are concerning for obvious reasons, although historians of all people are expected to engage with such source material critically.Footnote 37 Indeed, some historians actually study hate speech for a living. Equally distressing are the often-heavy-handed attempts by AI labs at policing the model’s speech. The incentive structure is all too clear and rarely takes into account what historians need. Be that as it may, presentist interventions have shown how they can produce the opposite of the intended effect, as demonstrated by the racially diverse representation of WWII German soldiers in Google’s early image generators.Footnote 38 A similar design choice characterized the Chinese DeepSeek R1 model, which refused to discuss politically contentious issues like the 1989 Tiananmen Square crackdown or Taiwanese sovereignty.Footnote 39
Good AI literacy should endow the user with relevant discernment skills, allowing them to identify when the model’s ideas dominate the conversation to such an extent that the work becomes disconnected from what the user had originally intended. The same applies to instances in which it attempts to inject its own (or its designer’s) ideological bias. The latter is essentially a special case of the former. For many reasons, this is a difficult sensibility to nurture. Cognitive offloading is the path of least resistance, so we are incentivized to overlook it. Bias can be hard to detect, especially when it subtly nudges us in a direction that only becomes apparent once we have arrived at the destination. Experience – time spent conversing with different models, getting a feel for their “character” and ideological leanings, and spotting manipulation – is essential. As models grow more sophisticated, this will become harder.
The Irreducible Core of Historical Practice
Until now, we have dealt primarily with the challenges posed by AI’s encroachment on traditional domains of the historical discipline. The remainder of the section will explore the space of possible solutions, not to prescribe remedies to complex, evolving problems but to clarify the criteria proposed solutions should satisfy. As the title of this subsection suggests, there is an “irreducible core” of the profession that endures, even as its expressions take on new form.
The purpose of historical work is to create narratives that accurately, contextually, and empathetically depict the past. It acknowledges its sources’ lacunae, biases, and errors and attempts to evaluate them on their own terms. It is equally cognizant of the flaws of scholarship, which imposes its own, often anachronistic, suppositions on the historical setting. More than anything, it is an exercise in creativity and imagination, bridging gaps in knowledge and explaining to its readership notions that it may find offensive, misaligned, or simply strange by appealing to a shared sense of human kinship. This is not a project that is eroded by the advent of AI.
A first step on this path is to reframe work with AI as collaboration, not competition. Development of generative models is not expected to stall, despite frequent statements to the contrary.Footnote 40 This means that it is (and will continue to be) difficult to foresee the technology’s final form. At this point in its evolution, the relationship resembles that between an artisan and their precocious apprentice. We expect it will be able to relieve us of certain tasks while recognizing that others might be too demanding. We should, however, test those boundaries frequently. AI differs from a traditional apprentice in the sense that it has much more information than we do, although it has not yet learned to deploy it effectively. In subsequent sections we will discuss workflows that allow us to make the most of the artisan/apprentice paradigm.
As the technology matures, the relationship will become less asymmetrical. AI will grow into a partner and its ability to handle longer, more nuanced assignments will improve. Some barriers to its usefulness today, like hallucination or inability to access paywalled content, will likely be resolved or mitigated. Once this is achieved, the historian’s role changes from artisan to architect. In this new capacity, the historian oversees engineers, contractors, bureaucrats, and builders. They conceptualize a project broadly, drilling down where necessary but leaving much of the fieldwork to a specialized crew. Soon, historians will launch fleets of AI agents to perform archival work, scour literature, compare opinions, and present possible paths toward a predetermined goal. The remaining historical work is encapsulated in two words: agency and taste.
Put simply, agency and taste are the two ends of the same process. The former is the conceptualization of a research question and the decision to embark on a mission to find out the answer; the latter constitutes the judgment necessary to determine if the results produced by the AI coalesce into a cogent and compelling argument. For the foreseeable future, both remain firmly within the realm of human expertise. The term “expertise” is especially salient because it is the determining variable in ensuring a high-quality product of human-AI collaboration. Anyone who has worked with chatbots knows that they are good at creating the illusion of quality. Colloquially, this is known as “AI slop.” It is text that seems to meet the criteria for well-phrased, well-argued prose but is hollow and vapid. A recent research project by BetterUp and Stanford Social Media Lab has pointed to “workslop,” defined as: “AI generated work content that masquerades as good work, but lacks the substance to meaningfully advance a given task,” as responsible for significant financial harm in the workplace.Footnote 41 What is lacking from such AI-generated content is human expertise.Footnote 42
This insight is even more significant in academic writing and becomes glaringly obvious when one encounters compositions of nonexperts produced almost entirely by AI, with undergraduate course assignments as the prime culprit. When asked to identify if student work is AI-generated, faculty often successfully diagnose it not through telltale errors but through what is absent – the friction of genuine thought.Footnote 43 Expertise does not manifest in cosmetic changes to academic prose but in a substantive understanding of the argumentative direction. Even when LLMs are integral to the process of writing, the human is in control. It is they who pilot the vessel toward its destination.
Expertise leaves its mark on agency and taste equally. Opponents of AI complain that it might as well “do our job for us.” But it does no such thing. Until such a time as the AI initiates novel research directions and carries them out to their successful completion, there is still a place for human expertise. A future in which AI obtains these capabilities might yet materialize. For now, we should direct our agency and taste to align the technology’s development with our values and aspirations.
Concluding Thoughts
Our discussion opened with how our current thinking about history and its attendant methodologies is set to become outdated as LLMs and other generative technologies embed themselves in academic work. Mollick’s wizard analogy, which bemoans AI’s uninterpretability, adds another layer of complexity – or, rather, opacity – to an already difficult problem.
The profession, discipline, and the thinking that binds them will change, and change profoundly. Not only is the academic career trajectory expected to morph into something unrecognizable by the time the process has run its course; the kinds of products we, as historians, value and produce will likely also take on new form. As increasing swathes of cognitive labor are outsourced to computerized processes for which we have very little “theory of mind,”Footnote 44 the issues of cognitive offloading and susceptibility to bias grow more pronounced.
This section attempted not only to present the problems with the current generation of AI but also to argue for a potential solution. In a recent interview, Richard Sutton, often termed “the father of reinforcement learning,” pointed out the limitations of the current paradigm. Drawing on John McCarthy’s definition of intelligence as “the computational part of the ability to achieve goals,” he argued that LLMs lack substantive goals and consequently have no real intelligence.Footnote 45 Translated into the vocabulary of the present section, this means LLMs, as we currently understand them, have no agency and no taste. The impulse to study a certain question and the decision to embark on a mission to answer it are the essence of agency. Without it, neither history nor any discipline can progress.
The case for taste is a bit more complicated but arises from the same perspective. Confronted with the question of whether LLMs learn, Sutton insists that next-token prediction is inconsistent with true learning. For it to occur, there needs to be a persistent change based on new information and experience. In other words, if the model predicts a token but the token provided by reality is different enough to “surprise” the model, it does not adjust the weights of its neural network. Humans and other animals, unlike LLMs, learn from experience all the time by making subtle or radical adjustments to their suppositions, as the case demands. For LLMs to do the same, they need to undergo retraining. Refinement of preferences, by which one develops taste, cannot happen unless whoever is doing the prediction changes as a result.Footnote 46
Sutton forecasts that AI will achieve superintelligence, so those finding solace in LLMs’ current shortcomings are in for a disappointment. Regardless, agency and taste are valuable anchors in the artisan/apprentice and architect/contractor heuristics for working with the current generation of AI. It is the human who controls the initiative and applies the discernment.
Whether humans are ultimately dethroned remains to be seen. It is, in any event, beyond the scope of the present Element. For now, it is imperative that historians come to terms with the technology and experience it firsthand. This must happen if they are to free themselves of magical thinking and critically comment on the path AI should take to benefit the collective. In the next section, we will examine some initial ways to bring this into practice.
2 Refining Historical Inquiry Through Dialogue
How Should We Approach These Tools?
For historians of my generation, who began training when card catalogs were still a frequent sight in libraries, the move to digital information was rocky because it was prolonged and gradual. Archives needed to undergo digitization, a time-consuming undertaking of varying quality. Articles and books are still being scanned and OCRed, a process that will doubtless take years to complete.
The promise of digitization was nevertheless transformative. The fact that information, primarily in the form of metadata, was immediately retrievable and, better yet, from a distance was nothing short of groundbreaking. If a historian wanted to know whether a book was available for loan or if a key term appeared in a chapter of a book, she could avail herself of the iconic tool of the digital age – the search engine – and get an answer straight away. To most of us, search engines meant Google, which trained us in the telegraphic language of key terms. “Late antique women conversion Minorca article” would not have been an out-of-place query to type into Google Scholar, producing the reassuring page of blue links to websites of approximate relevance to our query. Search engine information was dependable, deterministic, and static.
When chatbots burst onto the scene, they mimicked the user interface of Google Search: A blank page with an empty textbox. Old habits die hard, so we naturally interacted with LLMs as though they were search engines. Some still do. But LLMs are not search engines, and while they can be deployed as data retrieval machines, this is not how best to use them. This section will frame LLMs as assistants, even cognitive partners of sorts, to whom we can turn when we need help thinking through problems, when we are at an impasse, or need a fresh perspective. In such instances, the dreaded hallucination problem becomes less of an issue because we do not come to LLMs for factuality but for creativity, which is another way of saying “controlled hallucination.”Footnote 47
Interaction with LLMs has always begun with the prompt. In the early days, people talked about “prompt engineering” as if it were a proper field of knowledge, either revered as an arcane discipline or marketed as a simple process of matching endless prompt libraries (sold by enterprising “experts”) to any conceivable use case. In reality, prompt libraries are usually useless, and mastering conversation with LLMs is less a matter of learning incantations than a general sense that comes with experience. More importantly, the prompt is only a small part of the interaction, albeit one we can easily control. In this section, we will talk about prompts, context, and good communication with LLMs.
AI as Assistant
Possibly the best advice about communicating with AI is to think of it as an overzealous and slightly confused assistant. This suggestion humanizes and contextualizes the interaction, making it richer and more rewarding. Many have warned against anthropomorphism in AI because it does not need the niceties that come with human conversation. While true, this overlooks the fact that conversation serves both the talker and the listener. Humanizing the interaction reframes it from a one-way stream of instructions to a gradually crystallizing context to which both interlocutors contribute.
Let us take this analogy further. The assistant is on their first day and needs guidelines about the task at hand: Who you (the user) are, what you need, and how you want it carried out. You might show them how things are done, pointing out sensitive or error-prone areas. These three components are, in fact, the basics of solid communication with AI: context, task, and structure. Basic prompting techniques usually designate these as the building blocks of a good prompt. This is one approach; another is to regard prompting as a constituent of context, and that the focus of any interaction with AI should be on “context engineering.”
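Though this Element is not about programming, the mechanics of the three building blocks can be made concrete. The sketch below (in Python, with hypothetical names; the template is one possible arrangement, not a standard) simply assembles a prompt from labeled context, task, and structure sections:

```python
def build_prompt(context: str, task: str, structure: str) -> str:
    """Assemble a prompt from the three building blocks discussed above:
    context (who we are and what the project is), task (what we want done),
    and structure (how the output should be organized)."""
    sections = [("Context", context), ("Task", task), ("Structure", structure)]
    return "\n\n".join(f"{label}:\n{body.strip()}" for label, body in sections)

prompt = build_prompt(
    context="I am a historian of late antiquity preparing a course reader.",
    task="Summarize the attached article, preserving the order of its arguments.",
    structure="A Markdown document of at most 300 words: a one-line thesis, "
              "then a bullet list.",
)
```

Nothing about the template is binding; its value lies in forcing us to state context, task, and structure explicitly rather than leaving them implicit.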
Viewing AI as an assistant or colleague has implications that span beyond context. First, we must recognize the iterative nature of interacting with LLMs. Formulating a verbose set of instructions to volley at the AI all but guarantees confusion. It is much better to introduce complexity and nuance gradually, as with a new coworker. Reasoning models tend to exhibit their thought process, so we are privy to the model’s “internal dialogue” or a semantically expressible reflection thereof.Footnote 48 By glancing at these conversations it has with itself as it tries to parse our request, we can get a sense of how our intentions are perceived. Iteration serves to correct the model when it has not understood us, and we should expect this in every interaction.
Second, when possible, we should aspire to express ourselves with precision. We will focus on precision when discussing tasks and structures, but for now, let us agree that we come to LLMs with a set of preferences, tastes, and opinions it has not had ample opportunity to study. The introduction of memory into ChatGPT and other platforms has allowed them to learn our habits and sensibilities, which means that more and more of what happens in our AI conversations is documented and accounted for going forward.Footnote 49 A brief look through the “memory” section of ChatGPT dispels the notion that this replaces true familiarity. Unless diligently maintained, much of it becomes obsolete, while crucial information about us is not captured at all.
Precision requires effort in phrasing.Footnote 50 When we use terms with underlying assumptions unfamiliar to the LLM, we let it interpret our intention, which it often does incorrectly. Translation is a good example of this. Large Language Models are known as capable translators,Footnote 51 but capturing the precise sentiment, intent, tone, or musicality of a text and conveying it in a new language calls for precision and, perhaps more importantly, intentionality. An instruction to “translate this paragraph into English” might catch the essence of a Latin sermon by Bishop Caesarius of Arles (d. 542), but not much else. A better prompt would be:
“Translate lines … of Caesarius of Arles, Sermon … into academic English that maintains Caesarius’ wordplay and scriptural references.
Signal every biblical allusion, citing book, chapter, and verse.
Retain Latin puns or ambiguous terms in brackets after their first appearance.
Explain your translation choice and provide alternatives and reasons for their disqualification.
Maintain Caesarius’ rhetorical rhythm through balanced clauses and assonance where feasible.”
This works only if we are interested in biblical allusions and rhetorical style. Analyzing Caesarius’s epistolary exchanges for clues of his politics would naturally yield different prompts. Regardless, our instructions should be precise and intentional. For them to be grounded in a deeper understanding of our goals, communication with LLMs should rest on context, our next topic.
Context
Context means everything the LLM needs to know to do its job well. Large Language Models are trained on vast amounts of data, but it is data that is agnostic to the ways in which we want to employ the model. Without relevant focus, the model defaults to its training data, resulting in boilerplate outcomes. In any given project, there are volumes of unspoken context taken for granted, and it is only when faced with machines that do not share our assumptions that we realize how much is assumed but not verbalized in human communication. The purpose of AI is to facilitate cognitive tasks, so starting from zero is not practical. We must focus on the information that is critical for the LLM to know if it is to become a valuable stakeholder. This makes subsequent work easier and forces us to reconsider our primitives in a helpful way.
A good first move is to open every first conversation on a topic with “context dumping,” that is, telling the model everything that is on our mind regarding it. Early LLMs struggled with long prompts and numerous document uploads, but this is no longer true. Multimodality is an extra advantage, since most of us have a rough idea of the context but find it difficult or time-consuming to formulate an exhaustive first prompt. ChatGPT and most other off-the-shelf LLMs have precise speech-to-text functionality in English and other common languages, and they are improving in low-resource languages too. There are specialized transcription applications for when those prove insufficient. So, we can opt to speak rather than type.
Recording a five-minute meandering, circular rant might sound frustrating and subpar to historians who take pride in lucid argumentation. Though inelegant, such bursts of unstructured thought are ideal for giving the model context. Restructuring is something LLMs do well. What is important is that the seed of the idea is there. This advice might seem to contradict what was said earlier about precision. While efforts should be taken to articulate our intentions in an exact way, this is not always possible, certainly in the brainstorming phase. More importantly, it should not come at the expense of exploring uncharted directions, even ones that are not clearly formulated.
Another element of context is grounding. Academic projects draw their inspiration from literature. In the case of history, this takes the shape of either primary sources or scholarship, both of which can be used to ground the model in a “base truth.” For the LLM’s responses to be useful, it needs to draw not only from our intentions but also from the state of the question. So, selecting articles or primary sources that are essential for the model to keep in mind becomes a useful way of defining context. This does not mean that we need to agree with the opinions expressed in the literature we select. We might be quite critical of it and still want the model to know what was said and in what ways our rejection of it is meaningful.
Maintaining context through different threads is likewise important. Context windows are growing rapidly but are not endless, so keeping a conversation going indefinitely guarantees that things said at the beginning are forgotten by the model. This phenomenon is known as “lost-in-the-middle,” where information is less accessible and memorable when it is buried in the middle of the context window.Footnote 52 Good housekeeping requires that we partition work on a project into different threads, not only from the model’s perspective but also from ours. Starting new conversation threads is challenging if we depend on the accumulated context to carry on into the next thread, which might make us reluctant to abandon a conversation. There are, however, ways of mitigating this.
One way is simply to ask the chatbot to craft a prompt that would instruct a new conversation thread on the content of the one we are ending. This is helpful if we are concerned about the ideas we exchanged throughout the conversation but does little to compensate for the loss of articles and sources we uploaded. Luckily, most chatbots feature a “projects” functionality that allows us to group numerous conversation threads around a particular context. Anthropic’s Claude pioneered this approach, which was later adopted by ChatGPT. Projects offer repositories for uploading documents whose content applies to all the conversations that reside within the project. Whenever we open a new thread, we need not concern ourselves with reminding the model that they exist. Moreover, it is possible to further refine the purpose of the project by giving it a set of instructions that are read at the start of each new conversation.Footnote 53 Here it is advantageous to say what we are trying to accomplish, which documents can be found in the project’s knowledge base, how we want them treated, and how we would like the model to behave. If we expect the project to serve as an arena for wrestling with the model intellectually, trying out new ideas, and putting them to the test, we should express our wish for the model to push back and be critical and judgmental.
A word of caution: As a result of their training, which includes a process called “reinforcement learning,” models are rewarded for pleasing behavior and so tend to be sycophantic.Footnote 54 This has been diagnosed as a serious problem, and not one easily overcome by instructing the model to adopt a more critical posture. Put differently, while models are capable of criticizing their users, when pressed, they often revert to appeasing behavior, and it is thus important not to put too much stock in the accolades dispensed by chatbots.
Context is more than just a component of prompting, although it is often reduced to one. It exists outside our interactions with LLMs, encompassing everything we think and feel about our project, which is an ever-changing web of ideas, associations, and emotions. The fluctuations of enthusiasm regarding a scholarly endeavor are familiar to academics, whether novices or veterans. This relates to doubts concerning our approach or capabilities, exposure to new scholarship, and dead ends in writing, note-taking, and problem-solving. In a very real sense, then, context lives between our mind and the environment. To “engineer” it effectively, we must be conscious of its vagaries and attempt to capture and codify it.
Adopting generative AI is a transformative decision, mainly because it encourages us to think about context management seriously. Success in leveraging AI depends less on technical proficiency, which changes with the constantly accelerating rhythms of technology, than on a cognitive and behavioral leap that trains us to think in terms of context. Consider the gap between a lesson plan and the actual lecture to a live audience. A lesson plan is a pale facsimile of a live lecture’s richness. Since COVID-19, my university has instructed faculty to record lectures, so I have gotten into the habit of transcribing the lecture and preparing an artifact to hold onto the experience, perhaps as a document or an interactive AI resource. Now that it is possible to upload raw video to some models (like Gemini 3), more components of the lecture that cannot be expressed as text are preserved. In this sense, context becomes multidimensional.
How does this translate into historical scholarship? The settings in which we gain context on our research are numerous: reading and note-taking sessions, conversations, writing, and sudden bolts of inspiration. A short Element like this one cannot provide a remedy for transforming these modalities into an operative context, but we should remember that AI is itself multimodal. We can talk to it, record audio snippets, take photos of quotes, handwritten notes, and articles, and send links of videos for the AI to watch. Since conversation with our context is also context, talking to our notes should be regarded as a valuable artifact in the context creation process. It is then a matter of extracting insight from this chaos, which is where knowledge management products like NotebookLM come in. To some, voice conversations with LLMs are most inspiring; others might prefer keyboard interaction. Either is fine, as long as we keep talking to our knowledge base in a way that challenges the boundaries of context and its assumptions.
Another component of context is style. If we want AI to write for us, teaching it to sound like us is as easy as feeding it samples of our work. The more writing we provide, the better the AI’s ability to mimic us. This can be done directly, with a prompt to replicate our voice, or by formulating a style sheet with instructions and examples.Footnote 55 Using LLMs to generate text that we present as our own poses ethical issues, which we will discuss later. For now, it is enough to say there are benefits to teaching LLMs your voice and perspective, even if you do not intend to harness it to write. Exploring narrative options by asking the AI to generate a paragraph in your style can help visualize what committing to a certain idea might look like. Copy editing and polishing our prose is another option that requires familiarity with our tone.
We must decide if we are comfortable taking this direction. If we are not deterred by the ethical considerations, we should nevertheless hesitate because once an idea generated in our voice is imprinted on us, it is very difficult to unshackle ourselves from it, creating a form of conceptual capture. This is less of a liability when writing an email or preparing a lesson plan than in serious writing, so each of us must draw the line where the cost outweighs the benefit.
In conclusion, context is evolving and dynamic. It develops both in tandem with our thinking on a project and independently of it, as LLMs improve in discerning our intentions and habits. Curating context to benefit scholarly work and academic writing needs to become an intentional habit of using LLMs as assistants.
Task
Unlike context, which tends to be general and amorphous, a task is usually specific and focused. Tasks describe what you want the LLM to do, as opposed to how, which we will discuss in the structure section. This seems like a straightforward definition, but tasks are often tricky. The most common task LLMs perform is probably to summarize a long text, so a prompt like: “summarize this article” meets the criteria for specificity, although it would probably fall short of expectations for a good summary. In fact, “summary” hides layers of nuance: Does it need to capture the main arguments or just the takeaways? Is the order of arguments important? To what level of detail? Are the examples and illustrations critical, or can they be left out?
A summary, and indeed any task assigned to an AI, needs to be verifiable with quantifiable parameters for determining if the LLM returned a successful result. One simple technique for summarizing primary sources and academic literature is to request references. The text is available to the LLM, so there is no reason not to request citations encapsulating the essence of arguments in the summary. Take, for example, Peter Brown’s article, “The Rise and Function of the Holy Man in Late Antiquity,” published in The Journal of Roman Studies in 1971.Footnote 56 Running it through ChatGPT with the prompt: “summarize this article” produces the following result with the OpenAI model o3:
Central thesis The emergence of the “holy man” is the key motif of late-antique religious change, not a by-product; his rise reframes the period’s crisis as one of personal freedom rather than cultural decay.
Social function In rural Syria, weak village institutions created demand for an external patron; the holy man stepped in as mediator, judge, and safety-valve, stabilizing prosperity and local disputes.
Patronage model Villagers expected him to wield the same power and largesse formerly supplied by urban aristocrats, providing protection “to those in distress.”
Bridge to the divine He personalized an otherwise distant, formidable Christian God, translating cosmic justice into concrete acts − rainmaking, healing, exorcism − while remaining visibly compassionate.
Professionalization of the sacred By concentrating blessing, prophecy, and protection in one figure, the holy man displaced communal spells, amulets, and wise-women, marking a decisive shift from diffuse pagan practices.
Indicator of social transformation His ascendancy reflects new classes forging fresh social relations rather than simple decline; rural patronage acted like a “governor” preventing societal “over-heating.”
Decline and legacy As towns revived and monastic/martyr cults grew, sacred power relocated from living individuals to institutions, flattening the holy man’s trajectory yet leaving an enduring model of charismatic authority.
O3 is a strong model, and the result is not bad. Now, let us try this prompt, which complies with the suggestions just outlined:
Task-Oriented Prompt
Role You are a late-antique historian preparing teaching notes.
Input Full text of Peter Brown, “The Rise and Function of the Holy Man in Late Antiquity” (PDF attached).
Task Write a verifiable analytical summary that I can copy into a course reader.
Deliverable Markdown document, max 300 words, with these sections:
1. Central thesis (≤ 40 words)
2. Key arguments bullet list preserving the article’s order; each bullet ≤ 25 words.
3. Illustrative examples list 3 concrete cases Brown cites (place/person/event), each ≤ 20 words.
4. Historiographic contribution 1-sentence statement of novelty.
Verification rules
• Attach ≥ 4 inline citations to Brown’s page numbers (e.g., “mediator of disputes” [p. 20]).
• Every key argument must be traceable to at least one citation.
• No new interpretation, no outside sources.
5. Format
### Central Thesis
…
### Key Arguments
- … [p. x]
- …
### Illustrative Examples
1. … [p. y]
2. …
3. …
### Historiographic Contribution
… [p. z]
Success criteria – summary covers thesis + main arguments + examples; citations present; total length ≤ 300 words.
And here is the result:
Central Thesis
Brown contends that Syrian holy men became indispensable social mediators – patrons, judges, and healers – because late-Roman communities sought an impartial authority beyond civic and ecclesiastical hierarchies. [p. 81]
Key Arguments
Syrian steppe-desert allowed ascetics proximity to villages and constant crowds. [p. 83]
Leadership gap in prosperous but egalitarian villages let holy men assume patronal roles (loans, water, tax advice). [p. 85]
Public curses and exorcisms dramatized power, legitimizing their arbitration in disputes. [p. 88]
Extreme ascetic estrangement fashioned the “objective stranger,” enabling influence in urban and imperial politics. [p. 91]
Laypeople relied on them for focused cures, penance and everyday ethical decisions, professionalizing spiritual care. [p. 97]
Their rise marks a shift of divine power from institutions to individuals; later, churches and icons re-absorbed it. [p. 100]
Illustrative Examples
1. Abraham of the Lebanon brokered a tax loan and was acclaimed village προστάτης. [p. 86]
2. Mass pilgrimages to Symeon Stylites’ column at Telnesin sought rain and legal judgments. [p. 84]
3. Daniel the Stylite arbitrated between Emperor Zeno and the patriarch during Basiliscus’ revolt. [p. 92]
Historiographic Contribution
Brown recasts the holy man as a socio-economic agent, integrating asceticism into analyses of late-Roman community dynamics and state formation. [p. 80]
It is easy to see straight away that this result differs from the output of the general summarization prompt. It is not necessarily better at providing a general outline of the article, but it is undoubtedly more precise and specific, focusing on verifiable criteria like references. The prompt set out to elicit this information. Had we wanted something else, our prompt would have reflected that objective. A general instruction like “summarize this article” leaves the agency to the model, and the results, while not bad, are expectedly generic.
Tasks can be significantly more complex than summarization, encompassing various outcomes, some of which do not require a final answer. A different kind of task could ask the model to brainstorm an idea or “pressure test” an argument, known as “red teaming.”Footnote 57 Here, too, specificity and precision are beneficial, but we should expect the answer to manifest cumulatively, as an ongoing dialog, not a one-time response. What would such a conversation look like? When constructing a theory based on a primary source or sources, it is helpful to ask the model whether its reading supports the theory. Many historical sources, especially famous ones, are likely in the model’s training data. Nevertheless, providing the source as grounding should be considered a best practice, intended to provide references and minimize hallucinations.
We might ask the LLM to criticize our idea, role-playing a scholar with a different opinion. Context is key, so we should upload an article we are arguing against and task the model with simulating the opposition from its perspective. To counter the ensuing criticism, we might solicit suggestions on its weak points. Whether we are convinced or not is less important than understanding our argument’s landscape.
Another task that is quickly becoming a mainstay of research-oriented interactions with AI is a literature review. The advent of Deep Research functionality and more general agentic capabilities makes this an especially compelling use case.Footnote 58 Some clarification of terms is in order, especially since agents and their permutations have become ubiquitous buzzwords in an already hype-susceptible discourse. First, much like the vaunted and elusive AGI, AI agents lack a consensual definition. For this discussion, agents are LLM-based tools capable of executing extended tasks on the user’s behalf.
Deep Research is a narrow agent, while ChatGPT Agent is a more general agent with basic functionality. When researching literature in a specific field, it functions as a hybrid of Deep Research and a browsing agent. Google, Anthropic, Perplexity, X, and other AI labs have their own Deep Research equivalents, and as of the writing of this section, Google’s is probably the most comprehensive. Its downside is that it has difficulty discerning academically rigorous items from blogs and op-eds. ChatGPT Agent tends to be less susceptible to this problem. When researching technology, where much of the content comes from informal channels like podcasts, X posts, and Reddit, this is less of a concern. For historians, whose quotable professional knowledge comes almost exclusively from academic literature, this is not ideal. As contextual filters improve and AI labs strike deals with publishers – a process already underwayFootnote 59 – this problem will likely diminish. The impact on the authors whose work trains the models is a separate question galvanizing fierce debate.Footnote 60 For now, let us focus on our task.
First, we should define our focus. We are interested in monasticism as a tool for organizing community in the fifth and sixth centuries, as Roman imperial rule weakened and disappeared. This should be evident in agrarian redistribution, repurposing villas, social stratification, and the innovative solutions monasteries as local rural patrons provided in the Latin West. What would a well-structured Deep Research prompt look like for literature on this question and comparable ones? Remember, we are interested only in professional academic literature, not blogs or popular media.
In this case, the workflow included pasting the paragraph above into Google Gemini 2.5 Pro, so that it returned this comprehensive Deep Research prompt:
Comprehensive Research Prompt
Your Task: Conduct a comprehensive academic literature review on the role of early Christian monasticism in reshaping social and economic structures in the Latin West during the fifth and sixth centuries.
Core Objective: Analyze how monastic communities functioned as new centers of local power and organization as Roman imperial authority waned. Your analysis should focus specifically on agrarian redistribution, the repurposing of Roman villas into monastic centers, the evolution of social stratification, and the monastery’s emerging role as a rural patron.
Analytical Requirements:
1. Identify and summarize key scholarly debates: Detail the central arguments concerning monastic economic impact. For example, did they represent a continuity of the Roman villa economy or a radical disruption? Were they primarily conservative institutions or engines of economic innovation?
2. Synthesize methodological approaches: Compare and contrast the evidence and methods used by scholars. Distinguish between conclusions drawn from textual sources (e.g., hagiographies like Sulpicius Severus’s Life of Martin, monastic rules, and letters) and those based on archaeological evidence (e.g., excavation of villas, landscape archaeology, pollen analysis).
3. Highlight key case studies: Isolate and discuss specific, well-documented examples from the literature, such as the communities at Lérins, Marmoutier, Vivarium, or early Benedictine foundations, focusing on how they exemplify the broader trends of economic adaptation and social influence.
4. Map the intellectual landscape: Identify the seminal authors and foundational scholarly works that have shaped this field of study.
Search Scope and Constraints:
Primary keywords: Early monasticism, Late Antiquity, post-Roman West, agrarian economy, villa repurposing, monastic patronage, social stratification, land tenure, community formation, coenobitism.
Geographical focus: The Latin West, with particular attention to Gaul, Hispania, Italy, and Sub-Roman Britain.
Temporal focus: The period from 400 ce to 600 ce, though sources covering the late fourth and early seventh centuries are acceptable to establish context and trace developments.
Source types: Restrict all results to peer-reviewed academic literature only. This includes scholarly monographs from university presses, articles in established academic journals (such as Journal of Late Antiquity, Early Medieval Europe, Speculum, Journal of Roman Studies), and chapters from peer-reviewed edited volumes.
Explicit exclusions: Do not include results from nonacademic blogs, popular media, general encyclopedias, or unpublished dissertations.
Present the output as a structured report addressing each of the analytical requirements outlined here.
Asking LLMs to prompt on your behalf is often a good idea. Anthropic’s Claude even has a prompt optimizer, where you can drop your text and have it rephrased to improve results.Footnote 61 While convenient, we can achieve similar results by asking the model to formulate a detailed prompt for our task, as we did here. The above prompt was given to Gemini’s Deep Research and ChatGPT Agent. Gemini took about five minutes to complete its report, citing sixty-eight sources and reading several more. It produced a comprehensive review of the phenomenon, which can be usefully integrated into our project context. ChatGPT Agent worked for thirty-four minutes and returned a brief discussion, a list of important academic works, and several primary sources on related topics. Neither replaces serious research but can function as a first step in identifying sources and literature that can later be expanded with traditional or AI-assisted methods.
Summarization and literature review are two use cases among many that AI tools offer historians. We will encounter other examples below. To optimize AI’s execution of tasks, we now turn to another element of effective communication with LLMs – structure.
Structure
While context tells the model who we are and what it needs to know before engaging with us, and task specifies our needs, structure contains instructions on how we want it to answer. It is best not to think of these components as distinct parts of a prompt but as information to articulate before we send the LLM off to work. We can choose to introduce this in one lengthy prompt, but it is preferable to spread it across a conversation.
Often, we do not want the model to answer straight away. Instead, we would like it to ask guiding questions until we are satisfied it understands our request. This is where context, task, and structure overlap. However, if we have a definite end result in mind, we should control for structure, especially if we must follow a given format, like a research proposal.
For complicated tasks, response structure becomes especially important. Some models naturally opt for specific ways of presenting information. OpenAI’s O3 was known for its penchant for tables.Footnote 62 This may suit our needs; if not, we should let the model know how to structure its response. We can tell it directly: “
When responding, structure your answer as follows: a 200-word introduction, followed by a 700-word historical context discussion. Then, a 500-word literature review concluding with the research question in italics. Transition to the methodology and timeline (500 words) and finish with a 150-word conclusion.”
Another way is to show the model. Providing examples is sometimes loosely called “fine-tuning,” but when AI experts who train LLMs use this term, they mean a more structured training process; in prompting parlance, supplying examples is known as few-shot prompting.Footnote 63 Regardless, LLMs perform better with examples. These can be examples of good work – a winning grant proposal is an excellent way to teach our desired task and structure. Bad work can be valuable too, because it tells the model what to avoid. In both cases, it is important to annotate the examples. If the provided proposal did not receive funding, we can append the readers’ reports explaining why. Or we can insert our own comments into the document, showing the model that this writing style or argumentation did not work in the proposal’s favor.
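The annotation workflow just described can be sketched as data, assuming the widely used OpenAI-style chat message format; the documents, report, and wording below are placeholders for illustration, not prescribed content:

```python
# Sketch of an annotated few-shot prompt in the OpenAI-style chat format.
# All bracketed documents are placeholders the user would paste in.
good_example = "[text of a funded grant proposal]"
bad_example = "[text of a proposal that was not funded]"
reader_report = "[readers' report explaining the weaknesses]"

messages = [
    {"role": "system",
     "content": "You draft grant proposals in the user's voice and register."},
    # A positive example teaches the desired task and structure...
    {"role": "user",
     "content": "Example of a successful proposal:\n" + good_example},
    # ...and an annotated negative example tells the model what to avoid.
    {"role": "user",
     "content": "Example that was NOT funded:\n" + bad_example
                + "\nReaders' report:\n" + reader_report},
    {"role": "user",
     "content": "Draft a 500-word project description, following the "
                "successful example and avoiding the flagged weaknesses."},
]
print(len(messages))  # 4
```

The same material can, of course, simply be pasted into a chat window; the point is that each example travels with its annotation.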
Structure can also refer to the tone or “persona” we want the AI to adopt for a specific task. We have all encountered prompts that begin with: “You are an accountant with 15 years of experience in IRS audits” or something to that effect. These are one way to control structure, but ultimately they are limited. In academic writing, there is a certain voice we want to capture – one that is professional and appropriate to the type of text we strive to produce, that captures our uniqueness, and that leaves room for growth. This will become clearer shortly.
Register refers to the degree of formality or casualness of the text, the use of jargon, and the grammar complexity. Even if we instruct the model to answer in academic English, we should clarify what we mean. Different registers suit different academic contexts, even when explaining the same idea. Speaking at a conference is different from composing an article for publication, which is different from a grant proposal. We all recognize this and apply it to our writing, so we should let the LLM know, too.
AI writing often sounds bland. It contains grammatical and stylistic components that have, by now, become telltale signs of “AI slop,” like “delve” and “tapestry,” abundant em-dashes, and “not only … but also …” patterns.Footnote 64 When we want AI to replicate our voice, if only to try out an idea, we must teach it what our writing sounds like. As mentioned before, a good way to accomplish this is to have the AI build a style sheet. By providing examples of our prose and asking it to analyze our voice, we can produce a reference document. But as anyone who has ever returned to their earlier work knows very well, the way we express ourselves changes throughout our writing career. We should assume our current writing is also not “our final form,” which is what I mean by room for growth. Allowing structure to include a degree of stylistic leeway, indicating possible paths for evolution, leaves us free to develop.
Concluding Thoughts
This section discusses a challenging goal: how to convey years of knowledge, preferences about a field, and a style of thinking to a machine that does not understand us. Large Language Models develop quickly, and their capabilities can change dramatically and unevenly, so rigid recommendations for prompting are pointless. No secret prompting recipe can answer all our research needs, so it is best to interact with models by relying on our own strengths as communicators. This is why anthropomorphism is not something to avoid but a useful heuristic. When we speak to a human assistant, we naturally resort to discussing context, task, and structure, not as hard rules but as ways of breaking a complicated objective into manageable components. In doing so, we often adjust our thinking about the topic. Working with LLMs is not very different. It encourages us to consider context – our own and the AI’s – and continuously bridge the gap between the two as we strive to achieve an evolving goal through conversation.
3 Contextual Search and Conceptual Exploration
Generative AI tools open up numerous avenues for historical exploration. The space of possible directions of inquiry is unlimited in principle, bounded only by the questions that pertain to a given historical discipline. As a historian of Late Antiquity and the Early Middle Ages, I gravitate toward materials from the fourth century to the ninth century. Primary sources examined by historians of modernity might be very different, in form and content. The history of film uses historical footage, as does contemporary military history. Historians of music from the nineteenth and twentieth centuries can avail themselves of recordings, while medical history can analyze x-ray imaging. The variety of sources used in historical subdisciplines is substantial. As AI tools become more multimodal, historians using the technology are no longer restricted to textual sources. The present section will offer a sampling of sources and outline a path for thinking about and working with different modalities.
A key strength of generative AI tools is their ability to analyze vast amounts of data for interesting details, motifs, and trends. The Ctrl-F function has been a reliable companion for anyone attempting to quickly locate key phrases in a document. However, slight variations to spelling, conjugation, and punctuation can easily throw it off. With their nuanced contextual understanding of text, LLMs can augment search to include not only exact matches but also declensions and semantic cognates. Documents can now be quarried (and queried) for terms and broader ideas. One might ask the model to find instances of poetic language where dry prose is usually used; LLMs can locate instances of symbolic language, narrative tempo, or passive/active verbs – the list goes on. In keeping with the practices outlined in the previous section, we should focus on the task’s contours, deciding what we hope to uncover by subjecting a primary text to AI. Some examples are thus in order.
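The gap between exact matching and even light linguistic tolerance is easy to demonstrate. Here is a minimal Python sketch with an invented sentence; a hand-written pattern stands in for the far richer matching an LLM performs, and true semantic cognates (say, “administered”) would still escape any regex:

```python
import re

# An invented sentence used purely for illustration.
passage = ("The monastery organised grain redistribution; such organising "
           "of rural patronage reshaped village life.")

# Ctrl-F-style exact search finds only the literal string.
print(passage.count("organized"))  # 0 - misses the spelling variants

# A tolerant pattern catches spelling and conjugation variants.
variants = re.findall(r"\borgani[sz]\w*", passage)
print(variants)  # ['organised', 'organising']
```

An LLM extends this tolerance from surface forms to meaning, which is what makes the searches described below possible.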
Semantic Search
Searching a text for ideas has long been a desideratum of historians, who depended either on profound acquaintance or lucky accidents to uncover elements that were not immediately identifiable. While LLMs are in no way a replacement for critical reading skills or, indeed, for familiarity with the source in question, they offer a convenient point of entry for scholars as they venture into new terrain. Before demonstrating how we should go about doing this, a word on the data.
LLMs thrive on data. As part of their training, models ingest vast amounts of it, harvested from publicly available outlets like the internet. It is probably safe to say that the more famous historical sources are part of any contemporary model’s training data. Asking GPT-5 to produce memorable quotes from Niccolò Machiavelli’s The Prince returns the following result:
Prompt: Find 5 memorable quotes from Niccolò Machiavelli’s The Prince that demonstrate his approach to power. Give citations.
Response: Here are five striking quotes from The Prince that capture Machiavelli’s strategies on power – with concise citations:
1. “It is better to be feared than loved, if you cannot be both.”
Highlights the pragmatic preference for fear as a more reliable means of control.
2. “A prince must learn how not to be good … and use that knowledge, or refrain from using it, as necessity requires.”
Emphasizes adaptability and moral flexibility when ruling.
3. “A prince should appear merciful, faithful, kind, religious, upright − but should be flexible enough to make use of the opposite qualities when it is necessary.”
Underlines the importance of appearance over substance and readiness to act contrary when needed.
4. “Be the fox to avoid the snares, and the lion to overwhelm the wolves.”
A vivid metaphor for balancing cunning with force in leadership.
5. “If a state rebels and is again brought under control, it will not afterward be easily lost.”
Suggests that demonstrating strength – even through forceful reconquest – can deter future challenges.
As expected, GPT-5 was able to retrieve quotes. However, it did so by going online to supplement its own knowledge. I asked for citations to know where the quotes came from, and there was indeed cause for concern, since some came from blogs and media outlets. While they were more-or-less precise paraphrases of Machiavelli’s words, that is hardly a standard we should accept. In this instance, GPT-5’s tendency to call on tools like browsing was paired with its relative inability to distinguish between high-value academic sources and random search hits, producing a lackluster response. Even if this were not the case and the LLM relied solely on its training data, we as users have no way of knowing which editions of The Prince it contains.
Grounding the model in trustworthy sources is always preferable to having it retrieve from whichever haphazard internet websites it stumbles upon or to rely on the vagaries of its training, which is why it is best to provide it with editions of the texts. Using a trusted edition and choosing an appropriate translation is standard practice for historians. When working with LLMs, this has the added advantage of grounding the model, allowing us to control for hallucinations (or at least mitigate their effect).Footnote 65
ChatGPT can understand new and old languages, so if proximity to Machiavelli’s intentions is the goal, we should opt for the original sixteenth-century Italian text, provided we understand the language. If not, relying on ChatGPT or any other LLM to translate might pose additional difficulties. A middle ground would be to provide both an original Italian edition and a reliable translation, which is what I did here, adding the instruction:Footnote 66 “Give citations from the edition and the translation (followed by chapters and p. numbers for each).” This produced the following:
Here are five that cleanly capture Machiavelli’s stance on power – each with parallel references to the English translation and the Italian edition (chapter + page).
1. “If you have to choose, it’s much safer to be feared than loved.” − Ch. 17 • English: Tim Parks, The Prince (Penguin Classics, 2014), p. 66–67. • Italian: Il Principe, cap. XVII, pp. 77–78.
2. “Since a ruler has to be able to act the beast, he should take on the traits of the fox and the lion.” − Ch. 18 • English: Parks, The Prince, p. 70. • Italian: Il Principe, cap. XVIII, p. 81.
3. “The truth is that the only sure way to hold such places is to destroy them.” − Ch. 5 • English: Parks, The Prince, pp. 77–78. • Italian: Il Principe, cap. V, p. 29.
4. “Fortune is a woman … she will be beaten and forcibly subdued.” − Ch. 25 • English: Parks, The Prince, p. 130. • Italian: Il Principe, cap. XXV, p. 112.
5. “A ruler … must have no other aim or consideration … outside war, the organization of the army and military discipline.” − Ch. 14 • English: Parks, The Prince, pp. 56–58 (opening of Ch. 14). • Italian: Il Principe, cap. XIV, pp. 68–69.
It is recommended to provide the actual text, but caution is warranted. If the text is a low-quality PDF scan, it may impair the LLM’s ability to decipher it. If it is a photo of handwriting, special attention is needed to ascertain whether the AI can read it, because if it cannot, it will often hallucinate rather than admit its inability. Whenever possible, work with clear, high-quality files. If none are available, we should account for OCR errors and thoroughly check the LLM’s output. Even in this instance, the LLM was not entirely correct. Since the page numbers in the edition used in this example were difficult to parse, GPT-5 Pro made two errors and paraphrased the quote from Ch. 25 (“Fortune is a woman”).Footnote 67 The retrieved examples were useful and matched the request, so we could consider the task as successfully completed, although relying on it blindly would have been ill-advised.
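A first-pass mechanical check of this kind is easy to script. In the sketch below, the Parks rendering quoted earlier stands in for a full edition file; both strings are normalized before matching so that capitalization, accents, and punctuation do not cause false alarms. Page-number claims would still need checking against the actual edition:

```python
import re
import unicodedata

def normalize(s: str) -> str:
    """Lowercase, strip accents, and collapse punctuation and whitespace
    so minor formatting differences do not defeat the match."""
    s = unicodedata.normalize("NFKD", s)
    s = "".join(c for c in s if not unicodedata.combining(c))
    s = re.sub(r"[^\w\s]", " ", s.lower())
    return re.sub(r"\s+", " ", s).strip()

def quote_in_edition(quote: str, edition_text: str) -> bool:
    """Is the quoted span present verbatim (after normalization)?"""
    return normalize(quote) in normalize(edition_text)

# One sentence stands in for the full edition text here.
edition = "If you have to choose, it's much safer to be feared than loved."

print(quote_in_edition("much safer to be feared than loved", edition))  # True
print(quote_in_edition("fortune favors the bold", edition))             # False
```

A check like this flags outright inventions immediately; close paraphrases, like the “Fortune is a woman” case above, fail the match and are exactly the spans deserving a human look.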
I will have more to say about data hygiene and management later. Now, let us move on to semantic searches. The source I chose is the Confessions by Augustine of Hippo (d. 430), written in the closing years of the fourth century ce as his autobiographical meditation on a Christian’s path toward God.Footnote 68 It is a uniquely introspective text, especially for its time, offering interesting possibilities for semantic inquiry. To illustrate the full force of semantic search, we might focus on Augustine’s use of memory architecture. In the tenth book, the bishop often makes use of this trope, likening memory to a structure and the act of recollection to a walk through its rooms. The architecture of memory is an intentionally unfocused parameter, because the point is to push the model to weigh whether the sentences it reads match it to some degree.
The GPT-5 model family, which includes GPT-5, GPT-5 Thinking,Footnote 69 and GPT-5 Pro, prefers precise instructions of the kind that the model itself is better qualified to provide. I gave it an initial prompt that asked it to return a prompt with instructions for a semantic search on “memory architecture” using its strongest model, GPT-5 Pro. While capable, it is very slow and can take up to twenty minutes to respond. Its internal workings are not open to inspection, but it is likely launching numerous queries in parallel and ranking the best ones, which is exactly what is needed for a task like this.
Here was the initial prompt to GPT-5 Thinking:
I want to explore the usage of ‘memory architecture’ in Augustine’s Confessions. I will upload O’Donnell’s Oxford edition in Latin. I want you to come up with a prompt that will show how Augustine uses these motifs to illustrate his point. I also want some fact-checking mechanism in place, where you double-check your responses, and give book and chapter citations as well as quotes in Latin. Your task now is to craft a detailed prompt that will optimize for this result when GPT-5 Pro is prompted with it.
Here was the prompt it returned:
SYSTEM INSTRUCTION: Augustine’s Confessiones – “Memory Architecture” Extractor
GOAL
Identify and explain how Augustine uses “memory architecture” motifs in Confessiones, with exact Latin quotations and precise Book.Chapter citations from J. J. O’Donnell (Oxford Classical Texts). Produce defensible findings with built-in fact checks.
INPUTS
– Primary text: J. J. O’Donnell, Confessiones (OCT). Assume a clean Latin text with Book and Chapter anchors.
– Focus: Book 10 is primary. Include earlier/later occurrences that illuminate the motif.
SCOPE
– Track metaphors and images of memory as places, buildings, stores, pathways, or spatial extensions.
– Show how each motif functions in Augustine’s argument or theology.
CITATION & QUOTATION POLICY
– Cite as: Conf. Book.Chapter (e.g., “Conf. 10.8”).
– Quote Latin verbatim from OCT. Keep orthography and punctuation from the edition.
– Quote spans: 8–30 words each.
– Do not paraphrase Latin. English analysis may paraphrase.
LEXEME SETS AND HINTS (expand semantically; match lemmas and inflected forms)
A. Memory core: {memoria, reminiscor|reminisci, recordor|recordari, oblivio, imago|imagines, species, phantasia}
B. Architectural/storage: {domus, palatium, atrium, penetralia, horreum, armarium, arca|arcula, scrinium, thesaurus|thesauri, cella, sacrarium}
C. Spatiality/depth/pathways: {interior|intimus|penitus, sinus, latitudo|latitudines, campus|campi, locus|loca|loculi, via|ingressus, recessus, antrum, cavern(a|ae), abyssus, latebrae}
D. Operations: {recondo|recondere, depono|deponere, repeto|repetere, confero|conferre, recolo|recolere}
E. Sense-image channel: {auditus, visus, gustus, odoratus, tactus, sensus, imagines sensuales}
METHOD
1) Parse and anchor
– Load the Latin text with Book and Chapter anchors.
– Keep a map chapter_id → raw Latin text.
2) Retrieve candidates
– Lemma-search each chapter using the lexeme sets A–E.
– Flag chapters with ≥1 hits from sets A plus at least one of B or C.
– Within flagged chapters, collect candidate sentences or adjacent spans that include target lemmas.
3) Classify motifs
– Assign each candidate to one or more motif labels:
[STOREHOUSE], [PALACE/HOUSE], [INNER ROOMS/PENETRALIA], [FIELDS/LATITUDES], [PATHS/INGRESS], [DEPTH/ABYSS], [CHEST/ARCHIVE/ARMARIUM], [TREASURY/THESAURUS], [SENSE-GALLERY/IMAGINES].
– If multiple motifs appear, choose the dominant one based on surrounding nouns and predicates.
4) Explain function
– For each excerpt, write a 1–2 sentence analysis: What point Augustine advances and how the architectural image helps it (epistemic, moral, theological, pedagogical).
5) Fact-check loop (strict)
– FC1: String-match the quoted Latin against the chapter’s OCT text. Must be exact.
– FC2: Re-read the whole chapter. Confirm that the paraphrased claim aligns with the chapter’s argument.
– FC3: Report verification status per row as VERIFIED:true only if FC1 and FC2 pass.
– If FC1 fails or context is ambiguous, exclude the row. If uncertain, output “I don’t know” for that item.
6) De-duplication and coverage
– Prioritize Book 10. Include earlier/later books when they add distinct motifs or developments.
– Avoid repeating near-identical lines unless the function differs.
OUTPUT
A) Main table (markdown):
| Book.Chapter | Motif | Latin quotation (8–30 words) | Function in argument | VERIFIED |
|–|–|–|–|–|
B) Synthesis (≤200 words)
– Map motifs to Augustine’s aims: e.g., memory as storehouse of images, inner chambers of self, vast fields, paths inward, treasury of impressions, depth and danger, etc.
– Note progression or tensions across chapters.
C) Appendix: Context windows (optional if helpful)
– For any tricky case, add a short context window: up to 15 words before and after the quoted span, still within the same chapter.
CONSTRAINTS
– Use only the uploaded OCT text for quotations.
– No invented citations. No uncontrolled paraphrase of Latin.
– If a requested detail cannot be verified, state “I don’t know”.
QUALITY CHECKS BEFORE FINALIZING
– Pass A: Re-run lemma searches on Book 10 to ensure high recall.
– Pass B: Spot-check three random rows by re-matching quotes to OCT and re-reading chapters.
– Pass C: Confirm all rows have motif labels and concise functions.
STYLE
– Crisp, analytical prose. No filler. No exclamation points.
PARAMETERS YOU MAY TUNE
– Max rows: default 25 (≥15 from Book 10).
– Quote length window: default 8–30 words.
– Lemma window for candidate extraction: sentence ± one adjacent sentence.
BEGIN.
As is immediately evident, this is far more detailed and structured than I could provide. For complex tasks requiring verification or a specific response structure, it is advisable to have the model prompt itself. If the result falls short, we can always make changes. Once we are satisfied, the prompt can be copied to a prompt library. Keeping records of successful prompts is advisable, especially if we expect to reuse them for similar tasks in the future.
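Some of the mechanical steps this prompt specifies, such as the FC1 exact string match, are simple enough to sketch outside the model. The following is a minimal illustration, not part of the original workflow: the function names, the normalization choices, and the sample strings are all assumptions made for the example (a stricter FC1 might skip normalization entirely).

```python
import re
import unicodedata

def normalize(text: str) -> str:
    """Collapse whitespace, strip diacritics, and lowercase so that
    trivial OCR or encoding differences do not mask a verbatim match
    (a deliberately loose choice; strict FC1 would compare raw strings)."""
    text = unicodedata.normalize("NFD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", text).strip().lower()

def fc1_exact_match(quotation: str, chapter_text: str) -> bool:
    """FC1: the quoted Latin must occur verbatim in the chapter's text."""
    return normalize(quotation) in normalize(chapter_text)

# Placeholder strings standing in for a quotation and its source chapter.
chapter = "magna ista vis est memoriae, magna nimis, deus meus"
print(fc1_exact_match("magna ista vis est memoriae", chapter))  # True
print(fc1_exact_match("magna vis est memoriae ista", chapter))  # False
```

A check like this catches paraphrase masquerading as quotation, which is precisely the failure mode the fact-check loop is designed to exclude.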
GPT-5 Pro worked for eight minutes and twenty-eight seconds, returning twenty-five hits, all from Book 10. I will not include all of them here, but they followed the structure of the two examples (Table 1):

Sample results from the semantic search of Augustine’s Confessions Book X for “memory architecture” motifs, showing Latin quotations, assigned motifs, and analytical functions
After the table, it added a synthesis:
Augustine builds a full architecture for memoria. He opens with scale and topography – “campos,” “praetoria,” and “aula” – to mark memory’s vastness and civic order. Treasure and archive figures (“thesauri,” “cellae,” “reponuntur,” “foribus”) model intake, indexing, and retrieval, while movement verbs (“discurro,” “penetro,” “spatiatus sum”) cast recollection as guided traversal. Inner rooms and depths (“recessus,” “sinus,” “penetralia,” “abyssus”) explain forgetting and rediscovery as shifts across levels and compartments. The sense-galleries keep modalities discrete yet accessible. This scaffolding then carries theology: in 10.24–26 he searches these spaces for God, then reframes memory as God’s dwelling (“cubile … sanctuarium … habitas”), before confessing that the divine exceeds spatial terms (“nusquam locus”). Earlier house imagery (1.5; 8.8) prefigures Book 10’s interiorization, while 10.2 establishes the abyssal register. The progression moves from epistemic storage and navigation to a doxological claim: memory both hosts and fails to contain the One remembered.
This example is a good case study for using LLMs to decipher semantic strategies in primary source texts and synthesize the results into meaningful insights. For longer or denser texts, these techniques are especially effective. Both prompt and answer were fed into Gemini 2.5 Pro, together with the source text and the following request:
I need you to act as a factuality checker. I gave a prompt to ChatGPT and it returned an answer. I need you to verify its answer and point out any errors, hallucinations, misquotes or imprecisions.
The quote refers to the text of Augustine’s confessions (attached).
Here is the prompt:
After going over the entire response, Gemini was satisfied:
In conclusion, ChatGPT’s response is factual, precise, and successfully executed a complex set of instructions. It can be trusted as a reliable output based on the provided text.
Trusting an AI to fact-check another AI is never a foolproof method. It is imperative to check the outcome ourselves.Footnote 70 That said, human assistants are not without their flaws either, yet we entrust them with similar tasks frequently.
Networks
Network analysis is a use case closely tied to semantic search. It relies on the LLM’s ability to recognize names of persons, places, and objects within disparate historical sources and to create visualizations of their connections across space and time. A collection of letters from one author to numerous recipients can chart the author’s social connections while highlighting possible links between the addressees, shedding new light on textual communities or unrecognized cliques within a wider cultural milieu. Similarly, tracing the whereabouts of objects can uncover trade networks, social connections, perhaps even unexpected diplomatic ties. The same may be said for individuals.Footnote 71
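Underneath any such visualization sits a simple data structure. As a minimal sketch, with invented letter numbers and names standing in for real extraction output, a letter collection reduces to edges between the author, the addressees, and co-mentioned persons:

```python
from itertools import combinations

# Hypothetical extraction output: letter number -> recipient and other persons named.
letters = {
    1: {"recipient": "Arno", "mentioned": ["Charlemagne"]},
    2: {"recipient": "Charlemagne", "mentioned": ["Arno", "Angilbert"]},
    3: {"recipient": "Angilbert", "mentioned": ["Charlemagne"]},
}

# Edges from the author to every addressee.
author = "Alcuin"
author_edges = {(author, data["recipient"]) for data in letters.values()}

# Two addressees are linked whenever a single letter names both of them.
recipients = {d["recipient"] for d in letters.values()}
links = set()
for data in letters.values():
    named = {data["recipient"], *data["mentioned"]} & recipients
    links.update(combinations(sorted(named), 2))

print(sorted(author_edges))
print(sorted(links))
```

A structure of this kind is what a model or coding agent would then hand off to a rendering library; the interesting historical work lies in deciding which co-occurrences count as meaningful links.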
Understanding networks helps contextualize readership and postulate different scenarios for the sources’ relevance. AI’s ability to create interactive diagrams – previously a months-long undertaking – makes this an attractive use case to showcase. The two examples presented in what follows build on the same rationale. The first uses the letter collection of the Carolingian scholar Alcuin of York (d. 804), charting the recipients and their connections to understand late eighth-century scholarly communities. Alcuin, a Northumbrian cleric, was invited by Charlemagne in 781 to join his court, where he became a leading intellectual authority and a central figure in the king’s reformist agenda. Alcuin corresponded with many prominent figures of his day, from the Carolingian Empire and beyond, leaving behind an impressive letter collection, upon which this network is based.Footnote 72
Unlike previous examples with LLM-produced text, this use case requires code for an interactive animation. This practice of using natural language to code has been unofficially termed “vibe-coding” by Andrej Karpathy, a world-renowned AI scientist and OpenAI’s cofounder.Footnote 73 Code is one of the modalities in which most LLMs are fluent, so this network could have been produced by any of the previously discussed models.
Claude Code, an agentic coding tool by Anthropic and a popular choice among developers, was selected here. It allows users to describe their coding needs and plan together with the model, after which the code is produced. Claude Code works with the Model Context Protocol (MCP), which governs its interaction with other tools and systems, like a browser or the computer’s file system.Footnote 74 This is important because it allows the user to keep necessary project documentation in files and folders on the computer, which the model can access, read, and change if needed. This way, the project context can be managed so that it stays in the model’s memory without overwhelming it with unnecessary detail.
Before letting Claude Code build the network, one preliminary issue to overcome was the quality of the source text. The standard edition by Ernst Dümmler, prepared in 1895 for the Monumenta Germaniae Historica (MGH), is currently unavailable as a plain text file. The dMGH provides a PDF scan, which is not ideal. To address this, the Patrologia Latina edition available through the Latin Text Archive (LTA) website was used instead.Footnote 75 Admittedly, this is an inferior edition, which the LTA intends to replace with the MGH one once it has been converted to a suitable format.
An effective approach for these projects is to produce a product requirements document (PRD), describing the functionality and appearance of Alcuin’s letter network. This is standard practice when developing software and it is no less useful here, albeit on a smaller scale. The first step was to describe the requirements in natural language, followed by a prompt to produce the PRD:
I will describe to you a software product for which I want you to generate code. Do not generate yet. First, I want you to write a PRD based on this description:
I will provide a txt file of Alcuin of York’s letters in Latin. I want to build a network in which Alcuin is in the center of a network from which radiate his recipients. If there are other persons mentioned in these letters, I want them to radiate out from the recipients, unless they are themselves recipients of other letters, in which case they should be connected to the original recipients but from their initial position. For each of these persons, I want a function that when I hover over them with my mouse, a text box appears in which the number of the letter and a citation from the letter containing their name appears. If you have information on the year of composition, include this also. I want the letters to be color coded (personal, ecclesiastical, courtly and so on). I need a legend to explain the color code.
I also want a button that provides a short bio of each, if one is possible to generate. If it is, I want you to provide a citation of where you retrieved the information from.
Your task here is to produce a prompt that, when copied into a new conversation alongside the txt file, will produce code that will create this interactive network.
After the PRD was given to Claude Code, along with the Latin edition of Alcuin’s letters in .txt format, this simulation was created (Figure 1):Footnote 76
Visualization of Alcuin of York’s letter collection. This figure is also available in the online resources: www.cambridge.org/Fox

One advantage of this practice is that functionality can be added onto this basic network, like a filter that selects Alcuin’s letters by the years in which they were written or a timeline that contextualizes the letters against events from those years.
This network centered on people – primarily Alcuin of York and his correspondents. The second example will be based on objects. More specifically, it will focus on manuscripts of the sixth-century Histories by Bishop Gregory of Tours (d. 594). Gregory produced the ten books that make up this composition during his episcopal tenure, relating the history of the Christian community in Gaul until his final years as bishop. The Histories is a key text of the early Middle Ages and the foundation of Frankish and French historiography.Footnote 77 It was copied many times throughout the centuries and was among the first to be printed.Footnote 78 Consequently, it has a complicated transmission history. Bruno Krusch, who coedited the text for the MGH with Wilhelm Levison, produced a detailed manuscript list,Footnote 79 which is the basis for our current understanding of the transmission process. The objective here was to provide an easily accessible and interactive diagram explaining the relationships between the extant manuscripts of the Histories and providing basic information about each.
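Before any interactive rendering, the relationships such a diagram encodes amount to a plain tree. The sketch below is purely illustrative: the family labels, sigla, and dates are invented placeholders, not Krusch’s actual classification.

```python
# Hypothetical stemma: each node maps to the witnesses descended from it.
stemma = {
    "archetype": ["family_A", "family_B"],
    "family_A": ["MS_A1", "MS_A2"],
    "family_B": ["MS_B1"],
}

# Per-manuscript details to surface on hover in the eventual visualization.
info = {
    "MS_A1": {"date": "s. VII (placeholder)", "origin": "unknown"},
    "MS_A2": {"date": "s. IX (placeholder)", "origin": "unknown"},
    "MS_B1": {"date": "s. X (placeholder)", "origin": "unknown"},
}

def descendants(node: str, tree: dict) -> list:
    """Depth-first list of every family and manuscript below a node."""
    out = []
    for child in tree.get(node, []):
        out.append(child)
        out.extend(descendants(child, tree))
    return out

print(descendants("archetype", stemma))
```

The “activation” animations described below then correspond to expanding a node and revealing its subtree, with the hover boxes drawing on the metadata table.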
For Krusch’s apparatus to be usable, it was necessary to convert it into .txt format. Such conversions are never error-free, so Gemini 2.5 Pro was used to compare the .txt file to the original PDF and clean it.
ChatGPT Pro produced a dedicated PRD for the project after having been prompted like so:
I need you to produce a PRD along these lines:
An interactive stemma of manuscripts. The stemma will contain nodes that are historical manuscripts. I will give you pages from the Monumenta Germaniae Historica and you will produce an interactive stemma. The branches will contain information that, when hovered over, will display. Each manuscript group can be ‘activated’ to branch out into its constituent manuscripts. I want it to contain animations when families are activated. I also want information about dates, possible locations of production, etc. to appear. I am attaching a model of such pages from the Monumenta for your consideration.
Claude Code received the PRD and Krusch’s apparatus, producing this interactive stemma (Figure 2):Footnote 80
Interactive stemma of the manuscripts of Gregory of Tours’ Histories. This figure is also available in the online resources: www.cambridge.org/Fox

To clarify, producing code that works as desired is an iterative process, full of setbacks, corrections, and recalibrations, much like text generation. Seldom does it execute correctly on the first try. For both projects, the result required long conversations that exhausted Claude Code’s token allowance. One side effect of this process is that the iterations sometimes break the code, causing previously stable elements to malfunction. To guard against this, we can use Git together with GitHub, which records snapshots of the project at points when it was working correctly. This allows us to add experimental and unstable components without worrying that they will damage the working code, because we can always revert to an earlier version. It is a lot like saving a Word document or activating the /canvas function in LLMs.
Other Modalities
Modern historians have a much richer selection of sources than do historians of the medieval and ancient worlds. This is not merely an observation about quantity; there are modalities exclusive to modernity, such as sound recordings and film/video. Most LLMs claim to be multimodal because they can generate images, natural-sounding speech, and even video. But most cannot yet accept sound and video files, although this will doubtless change. A common workaround has been to transcribe video, enabling the AI to treat it as text. While useful, this technique strips away much of the source’s information, not only because it fails to capture the tonality and emotion of spoken voices but also because it discards visual components that might contain priceless details about the historical context.
For now, Gemini is the only off-the-shelf model family that accepts and analyzes sound and video files. Gemini can be accessed through its dedicated website or through a more experimental environment known as Google AI Studio.Footnote 81 For simplicity’s sake I chose to work with the former, although both are valid. Choosing footage was easy: “A Trip Down Market Street,” a famous short film depicting early twentieth-century San Francisco life, was shot days before an earthquake struck Northern California in the early morning of April 18, 1906.Footnote 82 The ensuing fires ravaged the city, claiming the lives of some 3,000 people, burning an area of 4.7 square miles, and leaving over half the population homeless. The reconstructed city differed in many respects, including rebuilt Market Street landmarks like the Phelan Building and the Palace Hotel. Pre-tragedy footage is thus all the more valuable, and subjecting it to AI analysis might yield interesting insights. To be clear, this is merely a demonstration of the technology’s capabilities, not an academic effort to extract new information on San Francisco’s urban history. Nevertheless, it delineates the kinds of exploration this technology makes possible.
“A Trip Down Market Street” is a nearly twelve-minute-long silent film, freely downloadable and streamable on YouTube. The camera, mounted on a cable car, proceeds eastward from the old US Mint on 8th Street to the Ferry Building, a distance of 1.55 miles or around 2.5 km.Footnote 83 On its way, it captures a vibrant mixture of pedestrians, horse-drawn carriages, bicycles, and automobiles. Storefronts line the route with abundant awnings and business signs. The street’s human landscape is diverse – predominantly men, but also women and children. Some are conducting business; others frequent the shops. Street sweepers, policemen, and laborers appear now and again, as well as people of all social strata. In short, it is a captivating snapshot of urban life in a bustling American metropolis.
GPT-5 was prompted to provide questions that would produce interesting insights on the film, and it came up with these:
Prompt Set: Analyzing A Trip Down Market Street (1906)
Task
Analyze the 1906 film A Trip Down Market Street to extract insights on urban life, technology, and society in early twentieth-century San Francisco. Answer the following questions with evidence from the film (visual details, inferred context).
---
## 1. Urban Life and Social Texture
* What do clothing styles suggest about class differences, occupations, and gender roles?
* How do movement patterns (pedestrians, crossings, flow of traffic) reflect informal “street rules” before codified traffic laws?
* What does the coexistence of children, workers, and passersby reveal about public space inclusivity?
## 2. Technology and Modernity
* How does the coexistence of horse-drawn carriages, streetcars, and automobiles illustrate a technological transition?
* What can be inferred from the makes, models, and rarity of early automobiles about wealth and status?
* What do architectural styles and building facades along Market Street reveal about the city’s aspirations toward modernity?
## 3. Hidden Structures of Life
* Which types of labor or social groups are absent, and what does this absence reveal about visibility or marginalization?
* What do shadows, lighting, and sun angle suggest about time of day and rhythms of daily life?
* How do shopfronts, advertisements, and business signs reflect consumer culture and local economy?
## 4. Historical Imagination
* If one mapped the businesses seen on Market Street, what kind of economic ecosystem would appear?
* What implicit risks or hazards are visible in the way people navigate streetcars, wagons, and automobiles?
* What traces of migration, diversity, or racial presence – or absence – can be inferred from the visible population?
## 5. Counterfactual / Reflective Angles
* How does knowledge of the April 1906 earthquake and fire change how we interpret the film?
* Comparing this footage to later films of Market Street (post-1906 or 1920s), what shifts in pace, density, and mobility appear?
* What distinctly “San Franciscan” qualities might a contemporary visitor from New York or Chicago have noticed in this footage?
---
Output Format
* Answer each question in short analytical paragraphs.
* Use visual evidence (clothing, vehicles, signage, movement).
* When inference is speculative, mark it as such.
* Where useful, suggest further historical context (e.g., economic records, census data, transport regulations).
Of the various formats available for download on the archive.org website, I chose the H.264 encoding of the 2K version, cropped for higher contrast. This was given to Gemini 2.5 Pro, together with the prompt. The full answer is quite lengthy and cannot be reproduced here in full.Footnote 84 Some worthwhile snippets are as follows:
What does the coexistence of children, workers, and passersby reveal about public space inclusivity?
Market Street is portrayed as a quintessentially public and multifunctional space, shared by a diverse cross section of society. It is simultaneously a place of commerce, transit, and social interaction. The presence of working children, delivery men maneuvering heavy carts, well-dressed women shopping, and businessmen conversing on street corners shows that the street was not zoned for a single purpose. This integration of different activities and social classes in a shared physical space suggests a high degree of inclusivity, though it also created a more hazardous environment by modern standards.
Which types of labor or social groups are absent, and what does this absence reveal about visibility or marginalization?
The film’s perspective from the center of a major commercial thoroughfare creates significant absences. We do not see the industrial laborers, dockworkers, or domestic servants who formed the backbone of the city’s economy; their workplaces were in other districts or behind private doors. Most strikingly, the crowd appears overwhelmingly white. San Francisco was a major center for Chinese and other immigrant communities, yet they are invisible on Market Street. This absence points to the deep social and spatial segregation of the era. The film captures the public face of the dominant commercial culture, implicitly marginalizing the city’s diverse ethnic and laboring populations.
What do shadows, lighting, and sun angle suggest about time of day and rhythms of daily life?
The long shadows cast from right to left (west to east) at a sharp angle indicate the film was shot in the mid-to-late morning. The sun is relatively low in the eastern sky. This timing aligns with the peak of commercial activity: Deliveries are underway, shops are open, and professionals are moving about the city. It is not the early morning commute nor the late afternoon leisure hour, but the heart of the business day. The clear, bright light contributes to the film’s feeling of energy and optimism.
These three answers were then fed to GPT-5 Pro with the request to analyze them for quality. After thinking for a little over four minutes, it provided the following verdict:
Mixed. First answer: reasonable but under-evidenced. Second: insightful but overconfident. Third: incorrect on time-of-day.
(1) Inclusivity answer
* Strength: Correctly reads Market Street as shared, multiuse space.
* Issues: “Working children,” “well-dressed women shopping,” and “businessmen conversing” are asserted, not tied to frames. The camera sits mid-street, so sidewalk behaviors are hard to verify.
* Fix: Anchor to time-coded cues and visible markers. Example: Identify the women boarding and waiting at cable stops, the commuter “islands,” and the newsboys at the ending shot. Cite frames, not categories. ([The Library of Congress][1])
(2) Absences answer
* Strength: Notes missing industrial labor and domestic work from this corridor.
* Issues: Jumps from “we don’t see X” to “deep segregation.” That may be true, but the route, time, and known staging bias who appears. The producers recycled automobiles to inflate modernity, which signals scene construction and skews representation.
* Fix: State route and staging as confounders before inferring segregation. Recommend cross-checking with adjacent-district footage or records. ([The Library of Congress][1])
(3) Shadows / time-of-day
Verdict: Wrong. The Ferry Building clock shows **3:17 p.m.** near the end; Kiehn dates the run to just after 3 p.m. on Saturday, April 14, 1906. The sun would be in the southwest, not “low in the eastern sky.” Shadows should trend ENE across the frame, consistent with afternoon. ([The Library of Congress][1])
Upgrade checklist for all three
* Tie every inference to specific, verifiable visuals and timestamps.
* Separate observation from interpretation; mark speculation.
* Control for producer staging, route selection, and camera position before making social claims. ([The Library of Congress][1])
Disagreement between the models should not be seen as a weakness of the technology. If anything, it indicates their different dispositions toward social commentary.Footnote 85 Prompting the LLM to think creatively is guaranteed to produce results we disagree with, confirming the need for historical expertise. Factual errors, on the other hand, are more concerning. Failure to determine the time of day was not the only error that appeared while researching this topic. Perplexity stated that the streetcar ride was westbound, proceeding from the Ferry Building to Van Ness Avenue. Such statements are easily fact-checked, but others may not be, highlighting the constant need for vigilance.
Concluding Thoughts
The scope of inquiry into primary sources enabled by LLMs is much larger than can be explored here. Other sources, such as paintings, textiles, and folk music recordings, could have been equally fruitful to explore. The purpose of these use cases was to shed light on this flexibility and the first methodological steps that might be taken on similar paths of exploration. The recommended approach balances precision in context, task, and structure with a creative range for the LLM to suggest unexpected directions and conclusions.
The network visualizations subsection may seem like a side product of historical work rather than actual innovation. Yet visualization has merit because it focuses our attention on previously unrecognized aspects of the source. For example, Alcuin’s epistolary output, broken down by year, can be contextualized against the events of those years.
Finally, the lack of consensus between models on certain questions highlights the value of historical expertise. Large Language Models benefit those who can challenge their confabulations. This is true for identifying factual and interpretative errors, where our experience surpasses the AI’s intuitions, born of ingesting vast corpora of data. However, using LLM-based assistants opens up new exploration avenues. The expert/assistant paradigm remains a useful framework for thinking about interaction with AI.
4 Governance, Ethics, and Futures
Rethinking Scholarly Transparency, Intellectual Property, and Data Security
A mainstay of academic writing is the apparatus of footnotes, bibliography, and attribution that accompanies it. From the undergraduate level onward, it is drilled into budding scholars through countless “academic writing” classes and training modules. This machinery of attribution has ballooned to Byzantine proportions in the quest for one feature – transparency. Scholars need to be able to retrace an idea to its roots: Which scholar came up with this claim or that argument, and was it understood correctly and presented contextually by the author whose work we are now reading? Any deviation from this orthodoxy can heap upon an incautious author the ire of the academic community, even allegations of plagiarism when carelessness strays into intentionality.
Generative AI changes attribution substantively. The entire process of generating, establishing, and fortifying ideas, understanding, and responding to counterarguments is now entangled in our conversations with LLMs. Assuming we want to maintain academic transparency, we need to rethink our approach to attribution. This section will discuss the challenges and opportunities of a new methodology of reference.
Universities and publishers have avoided this question by expunging LLMs from the equation. While they acknowledge that AI does much of the heavy lifting in the creative process, guidebooks and style manuals insist that the human author possesses sole agency and accountability for the finished product. This solution is appealing for several reasons. First, it is easy to enforce without having to consider the intricacies of AI-guided ideation. The human steers the machine, the logic goes, so the human is responsible. Much like a car that cannot be held to account for the actions of its driver, the burden of agency must be on the author. Anyone who has interacted with AI can sense the limitations of this analogy.
Second, it flatters our privileged notions of humanity as a uniquely creative force. We alone create. AIs regurgitate, repurpose, and rephrase, a mechanistic and pale facsimile of the divine spark of creation that only we possess. Even if we resist attributing true creativity to AI, would it be inaccurate to say that much of what we term creativity in humans is also derivative? We call it inspiration when a DJ remixes old music or a historian rearticulates an old theory, but is it that much different from what an AI does?
Another emergent challenge of increased AI usage in academia is the risk of inadvertently crossing intellectual property boundaries. AI’s conversational interface obscures the distinction between original responses and potentially copyrighted material absorbed during pretraining, making it challenging to identify when protected content is being reproduced. Using such material, whether by quoting directly or by paraphrasing, seems harmless at first but might constitute a violation of intellectual property law. This is complicated by the notion of derivative works, like translations, which enjoy similar protections. How, then, is the user to navigate this new landscape? Attribution and copyright require careful reconsideration in light of AI’s growing popularity in academic work.
Inside Out: Compliance
University and publisher requirements on scholars’ usage of AI can be thought of as a tiered system, in which restrictiveness and acknowledgment stand in an inverse relationship.Footnote 86 The lack of a well-defined policy often goes together with a restrictive posture or outright ban on AI, whereas a specific requirement for transparency presupposes the usage of generative AI tools in all areas of academic activity, seeking merely to regulate it. The policies of universities and publishers on AI fall roughly into six tiers, here termed T0 through T5. T0 describes institutions with no AI usage policy at all, and T5 describes those with the most specific and demanding protocols. Collectively, the tiers run the gamut of restrictiveness and specificity regarding disclosure and freedom of usage. It is best to think of these categories as a spectrum rather than a discrete set of rigid requirements, although there are enough commonalities within and between categories to justify this division.
T1 roughly means that generative AI tools are banned for core intellectual tasks. They may be allowed for checking spelling and grammar, with some institutions permitting the use of licensed tools for public data. T2 limits the use of AI to well-described tasks, primarily ones that help a scholar conceptualize and perform a final polish. The required statement is limited to a simple acknowledgement (“I used ChatGPT to polish language”).
T3 is by far the most prevalent. Universities and publishers in this category require disclosure of the following parameters:
Tools and versions used
Tasks undertaken
Inclusion of a statement of human accountability
Mention in the methods or acknowledgements chapter.
For our purposes, the third demand is the most germane of the four. It recognizes AI’s fallibility but insists it cannot excuse flaws in the final product, placing the burden of responsibility fully on the human author. There are some variations to the language, but all such statements make clear that the author takes responsibility for the accuracy, integrity, and originality of the submitted product.Footnote 87 Other demands constitute a protocol for interested parties to check which components of the work used AI tools.
All things considered, there is room to doubt the usefulness of this requirement package. First, we are now well past the point of blaming AI for the shortcomings of the research, and attempting this would probably harm the scholar’s reputation more than accepting the mistake and remedying it would do. Second, what would interested parties gain from knowing someone used ChatGPT-5.2 or Claude Opus 4.5? Would this shed light on the ideation process? If the exercise aims to introduce transparency into AI-assisted work, T3-style declarations fall short of the mark. They seem more interested in creating the appearance of accountability and shielding the institution from claims of complicity.
What comes in place of T3 is not obvious. Applied fully, T4 and T5 could be a path toward regulating the usage of AI in academic work but risk overcomplicating it with an unwieldy bureaucracy. In addition to previous requirements, T4 asks the author to provide:
The actual prompts used, documented in appendices
A complete record of methodology and outputs
Specific access dates and tool versions
Systems for audit and validation
T5 goes further, requiring a full record of exchanges and files attached during the interaction. Essentially, it aims to preserve a complete copy of the process by which the work came to be (see Table 2). When we consider the meaning of these requirements, we realize that it transcends the individual author. Since neither the pages of an academic article nor those of a grant proposal can support the weight of the full conversation history that led to their production, nor certainly the systems needed to audit and validate it, T4 and T5 have immediate implications for the entity making the requirement. We could imagine these conversations stored on university servers, meaning the LLM that produced them is supplied by the university as well. The same goes for any validation mechanisms. To this, one must add protocols that ensure compliance with these requirements and perhaps people or AI systems that do the checking.

Erecting such an edifice of academic sleuths would necessitate reengineering the entire peer review process, which already relies too heavily on weak incentives and the goodwill of scholars. More to the point, the costs of building and maintaining this compute and storage architecture are prohibitive, especially for small or resource-poor institutions. Access to state-of-the-art generative AI tools and subscriptions is already unevenly distributed, a reality with dire implications for social inequality.Footnote 88 Such draconian measures would doubtless exacerbate the problem.
If having no policy is untenable and having too much is unsustainable, the only recourse is to have a different kind of policy altogether. To put this into perspective, we should consider what treatment past technologies received when they arrived on the scene. Word processors, introduced into academic contexts in the early 1980s, were met with equal suspicion. L. Eudora Pettigrew, associate provost for instruction at the University of Delaware, had this to say about the new coin-operated word processing machines installed on campus: “Some people conceive of a computer in every pot. The computer is a supplement, a tool, not a substitute for education. To get an education, a student must think.”Footnote 89
Nowadays, the academic community does not attempt to regulate word processor usage because it is considered a ubiquitous tool of the trade. The same is not yet true of generative AI, but it may soon become reality, mainly because it is the path of least resistance. Unburdening ourselves from the obligation to report certain attributes of AI-assisted work will undoubtedly make the review process easier. In that sense it is the most feasible direction, but is it one we can live with?
To envision what such a solution might look like, it is first necessary to distinguish between two core issues that the academic community finds objectionable about using AI for research. One is the question of factuality and hallucination – how much we can trust the answers, solutions, and methodologies to deliver the results the AI claims to deliver. The other is the question of originality – how much of the finished product is the brainchild of the human author. The first issue is a matter of consequence. If a flaw undermining the veracity of the AI’s responses has crept into its suppositions and gone undetected, this has implications for the validity of ensuing arguments and conclusions. It is critical to have a mechanism in place to identify and root out these and future mistakes.
Originality, on the other hand, is not nearly as consequential; more accurately, it is a red herring. Assigning it to the author should be automatic, barring important exceptions discussed in the copyright section. Conversations with the AI during the research stages are initiated by the author and guided by their feedback. Incorporating perspectives from these exchanges into the final article is solely the author’s decision. If we view academic writing as a broader and more cognitively demanding task than the act of putting words on a screen, the AI’s role as assistive technology becomes clearer.Footnote 90 Consequently, expending significant effort to disentangle the human from the AI in the ideation process cannot be justified. To simultaneously insist that it cannot be a contributing author and to implicitly regard it as one by demanding authors disclose which ideas it produced is inconsistent, not to say hypocritical. Once AI-generated ideas have crept into the general ecosystem of published literature (which they undoubtedly have done by now), would the burden of identification shift to authors wishing to incorporate them into their footnotes? It is not a reasonable request to make of authors or of overburdened reviewers and has no impact on the quality of the scholarship, provided ideas are properly vetted.
The first section argued for the importance of holding on to agency and taste, which seems to fly in the face of the statements made above. In a sense, this is correct. Scholars who contract out ideation to LLMs might, in the end, be doing themselves a disservice. Be that as it may, this should not concern reviewers any more than tracing the provenance of published ideas to private conversations the author may have had.
A solution should then focus exclusively on factuality, a formidable task in its own right. A useful analogy for such a solution is Freedom of Information legislation (FOI). The US’s 1967 Freedom of Information Act creates a legal right to access records produced and held by public entities like the government. There are exemptions, but most US federal government documentation falls under the purview of FOI.Footnote 91 This attitude is not exclusive to the US and is reflected in numerous global legislative initiatives.Footnote 92 Laws governing public access assume the information belongs to the public.Footnote 93 They likewise assume that enabling access reduces corruption and increases accountability.Footnote 94 Their objective is to strengthen the democratic standards of the community, whether national or international, as in the case of UN legislation. Mutatis mutandis, this can be transplanted to the field of academic regulation of AI usage in the following ways.
First and importantly, such a system would conserve the current attribution standards and would not call for any special disclosure beyond normal scholarly documentation when the article is submitted and considered by the editorial board. AI usage is taken for granted as part of the process of ideation and composition, and in this sense is uninteresting to reviewers and readers. Unlike the current system, it invokes a “triggered transparency” in special cases. Only when a reader or reviewer raises concerns about a specific claim or methodology can they request that the author reproduce pertinent transcripts or data.
Once activated, the disclosure protocol does not put an unreasonable burden on the author or the publication, and like FOI, any such demand would be time constrained. In other words, there is a reasonable timeframe for the scholarly community to demand the information and for the author to respond. Provided requests are met satisfactorily, the article receives an appropriately high trustworthiness ranking by the journal, using an agreed-upon scale devised for this verification process. If the author fails to provide the relevant data, the publication may downgrade the article’s score or retract it completely if the challenge is deemed crucial to the arguments of the article.
Before applying such a system and making new demands on authors, certain practical aspects need to be clarified. The level of adequate “record-keeping” by the author and durations for keeping old threads must be negotiated. What constitutes a reasonable request and what strays into “fishing expedition” territory must also be determined. Since the crux of the process is ensuring factuality, verification need not rely on the interactions the author had with an AI but on their ability to reproduce data supporting their claims. Methodological components would still necessitate disclosure, although this focus reduces the number of relevant threads to a minimum. Knowing the policy in advance prepares authors to plan methodological conversations and keep them available for any future inquiries.Footnote 95
In practical terms, authors should regard conversations with AI as records of their academic work. This demands no special measures, other than a few simple adjustments. The first is to refrain from deleting threads. OpenAI does not put a time limit on keeping a user’s threads, while Anthropic and Google do.Footnote 96 When using the latter two, authors should keep relevant threads in projects and download them after submission. Another is to separate research-related conversations from other topics. Both are easily accomplished, and once LLM usage protocol in research solidifies, such recordkeeping standards should become routine.
This is not a cosmetic shift but a substantive one. It accepts that the AI is an integral part of every academic submission and renders discussion of “originality” immaterial to the review process. Yet it insists on transparency to curtail problems of factuality and flawed methodology, ensuring the quality of submitted scholarship and benefiting the academic community. Consequently, ensuing queries would only be entertained if they addressed these issues.
The second topic of the compliance discussion is intellectual property and copyright regulation. AI is rapidly evolving, and regulatory bodies struggle to keep up. Recent court rulings have revealed deep divisions in how legal systems interpret these fundamental questions. As of this Element’s writing, several legislative initiatives and high-profile lawsuits are expected to conclude in ways that will change the interpretation of intellectual property with regard to AI.Footnote 97 One example of a recent decision is a landmark class-action settlement reached in September 2025, in which Anthropic agreed to compensate authors of books the company is alleged to have downloaded illegally to train its models. The agreement states that Anthropic will pay $3,000 to around half-a-million authors, totaling $1.5 billion, with the possibility that additional authors will join later.Footnote 98 The case sets an important precedent for US law, but it is important to stress that efforts to address AI’s impact also vary across cultures and legal traditions. Historians writing for European, American, and Asian readerships should be aware of these differences.
Understanding the risk associated with inadvertently replicating copyrighted data calls for a deeper knowledge of how LLMs are trained. In the next subsection, we will discuss this in greater detail. For now, it is important to clarify that the inner workings of neural networks – the technology upon which generative AI is based – are not transparent to researchers. This is known as the Black Box problem, which describes the inherent difficulty in retracing the process an LLM goes through from the moment it is asked something until it generates an answer.Footnote 99 Research into this question, otherwise known as interpretability research, is conducted in numerous labs worldwide, with Anthropic leading the way among commercial labs. In fact, Anthropic’s CEO, Dario Amodei, published a column on his website titled: “The Urgency of Interpretability,” which outlines the risks of opacity and the need for a clearer understanding of how AI makes decisions.Footnote 100
Since the ways an LLM arrives at the answer are often obscure, a scholar who relies on its responses may be relying on copyrighted text. Now, it is perfectly acceptable to draw inspiration from other scholarship, works of art, op-ed columns, and any other forms of human expression, copyrighted or not. Inspiration is the engine that propels human creation. The problem arises where usage goes beyond inspiration and transgresses into the realm of unlawful exploitation.
We might ask who is responsible for the offense: The author and publisher or the AI vendor whose tool “malfunctioned” by echoing protected material? As we have already seen, academic institutions and publishing bodies do not allow an author to shift accountability to an AI. The legal question is an extension of this perspective, since the law expects authors and publishers to stand by their product too. AI vendors would bear secondary liability if they knowingly contributed to the offense or were able to take steps to prevent it from happening. None of these stipulations apply here, as the Black Box problem illustrates. But even if OpenAI or Google were indirectly responsible, that does not absolve the author.
One important protection enjoyed by academic work is “fair use,” which goes into effect if several criteria are met. It is important to note that this framework is not a panacea against copyright infringement. Fair use creates exceptions where using copyrighted material is allowed, such as criticism, comment, research and scholarship, teaching, and reporting. Four criteria are considered when determining whether fair use applies, and this is done on a case-by-case basis.Footnote 101 Nevertheless, it is advisable to know if such protections are needed and if so, how likely they are to hold if the use is challenged.
Let us consider two scenarios where intellectual property law and copyright affect historians. Before diving in, we should define another useful concept – derivative work. According to the US Copyright Office: “A ‘derivative work’ is a work based upon one or more preexisting works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted.”Footnote 102 The concept is remarkably salient to historical work. Take critical editions, which, unlike translations, are not mentioned by name in the legislation. They are often based on public domain primary sources but nevertheless constitute a protected derivative work. When composing the critical apparatus, making selections between manuscripts, and opining on the likelihood of certain readings over others, the editor is making a transformative contribution and is thus creating a new work.
Legislation in EU countries has awarded protected status to critical editions. The Italian law “Edizioni critiche e scientifiche di opere di pubblico dominio” grants the editor and publisher economic rights spanning twenty years from publication.Footnote 103 German law (“Urheberrechtsgesetz, § 70 Wissenschaftliche Ausgaben”) grants 25 years of protection to editors of scientific editions, provided they “represent the result of scientifically organised activity and differ substantially from previously known editions of the works or texts.”Footnote 104 Historians using AI might inadvertently reproduce what they thought was a public domain source but is, in fact, a derivative work that made its way into the training data and was memorized.
If derivative work based on public domain texts poses a danger to historians, the situation becomes doubly challenging for secondary literature. The most likely scenario here is “the unknowing quotation,” where a historian tasks an AI with summarizing literature on a certain topic. The AI returns an answer containing a verbatim reproduction or close paraphrase of copyrighted material, like a substantial block of text from a recent monograph. Trusting the AI’s response, the historian integrates it as is into their publication, thereby potentially committing copyright infringement. Ignorance of the LLM’s ability to reproduce protected content does not shield the historian from liability, although it might affect damages. The court will apply several tests to determine if fair use applies, such as the transformative nature of the use, the amount of text taken, whether it constituted the core of the original creator’s argument, and the ability of the contested text to replace the original work and harm the original author. A related issue emerges when an LLM attributes text to a specific author and the historian cites it without verification. If it is hallucinated, and depending on the nature of the quote, the historian may be guilty of libel.
Remedies for this eventuality are obvious and align well with the suggestions expressed throughout the Element. When quoting, it is imperative to find the source and not to rely on AI. Prompting an LLM to reveal its sources as exhaustively as it can is also recommended, although it cannot be fully trusted. Most importantly, we should not paste generated text into our drafts. Beyond the legal entanglements that may ensue, this is never advisable. If this statement seems contrary to my previous argument on the author’s right to claim originality even when working with AI, clarification is in order. AI-assisted ideation is a perfectly acceptable way to elucidate one’s thoughts. Proofing prose with AI can surface logical flaws. Finding elegant ways to express a sentiment, rearticulating thoughts, overcoming obstacles – all are legitimate uses for generative tools. Pasting full paragraphs of unchecked AI-generated text, especially from fact-retrieval and literature-review threads, is a methodologically unsound, borderline unethical practice even if copyright is not infringed.
The second scenario, “the methodology mirror,” involves using an analytical framework from another work, scholarly or otherwise. Methods and ideas cannot be copyrighted, so adopting methodology is, in principle, allowed. Expression, on the other hand, is protected. Specific labels, charts, definitions, and diagrams fall under the purview of expression, so replicating them should be avoided. Regular academic practice allows for borrowing by requiring attribution, but there are limits. Copying diagrams and charts – in other words, protected expression – is forbidden unless permission is sought and granted. In the context of an LLM’s output, copyright infringement arises when it echoes another scholar’s protected expression. First and foremost, to offset the risk of breaching copyright, historians should disabuse themselves of the false impression that the LLM’s generation is always unique. Second, they should perform a search to determine whether any terms or assets they wish to incorporate are protected. Even if the search does not yield results, it is in their best interest to change definitions, redraw diagrams, and significantly alter other risky artifacts.
The main takeaway from these scenarios is simple: mistrust the AI. Conversations with LLMs are powerful brainstorming tools that can clarify murky topics, streamline prose, and introduce order into messy argumentation. As sources of information, they are helpful in pointing us in productive directions but cannot (yet) be relied upon to provide information that can be unproblematically integrated into publishable work. Until such mechanisms that warn us about the risk of copyright infringement are in place, we must verify whether the ideas and articulations that arise organically from conversations with LLMs are, indeed, free to use.
Outside In: Exposure
Interacting with LLMs raises concerns about the fate of the data shared. By default, many of the companies whose platforms we use harvest our conversations to train their models.Footnote 105 To people who are not computer scientists, this means very little. What does “training on our data” mean, in practical terms? An enduring concern for academics is the level of exposure we accept during conversations with AI. Can our information be reproduced by users asking similar questions? Do our ideas become part of the next model’s “core knowledge”? What does opting out of training entail in terms of functionality and user experience?
To understand these risks fully, we must examine how AI training actually works. When our data contributes to the model’s training, the entire interaction, including questions, responses, files, and metadata, is used in the pretraining, fine-tuning, and reinforcement learning (whether by human feedback or other means) stages to adjust its parameters.Footnote 106 Pretraining adds data to the model’s large corpus of information, improving its knowledge in relevant domains. Fine-tuning examines the user’s questions and the model’s answers to optimize instruction following. Reinforcement learning teaches the model to favor highly rated responses over poorly rated ones.
Modern LLMs have hundreds of billions, even trillions, of parameters, so each conversation contributes minimally to the model’s state. Yet paradoxically, while training data is not stored as discrete, searchable files, LLMs can still replicate exact training sequences when prompted in particular ways. This happens when specific prompts activate patterns the model memorized during training.Footnote 107 It is helpful to think of LLMs as learning to structure arguments (discourse scale) and complete sentences (token scale) using different yet complementary methods. On the microlevel, next-token syntax and morphology are considered, as well as collocations, or word combinations that are more likely to appear together. On the macro level, the model considers rhetorical templates and narrative moves associated with specific genres. Micro and macro reside on different neural network levels and invoke different attention mechanisms. Micro and macro influence each other: Repetitive micro-patterns accumulate and shape macro-level formations; macro templates constrain token prediction on the microlevel. The effect of our data on the model’s parameters is diffuse and cumulative.
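The micro-level dynamic described above can be caricatured with a toy bigram model – a drastic simplification of an LLM, offered only to make the statistics of collocation concrete. The corpus, phrases, and function names below are invented for illustration and do not represent any provider’s actual training pipeline.

```python
from collections import Counter, defaultdict

# Toy bigram "language model": a table of which word follows which.
# A drastic simplification of an LLM, meant only to show how repeated
# collocations in training data attract probability mass.
def train(corpus_sentences):
    follows = defaultdict(Counter)
    for sentence in corpus_sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows

def next_token_probs(model, word):
    counts = model[word.lower()]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# A distinctive phrase repeated across several drafts shared with the model:
corpus = [
    "the archival silence speaks volumes",
    "the archival silence speaks volumes",
    "the archival record is incomplete",
]
model = train(corpus)
# After "archival", "silence" now carries twice the probability of "record".
probs = next_token_probs(model, "archival")
```

In a real model the same pressure operates over billions of parameters rather than a frequency table, but the principle holds: the more often a distinctive phrase recurs in the training data, the more probability mass it attracts during generation.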
What does this mean for us? First, it means that unless a malicious attack is launched, the odds of the model regurgitating our ideas are minimal but not nonexistent. Such an occurrence is known as “echo,” which can refer not only to a verbatim quote (rare) but also to paraphrases and rehearsal of similar ideas (more common). There are practices we can adopt that are minimally intrusive and significantly reduce the chance of unwanted echo, even with off-the-shelf chatbots like ChatGPT, Claude, or Gemini. For more sensitive cases, there are other solutions, like restricting ourselves to open-source models installed locally. Still, most historians simply want to use popular tools and are not dealing with classified information that requires extraordinary measures, so this is where the focus of this section lies.
The easiest way to avoid training is simply to opt out through a small change in settings. In ChatGPT, going to the settings menu, selecting Data Controls, and toggling off “improve the model for everyone” stops OpenAI from training on conversations. In Claude, the same functionality is found in Settings → Privacy → Help improve Claude. This has very little downside for the user, apart from possible limits on how long threads are stored. If we still want to participate in the provider’s data training, there are additional steps to prevent the models from training on, and potentially echoing, sensitive information.
Unique word combinations and repetition are signposts picked up by the model during training.Footnote 108 Because of this, it is advisable to start a new thread for every new topic instead of keeping the same one as a working draft, which is more likely to contain reiterations.Footnote 109 Keeping old threads alive is unproductive for other reasons, such as exhausting the model’s context window. Another safety measure relates to the way we talk about our research with the LLM. Discussing methods and broad structures is safer than specifying micro components like labels and exact data. When necessary, it is recommended to anonymize the data using placeholders or general categories instead of specific ones. This prevents future models from echoing our exact phrasing, terminology, and claims. Timing is also a contributing factor, since models undergo retraining cycles every few months. If the data in question is set to be published shortly, any concern about future training is rendered irrelevant.
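The anonymization practice suggested above can be as modest as a find-and-replace pass before any text leaves our machine. The following minimal sketch assumes a hand-maintained list of sensitive terms; the terms and placeholder names are invented examples, not a feature of any AI provider.

```python
import re

# Minimal sketch of pre-submission anonymization: swap project-specific
# terms for neutral placeholders before sharing text with a chatbot.
# The term list and placeholders below are invented examples.
SENSITIVE_TERMS = {
    "Ottoman tax registers of Salonica": "CORPUS_A",
    "1478 census": "DATASET_1",
}

def anonymize(text, mapping=SENSITIVE_TERMS):
    # Replace longer terms first so substrings are not clobbered.
    for term, placeholder in sorted(mapping.items(), key=lambda kv: -len(kv[0])):
        text = re.sub(re.escape(term), placeholder, text, flags=re.IGNORECASE)
    return text

def deanonymize(text, mapping=SENSITIVE_TERMS):
    # Restore the original terms when bringing the AI's answer back home.
    for term, placeholder in mapping.items():
        text = text.replace(placeholder, term)
    return text

prompt = "Summarize trends in the Ottoman tax registers of Salonica using the 1478 census."
safe = anonymize(prompt)  # specifics are removed before the text is shared
```

The mapping doubles as a key: running the AI’s answer back through `deanonymize` restores the specifics locally, so nothing identifying ever enters a training corpus.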
Recent models are becoming increasingly agentic, meaning they use various tools, including third-party ones not directly developed by or associated with the AI provider. Connectivity with Google’s suite of tools, local files and folders stored on our computer, and apps that perform browsing and scraping functions are all governed by these protocols.Footnote 110 This presents new data safety challenges, especially for academic tasks informed by our specific context. External tools are usually not covered by the AI provider’s commitments and might use data in ways that we are not comfortable with. It is important to realize when we use them and what risks we assume.Footnote 111
A final point worth exploring is watermarking, where models generate text and images using a pattern, unseen by human readers but recognized by software.Footnote 112 If such a pattern is indeed noticeable, it threatens to expose those who wish to keep their work with LLMs undisclosed. Users are already aware of the telltale signs of LLM usage.Footnote 113 Numerous companies offer detection services, with universities as a main client.Footnote 114 Reactions have been mixed, to say the least, and several institutions have discontinued their subscriptions, citing false positives, false negatives, and disproportionate flagging of non-native speakers.Footnote 115 Google currently offers the SynthID system, which checks for a watermark used by its generative tools.Footnote 116 It is, as of the writing of this Element, the only major AI lab that publicly employs such a system. While SynthID shows that watermarking can be solved, scaling its success rests on several assumptions – that the AI lab voluntarily introduces such a detection system; that artifacts are generally recognizable by the lab that offers the model, not other labs; that the generated text was not sufficiently edited by people or other LLMs, diminishing the system’s ability to verify usage.
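SynthID’s internals are proprietary, so the sketch below is not Google’s algorithm. It is a toy rendering of the “green-list” scheme described in the watermarking research literature, in which generation is biased toward a pseudorandomly chosen half of the vocabulary and detection is a statistical test; all names are illustrative. It shows why detection is probabilistic and why heavy editing dilutes the signal.

```python
import hashlib
import math

# Toy "green-list" watermark detector, loosely following published
# watermarking research. NOT SynthID's actual (undisclosed) algorithm.
def is_green(prev_token, token):
    # Pseudorandomly assign roughly half of all tokens to the "green"
    # list, seeded by the preceding token.
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(tokens):
    pairs = list(zip(tokens, tokens[1:]))
    return sum(is_green(p, t) for p, t in pairs) / len(pairs)

def z_score(tokens):
    # Unwatermarked text should be green ~50% of the time, so detection
    # is a one-proportion z-test; long, unedited text gives a clear signal.
    n = len(tokens) - 1
    return (green_fraction(tokens) - 0.5) * math.sqrt(n) / 0.5

# Simulate a watermarking generator that always picks a green token.
vocab = [f"w{i}" for i in range(50)]
watermarked = ["w0"]
for _ in range(30):
    watermarked.append(next(t for t in vocab if is_green(watermarked[-1], t)))
```

A detector flags text whose z-score exceeds a threshold. Replacing words during editing pushes the green fraction back toward one half, which is why sufficiently edited text, as noted above, can evade verification.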
Watermarking is more a matter of regulation than technology. Given enough time, labs will probably solve the problem of recognizing AI-generated artifacts. The question is – will they be incentivized to make their solution publicly accessible? For now, there are two takeaways. First, academics should not regard present-day services as reliable enough to render definitive verdicts on AI usage and should consider whether the harms of using them outweigh the benefits. Second, pasting text directly from LLMs is never recommended, not only to evade watermarking but for many other reasons explained throughout this Element.
Concluding Thoughts
This section addresses three issues that impact historians’ work with AI: attribution and transparency, copyright and intellectual property, and data leakage risks. Each poses different challenges to our methodologies and requires different remedies to mitigate its most serious effects.
Universities and publishers are scrambling to formulate a comprehensive response to generative AI, whose capabilities call into question practices that have become ingrained in academia’s self-perception. Tracing an idea’s lineage is a venerable tradition among academics, historians included. Concern with tracing chains of transmission is a core component of Islamic jurisprudence (known as isnād) and in contemporary scholarly tradition can be traced to Nietzsche’s On the Genealogy of Morality (1887), which posits that values originate in a cultural context worth interrogating: “we need a critique of moral values, the value of these values should itself, for once, be examined – and so we need to know about the conditions and circumstances under which the values grew up, developed and changed.”Footnote 117
Michel Foucault, another staple of intellectual history bibliographies, built on Nietzsche’s ideas by adding his own: “Genealogy does not oppose itself to history as the lofty and profound gaze of the philosopher might compare to the mole-like perspective of the scholar; on the contrary, it rejects the meta-historical deployment of ideal significations and indefinite teleologies. It opposes itself to the search for ‘origins.’”Footnote 118
Much like Foucault’s “entangled and confused parchments,” contemporary historians’ work becomes inseparable from their interaction with AI. Demanding fealty to human exclusivity of ideation and agency does little to advance the overarching mission of Begriffsgeschichte.Footnote 119 This section argues that the idea of attribution requires reconceptualization. Any new framework should assume that human authors steer AI with their taste and that they are, by default, the originators of the ensuing scholarship.
Memorization, or the tendency of LLMs to repeat verbatim or closely reproduce protected or sensitive text, poses two distinct but interrelated challenges. The first is copyright, where the historian worries about incorporating someone else’s IP into their scholarship. The second is data leakage, where they worry their work unintentionally finds its way into the model’s training data and is revealed to the academic community prematurely. For the former, the solution is fortunately not very different from what was accepted wisdom pre-AI: checking and rechecking sources, not trusting second-hand quotes, and creating unique taxonomies. For the latter, the solution involves anonymization and coded language. Overall, these are not monumental asks compared to the clear advantages of working with LLMs. Historians would be wise to adopt them.
Conclusions
From Techniques to Judgment: The Development of Historical Wisdom
This Element approached the usage of AI in historical scholarship from several different angles. It opened with a broad overview of the challenges generative tools pose to the discipline’s traditional methodologies, arguing that after the technical aspects of the work are automated away, what remains is agency and taste. These are core competencies of historians, which will endure despite AI advancements.
Why should this be so? Generative tools are becoming highly proficient at discrete tasks. So much so, in fact, that they will soon overtake us in many of them. Benchmark saturation, whereby tests are devised for LLMs only to be conquered soon thereafter, is now a common phenomenon.Footnote 120 During the writing of this Element, several generations of models came and went. GPT-5.2, Gemini 3 Flash, and Claude Opus 4.5, currently the leading models, will probably be outdated by the time it reaches the reader. While capabilities will continue to improve, this does not chip away at the essential premise of history as an academic discipline; quite the opposite.
Historians have many tasks in front of them, but being a successful scholar demands more than crossing them off a list. An academic position is a network of interdependent tasks, whose orchestration and prioritization are complex skills. Tasks are nodes, and much like an LLM’s neural net, the strength of the relationship between them is affected by countless considerations – submission deadlines, human relations, institutional pressure, teaching schedules, and so on.
Reducing the job to a list of tasks is to misunderstand it entirely.Footnote 121 Even if we discount the pedagogical and administrative components of academic jobs, research calls for aptitudes that are currently far beyond the capabilities of the strongest AI. We should stop worrying and look to it to help us think through problems, overcome tedium, and become more innovative and effective researchers. How to benefit from the technology’s advantages and evade its pitfalls was the subject of this Element.
As argued in the first section, this translates into spheres where historians can exercise agency and taste. Put differently, these two are the first and last mile of the journey. AI might be the vehicle that speeds it up and removes obstacles from the path, but the decision to depart and the wisdom to know we have arrived are ours alone. One early promise of AI has been that it will spare us blank page paralysis.Footnote 122 This seems unhelpful given all that has been said thus far. The struggle of figuring out which questions to study is the essence of historical ideation. Farming this out to LLMs is counterproductive and “pollutes” our context with unwanted ideas.
The second section focused on context management. Narrowly considered, context is the actionable information provided to an LLM to constrain its activity to semantic fields that overlap with our research objectives. This is a very partial picture. In its fullest sense, context lives between us and our “cognitive community.” This term signifies subjects and objects outside ourselves, like friends and colleagues with whom we have conversations about research topics, our libraries and knowledge repositories, and, yes, our AI assistants. It is a dynamic cloud of associations, forming and reforming through ideation and articulation of research directions. We must curate context diligently, for the LLM and ourselves. That is why the first steps in the process of formulating a research question and deciding on an approach should not be tainted by preliminary chats where we prompt an LLM for ideas.
Chatbots are improving at simulating understanding. Personalization menus, system prompts, and project memory create the illusion that the model has a digital avatar of us and our interests in mind. Again, this is an incomplete picture. As every creative writer knows, ideas do not come from rehashing past threads; they are triggered through embodied and sensory experience – hearing an anecdote, seeing an image, stubbing a toe. So far, none of this is captured in our chats with LLMs. This is not to say that they cannot be helpful. The concept of “context dumping” was introduced in the second section to enlist them to introduce order into our inchoate ideas. Germination happens with us; nurture and guidance can enlist generative technologies.
The remainder of the journey is where AI truly shines, as demonstrated in Section 3. Code, the essential building block of computerized processes, is being abstracted away in favor of natural language. Anthropic briefly introduced the “imagine with Claude” interface, which allows users to build apps without ever seeing a single line of code.Footnote 123 It is one product in an ever more crowded ecosystem. Scholars need to devote more attention to the directions in which we would like to take these newfound capabilities. The traditional outlets of historiographical creativity – namely, the article and the monograph – must relinquish their stranglehold, making room for new forms of academic publication. Google’s new Learn Your Way experiment, which transforms traditional textbooks into interactive, personalized learning experiences adjusted to the literacy of the specific user, is an illustration of this trend.Footnote 124 Why should we continue to consume scholarship using media modalities conceptualized during the printing press revolution? With these tools at our disposal, historians can build virtual museum exhibitions, simulate past (or counterfactual) events, and zoom in or out of historical settings with relative ease. The workflows demonstrated in Section 3 are initial steps toward a more engaging way of doing, teaching, and popularizing history.
Despite optimistic forecasts, many problems persist. Section 4 focused on the ethical and legal hurdles that continue to preoccupy academia as it grapples with generative technology. The prospect of unwittingly providing our scholarship as training data or violating intellectual property law was discussed, alongside possible remedies and best practices for preventing these scenarios. As the regulatory climate continues to evolve, we can expect a consensus on the defining questions to emerge. Other unforeseen issues will undoubtedly arise, challenging legal and ethical experts.
At the core of the section stood the question of originality. Adhering to the approach that sees agency and taste as human prerogatives renders much of the current attribution discussion moot. Yet AI will likely continue its advance, birthing new forms and questions around authorship. If it ever evolves into a moral subject, we will be forced to revisit these from the position of a partner, not a user. Strange times ahead.
This Element concludes, as it began, with a call to action. Historians must face this technology and its complexities head-on. Only an informed, critical, and experienced academic community can hope to influence the trajectory of what appears, from the present vantage point, to be one of the most profound technological and cultural revolutions of the modern age. Let us employ our agency and discernment to meet the challenge.
Daniel Woolf
Queen’s University, Ontario
Daniel Woolf is Professor of History at Queen’s University, where he served for ten years as Principal and Vice-Chancellor, and has held academic appointments at a number of Canadian universities. He is the author or editor of several books and articles on the history of historical thought and writing, and on early modern British intellectual history, including most recently A Concise History of History (CUP 2019). He is a Fellow of the Royal Historical Society, the Royal Society of Canada, and the Society of Antiquaries of London. He is married with three adult children.
Editorial Board
Dipesh Chakrabarty, University of Chicago
Marnie Hughes-Warrington, Adelaide University
Ludmilla Jordanova, University of Durham
Angela McCarthy, University of Otago
María Inés Mudrovcic, Universidad Nacional de Comahue
Herman Paul, Leiden University
Stefan Tanaka, University of California, San Diego
Richard Ashby Wilson, University of Connecticut
About the Series
Cambridge Elements in Historical Theory and Practice is a series intended for a wide range of students, scholars, and others whose interests involve engagement with the past. Topics include the theoretical, ethical, and philosophical issues involved in doing history, the interconnections between history and other disciplines and questions of method, and the application of historical knowledge to contemporary global and social issues such as climate change, reconciliation and justice, heritage, and identity politics.