
Domain anchorage in LLMs: Lexicon profiling and unintended information leakage

Published online by Cambridge University Press: 27 October 2025

Lekha Challappa
Affiliation:
Goizueta Business School, Emory University, Atlanta, GA, USA
Zijin Zhang
Affiliation:
Goizueta Business School, Emory University, Atlanta, GA, USA
Rajiv Garg*
Affiliation:
Goizueta Business School, Emory University, Atlanta, GA, USA
*Corresponding author: Rajiv Garg; Email: rajiv.garg@emory.edu

Abstract

This study investigates unintended information flow in large language models (LLMs) by proposing a computational linguistic framework for detecting and analyzing domain anchorage. Domain anchorage is a phenomenon potentially caused by in-context learning or latent “cache” retention of prior inputs, which enables language models to infer and reinforce shared latent concepts across interactions, leading to uniformity in responses that can persist across distinct users or prompts. Using GPT-4 as a case study, our framework systematically quantifies the lexical, syntactic, semantic, and positional similarities between inputs and outputs to detect these domain anchorage effects. We introduce a structured methodology to evaluate the associated risks and highlight the need for robust mitigation strategies. By leveraging domain-aware analysis, this work provides a scalable framework for monitoring information persistence in LLMs, which can inform enterprise guardrails to ensure response consistency, privacy, and safety in real-world deployments.
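As a rough illustration of the lexical layer of such a comparison, the sketch below scores two responses with Jaccard overlap and bag-of-words cosine similarity. It is not the authors' implementation: the function names (`tokens`, `jaccard`, `cosine`) and the simple regex tokenizer are assumptions made for this example, and the syntactic, semantic, and positional measures named in the abstract are not covered.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into word tokens (hypothetical tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def jaccard(a, b):
    """Share of distinct tokens that two texts have in common."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    """Cosine similarity between the token-count vectors of two texts."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no tokens on one side, so no measurable overlap
    return dot / (norm_a * norm_b)
```

Under these assumptions, consistently high scores between responses to differently primed prompts would be one signal worth investigating further; the scores alone do not establish information leakage.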

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Figure 1. Flow of lexicon profiles A and B through LLM.

Figure 2. Vector representation of response similarity.

Table 1. Domain-specific primes for lexicon profiles A and B

Table 2. GPT-4 client parameters

Figure 3. Sequential pairwise response similarity results across primed prompts.

Table 3. Domain-specific response similarity between lexicon profiles A and B

Figure 4. LLM deployment strategy matrix by sensitivity level and access type.
