
5 - Methods for Identifying Paradata for Data Reuse

Published online by Cambridge University Press:  05 August 2025

Isto Huvila, Lisa Andersson, Zanna Friberg, Ying-Hsang Liu and Olle Sköld
Affiliation: Uppsala Universitet, Sweden

Summary

This chapter introduces a selection of methods applicable for identifying and extracting paradata from existing datasets and data documentation, which can then be used to complement existing formal documentation of practices and processes. Data reuse, in its multiple forms, enables researchers to build upon the foundations laid by previous studies. Retrospective methods for eliciting paradata, including qualitative and quantitative backtracking and data forensics, provide means to gain insights into past research practices and processes for data-driven analysis. The methods discussed in this chapter enhance the understanding of data-related practices and processes, and the reproducibility of findings by facilitating the replication and verification of results through data reuse. Key references and further reading are provided after each method description.

Information

Type: Chapter
In: Paradata: Documenting Data Creation, Curation and Use, pp. 116–150
Publisher: Cambridge University Press
Print publication year: 2025

This content is Open Access and distributed under the terms of the Creative Commons Attribution licence CC BY-NC-ND 4.0 (https://creativecommons.org/cclicenses/).

5 Methods for Identifying Paradata for Data Reuse

5.1 Introduction

Identifying appropriate methods to pinpoint potential paradata in existing datasets and data documentation is essential for data reuse. These methods work as a complement to existing formal documentation of practices and processes that, as discussed earlier in this book, are never fully complete. Data reuse refers to secondary data analysis and use in which researchers or other stakeholders use the data collected by others to address new research questions or for other novel purposes.

Data reusers often aggregate multiple existing datasets to address broader questions. They can also approach previously collected data from a new perspective in an attempt to solve problems other than those previously addressed. While many data reusers are researchers, data is also reused for education, societal decision-making and the development of new products and services. Additionally, data reuse is crucial for reproducing earlier research and validating its results.

Data reuse, in its various forms – including secondary data analysis, meta-analysis, and validation – plays an important role in advancing scientific knowledge, particularly in data-driven research. While explicit ‘reuse of data’ is less common outside of this paradigm, it can be broadly understood as the reuse of previously collected resources. This includes the use and analysis of public documents, archival records and material from cultural collections. Data reuse enables researchers to build upon the foundations laid by previous studies, optimising resources and avoiding duplication of effort (Faniel et al., 2019; Gregory et al., 2020; Liu et al., 2023). Further, data reuse enhances methodological transparency by allowing researchers to examine and understand past research practices and processes, thus ensuring the validity and reliability of research findings across studies (Edwards et al., 2017; Huvila and Sinnamon, 2022), and enhancing the reproducibility of findings by facilitating the replication and verification of results (Deeks et al., 2023).

Previous research suggests that one of the important factors affecting data reuse behaviour is the availability of contextual information about the data, including data description, data attributes and documentation of research methods (Faniel et al., 2019; Gregory and Koesten, 2022; Murillo, 2022). This applies to all data reuse, independent of field (e.g., Faniel et al., 2019; Pickering, 1995; Rheinberger, 2023; Zimmerman, 2008). Paradata in particular is a key facet of contextual information because it documents the practices and processes relating to the creation, management and use of the data.

Paradata, despite its critical role in data reuse, is frequently not explicitly documented or structured as such. As discussed in Chapters 2 and 3, this type of information is often interwoven with various forms of primary and secondary research documentation and embedded within the research data itself. Moreover, the perspectives of data creators and reusers may differ regarding what specific information is critical for understanding practices and processes (Huvila et al., 2025). Consequently, the most important paradata from the reusers’ perspective does not necessarily find its way into the formal description of a particular procedure. Therefore, even when creators, managers and previous users do their best to provide comprehensive documentation of how they worked with a particular dataset, data reusers often need to seek additional information. To mitigate the risk of misinterpreting data, data reusers also need to be adept at identifying paradata, and be able to grasp and mobilise the resources required to access and utilise it (cf. Chapter 3). This applies not only to researchers but to everyone working with data.

A recent analysis conducted in the CAPTURE project by Juneström and Huvila recognised several retrospective methods for identifying and using paradata in support of data reuse. These methods are concerned with identifying chains of activities described in the data, analysing data with qualitative and quantitative approaches to discern practices and processes used to produce and process the data, and assessing the trustworthiness of digital records to ensure their authenticity.

The methods introduced in this chapter aim to provide researchers interested in secondary data analysis with guidance in identifying and extracting paradata from datasets and secondary documentation. These methods are examples of approaches that can be applied to identifying and extracting paradata where it does not exist as formal ‘core paradata’ but can be derived from other information, discussed later in this volume as potential paradata (see Chapter 6).

5.2 Methods Descriptions

The methods described in this chapter were chosen based on a scoping review of paradata-related practices in research activities from various disciplines. A preliminary framework of paradata generation developed at the beginning of the CAPTURE project formed a baseline for identifying methods (Huvila, 2022). It was complemented by reviewing a large number of articles sourced from the project team members throughout the first four years of the project. Additional texts were identified in the reference lists of the material uncovered during the reviewing process, with the focus being to include relevant complementary and contrasting descriptions of the methods and of how they have been used in practice. Major categories of methods (qualitative and quantitative backtracking, data forensics and diplomatics) for post hoc identification of paradata were developed through an iterative reviewing process. This was used to develop an understanding of how documentation and paradata can be identified in different settings and how different approaches might be applicable for identifying different types of information relevant to understanding practices and processes related to data creation, management and use.

The methods selected for this chapter include approaches that are relatively broad and thus potentially applicable across disciplines. Some techniques specific to particular disciplines and study contexts are briefly described to exemplify an approach with potential wider relevance but are otherwise omitted in the present chapter to keep its focus on general principles and widely applicable approaches. Disciplinary specificity does not, however, always mean that a method has no wider relevance. Some of the approaches stemming from specific disciplinary contexts, such as natural language processing for the quantitative processing of textual material in the health domain, have clear potential for guiding paradata practices far beyond their original context.

In the following, three categories of methods are introduced and discussed: 1) qualitative and 2) quantitative methods of backtracking, as well as 3) data forensics and diplomatics.

5.2.1 Qualitative Backtracking

Qualitative backtracking refers to a category of qualitative methods of analysing data for discerning practices and processes used to produce and process the data. Broadly, qualitative backtracking qualifies as an umbrella term to describe the use of any conceivable form of qualitative data analysis to identify and create paradata. There are, however, certain methods that are specifically focused on practices and processes rather than creating new knowledge on, for example, objects and their attributes.

Close Reading and Thematic Analysis

The CAPTURE project has conducted a series of qualitative studies to understand where and what types of paradata can already be found in diverse data-related artefacts and datasets (see also Chapter 3). A major difficulty of generating paradata ‘by extraction’ (Börjesson et al., 2022) is that datasets and accompanying documentation are often geared towards primary analysis and knowledge-making rather than secondary analysis or aggregation. This means that a lot of paradata is scattered around research documentation, while formal documentation in metadata, readme and field documentation files is sparse or sometimes non-existent.

While an ideal approach for extracting as much paradata as possible would be to conduct a comprehensive walkthrough of all data and documentation, this is not always possible (Börjesson et al., 2022). In such cases, it is reasonable to focus on the artefacts with the greatest likelihood of containing relevant practice or process information. Such pieces of documentation can extend from datasets (Börjesson et al., 2022) to research reports (Huvila et al., 2021b), citations (Huvila et al., 2022), instruction manuals and handbook literature (Huvila and Sköld, 2023).

A qualitative analysis based on iterative close reading (DuBois, 2003) of an archaeological fieldwork dataset conducted by Börjesson et al. (2022) showed that a structured datafile, especially if it is not heavily cleaned of all anomalies and preliminary observations and interpretations, can provide a lot of information on how it was created and processed. The approach is based on careful analysis of the data from a paradata perspective – that is, keeping in mind that everything can eventually be informative of practices and processes relating to the data – marking such information in the dataset, iteratively developing a structured understanding of it, and finally visualising it in diagrams or narratives. The study showed that conducting the analysis requires understanding of both knowledge organisation (how databases and metadata schemas work, and how people generally use them) and subject expertise (in this particular study, of archaeological fieldwork). Both are needed to understand where and how paradata can eventually be found and extracted, to comprehend what information qualifies as paradata, and to judge what limitations it is likely to have. After the analysis, the authors found that an additional step – reaching out to the original data creators to verify interpretations and fill in gaps – is highly desirable, if possible. At the same time, the work also clearly showed that a dataset itself can contain information to an extent that allows the reader to gain a reasonably good understanding of its earlier life.

In other studies within the CAPTURE project, the same general approach of close reading and iterative coding, combined with variants of thematic analysis inspired by the constant comparative method, was applied to other research artefacts. These included research reports and instruction manuals that prescribe data generation practices and processes. The analysis started with repeated iterative reading of the material, continued with the generation of categories from the material, coding the material according to these categories and writing summaries, and ended with developing narrative descriptions of the identified themes.

The categorisation was informed by the (research) questions underpinning the analysis. For example, in a study of what paradata could be extracted from archaeological field reports (Huvila et al., 2021b), the categories related to different types of information (including narrative descriptions of practices and processes, photographs, and information sources) that proved potentially relevant as paradata. In the study that focused on the analysis of a dataset (Börjesson et al., 2022), the categories typified different types of paradata (including knowledge organisation and presentation paradata).

The general approach is applicable also to close reading of diagrams, drawings and photographs (Huvila and Sköld, 2023). An overall observation from this work is that data, secondary research documentation and the diverse artefacts used in data creation, management and use – including data management infrastructures (Börjesson, 2021) – contain many traces of practices and processes (cf. Chapter 3), which makes it possible to extract a lot of paradata. Doing so, however, requires a lot of work, and the varying quality and level of detail between different artefacts considerably affects the effort of backtracking paradata. One dataset or research report might contain a lot of extractable paradata while others can be relatively spartan and too ‘cleaned’ to reveal much about what happened even if analysed in detail. Another limitation of the approach is that the understanding of practices and processes generated from the analysis of heterogeneous data is as diverse as the data itself. Different accounts can also be difficult to compare, and they do not necessarily provide systematic enough descriptions for the stepwise reproduction of practices or processes.

Key References and Further Reading

  • Börjesson, L., Sköld, O., Friberg, Z., Löwenborg, D., Pálsson, G., and Huvila, I. (2022). Re-purposing excavation database content as paradata: An explorative analysis of paradata identification challenges and opportunities. KULA: Knowledge Creation, Dissemination, and Preservation Studies, 6(3), 1–18. The article describes a study of an archaeological fieldwork dataset and discusses the opportunities and limitations of generating paradata ‘by extraction’ from research data.

  • Rainey J., Macfarlane S., Puussaar A., Vlachokyriakos V., Burrows R., Smeddinck J. D., Briggs P. and Montague K. (2022) Exploring the role of paradata in digitally supported qualitative co-research. In CHI Conference on Human Factors in Computing Systems. New York: ACM, 1–16. https://doi.org/10.1145/3491102.3502103. This article illustrates how coding processes of qualitative data can be studied using thematic analysis.

Narrative Inquiry and Object Biography

Narrative inquiry is a type of qualitative analysis method that uses stories to describe and understand human action (Polkinghorne, 1995). Narrative inquiry is different from other forms of narrative analysis in that it focuses on identifying or constructing narratives for analytical purposes instead of analysing existing narratives, for example, diverse types of stories found in the literature or narrated orally (Sharp et al., 2018). Its focus on human action makes it apposite for qualitative backtracking of paradata. Polkinghorne (1995) notes that ‘narrative is the type of discourse composition that draws together diverse events, happenings, and actions of human lives into thematically unified goal-directed processes’ (p. 5). For narrative inquiry, actions, events and happenings form the building blocks from which narratives are generated and that make the individual steps of activities meaningful.

Phoenix et al. (2017) have used narrative analysis to investigate marginal comments written on paper questionnaires (i.e. marginalia) to understand the practices of interviewers and their struggles with the multiplicity of possible interpretations of the data they generate, their obligations to senior researchers, and their own emotions regarding the interview process, the participants, and their role in the research project they were involved in.

Carpentieri et al.’s (2023) narrative analysis of the open-ended questions from the first British Birth Cohort Study aimed at reusing existing data to study social mobility in post-war Britain. At the same time, their study also shows how narrative analysis and the construction of ‘pen portraits’ of individual study participants produced new knowledge on the data collection processes in a cohort study (Carpentieri et al., 2023). Gaps, anomalies and trends in data creation are sometimes difficult to discern unless individual pieces of information are put together in an attempt to form a coherent whole. These two examples illustrate how paradata linked to survey studies can be identified and repurposed to address research questions not initially proposed for the original dataset, providing insights into the interaction between data creators and study participants.

Object biography is a method that has affinities with narrative inquiry in how it can improve understanding of the dynamic relations between people and artefacts. The idea of writing life stories of objects in the manner of biographies of human beings was introduced by Kopytoff in 1986 (Kopytoff, 1986). The approach has become popular especially in material culture studies and archaeology in the analysis of a large variety of smaller and larger artefacts (Joy, 2009). Friberg and Huvila’s (2019) object biographical study of an archaeological collection showcases how the approach can be applied to assemblages, and to larger and more heterogeneous artefacts than individual material objects. While Joy (2009) argues for keeping biographical analysis focused on individual objects, the key question is rather how to define the object – the unit of analysis – than to limit inquiry to individual physical things. The use of the metaphorical notion of biography has also faced critique. An alternative metaphor of itinerary has been suggested as a possibly more neutral substitute for biography. Biographies have been criticised for the risk of leading one to think of non-human matters as if they were human beings. Biography also comes with a strong connotation that a trajectory is historical and has not only a beginning but also an ending, which is seldom fully applicable to material objects or, in the context of paradata, to practices or processes (Bauer, 2019; Fontijn, 2013).

Object biography has obvious affinities with other biographical approaches to research, including the chaîne opératoire discussed later in this chapter. Another related technique is life history research, which has tended to focus on spatially and temporally larger-scale interactions relating to technology and material objects (Joy, 2009). Object biography and its underpinning concept of biography, by contrast, are premised on the idea of the idiosyncrasy and uniqueness of every individual life story (Dannehl, 2017).

Narratives, on the other hand, open up more explicitly for a priori multiplicity (Schofield et al., 2020). In contrast to narrative inquiry, which focuses on narrativising human action, the common denominator of biographical approaches is the relationship between people and objects (Gosden and Marshall, 1999). Their common feature is a parallel focus on change, which brings practices and processes into the frame. A major limitation of narrative inquiry and object biography is that there are not always enough ingredients available to construct complete narratives.

As with close reading and thematic analysis, narrative inquiry and biographical research are time-consuming. At the same time, however, their advantage lies in how they help to weave people and artefacts together and, through narratives, verbalise their intermingling across time. Object biographies can be compared to identify norms and standard procedures (cf. Joy, 2009), as well as to describe the variety of practices and processes in a given context. A parallel benefit emphasised both in narrative inquiries of survey data and in object biographies is how the very act of trying to construct a narrative reveals absences, invisibilities and breaks in what is known about practices and processes. A limitation of narrative inquiry and biographical approaches is that even if the narratives are well grounded in the available evidence, they are subjective. Also, while narratives are useful for conveying an understanding of a particular practice or process to a human being, they are difficult for computers to process, which limits their usability as paradata in computational analysis and in the replication of practices and processes.

Key References and Further Reading

  • Bauer, A. A. (2019). Itinerant objects. Annual Review of Anthropology, 48(1), 335–352. A review of recent theoretical discussion relating to object biographies and itineraries.

  • Dannehl, K. (2017). Object biographies: From production to consumption. In History and Material Culture, 2nd ed. Routledge. The book chapter compares the object biography method with the life cycle model, providing useful insights to inform the choice of specific methods for life historical inquiry.

  • Edwards R. (2017) Working with Paradata, Marginalia and Fieldnotes: The Centrality of By-products of Social Research. Edward Elgar Publishing. The edited volume contains multiple chapters that not only illustrate how to analyse paradata, marginalia and fieldnotes in social science research through case studies but also provide insights into how the underpinning research processes can be backtracked in datasets and research documentation.

  • Phoenix A., Boddy J., Edwards R. and Elliott H. (2017) ‘Another long and involved story’: Narrative themes in the marginalia of the Poverty in the UK survey. In Edwards R., Goodwin J., O’Connor H., and Phoenix A. (eds.), Working with Paradata, Marginalia and Fieldnotes. Edward Elgar Publishing. The book chapter exemplifies how narrative inquiry can be used to analyse marginal notes in research documentation.

Chaîne Opératoire

As well as methods focused on proximally close analysis – literally close reading – of data, there are multiple approaches applicable to qualitative backtracking of paradata in research materials that focus on larger scales of inquiry. Chaîne opératoire (operational chain or sequence) is ‘a method of documenting technical activities in the field’ (Coupaye, 2022, p. 45, emphasis in original) developed and extensively used in archaeology and anthropology (Audouze and Karlin, 2017). Its focus on explicating social practices and technical processes, especially the chains of producing, using and discarding artefacts, has obvious affinities with the ambitions of generating paradata.

Coupaye (2022) illustrates the use of the chaîne opératoire as a descriptive and interpretive tool to analyse and make visible the dynamics, elements and levels of detail in technical activities. He exemplifies the use of the chaîne opératoire by contrasting the operational sequences of his morning activities and of yam cultivation in Papua New Guinea, showcasing the versatility of the approach to represent both contemporary and past practices at different scales, both large (agriculture) and small (morning routines). Chaînes opératoires are typically visualised using flow diagrams to depict the sequential and structural dimensions of the portrayed activities (see Figure 5.1 for an example). The level of detail and the steps included in individual sequences vary and, as Coupaye (2022) notes, a specific chaîne opératoire is only a ‘skeleton key’ (p. 54) that cannot possibly incorporate everything about a specific process.


Figure 5.1 A simple chaîne opératoire representing a data collection, research and data archiving process with major operations and actors represented.

Figure 5.1 long description

The diagram starts with the formulation of a research question by a researcher, leading to the planning of a study by the same researcher. This splits into directing data collection by the researcher and collecting data by a technician. Both lead to analysing data by a data analyst, which is then divided into reporting results by the researcher and archiving data by a data archivist.

Instead of being complete representations, they are rather ‘recordings of particular itineraries’ as observed by particular individuals (p. 54). Rösch’s (2021) analysis of an archaeological excavation process and Opgenhaffen’s (2022) extensive work on analysing, modelling and documenting artefact production, together with Coupaye’s (2022) illustrative example of using the chaîne opératoire in a contemporary everyday-life context, provide useful examples and templates for applying the concept to extracting and structuring information on past activities also in domains outside of archaeology.

The retroactive process modelling involved in constructing chaînes opératoires has obvious similarities with the prospective design of workflows (Chapter 4) but also fundamental differences. The gaze backwards and the (re)construction of a past process on the basis of its diverse material and immaterial traces call for particular caution in determining what steps to include in and exclude from the operational chain, and what remains invisible between them. Chaînes opératoires are not visible in the wild; they must be recognised as analytical constructs. Similarly, those working with data creation, management and use hardly consider their undertakings to be composed of a series of steps but rather see them as forming a flow of practice.

From the perspective of qualitative backtracking, the method is primarily one of articulating and structuring observations of the key steps in a process, rather than modelling an operational chain as a whole. An operational chain should not be confused with a recipe or step-wise procedural code that allows rerunning a specific process. However, in spite of the evident incompleteness of the paradata that can find its way into a chaîne opératoire, it can still be highly useful in making elements of practices and steps of processes visible and in facilitating critical reflection on them (Coupaye, 2022). What needs to be kept in mind is that the shape of every individual operational chain depends on what is observed and on what questions guide the identification of its steps and of the sequence as a whole.
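To make the character of the chaîne opératoire as an analytical construct concrete, the sequence of Figure 5.1 can be encoded as a small directed graph pairing each operation with its actor. The encoding below is a hypothetical sketch for this chapter's example only, not a tool prescribed by the method:

```python
# Hypothetical encoding of the chaine operatoire in Figure 5.1:
# each operation maps to (actor, operations that follow it).
chain = {
    "formulate research question": ("researcher", ["plan study"]),
    "plan study": ("researcher", ["direct data collection", "collect data"]),
    "direct data collection": ("researcher", ["analyse data"]),
    "collect data": ("technician", ["analyse data"]),
    "analyse data": ("data analyst", ["report results", "archive data"]),
    "report results": ("researcher", []),
    "archive data": ("data archivist", []),
}

def walk(step, depth=0, seen=None):
    """Print the operational sequence as an indented tree, depth-first,
    visiting each shared step (e.g. 'analyse data') only once."""
    seen = set() if seen is None else seen
    actor, followers = chain[step]
    print("  " * depth + f"{step} [{actor}]")
    for nxt in followers:
        if nxt not in seen:
            seen.add(nxt)
            walk(nxt, depth + 1, seen)

walk("formulate research question")
```

Even this toy encoding makes the analytical choices visible: which steps are named, which actors are credited, and where the sequence branches and reconverges.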

Key References and Further Reading

  • Brysbaert A. (2012) People and their things: Integrating archaeological theory into prehistoric Aegean museum displays. In Narrating Objects, Collecting Stories. Routledge. This book chapter describes the use of the concept of the chaîne opératoire together with the notion of cross-craft interaction to provide insights into how people interact with material objects in the museum context.

  • Coupaye L. (2022) Making ‘Technology’ visible: Technical activities and the Chaîne Opératoire. In Bruun M. H., Wahlberg A., Douglas-Jones R., Hasse C., Hoeyer K., Kristensen D. B., and Winthereik B. R. (eds.), Palgrave Handbook of the Anthropology of Technology. Basingstoke: Palgrave Macmillan, 37–60. A book chapter that provides an approachable introduction to the chaîne opératoire method.

  • Rösch F. (2021) From drawing into digital: On the transformation of knowledge production in postexcavation processing. Open Archaeology 7(1), 1506–1528. https://doi.org/10.1515/opar-2020-0211. This article demonstrates how chaînes opératoires can be utilised to trace the steps of data transformation and interpretation in the context of archaeological work.

Conversation Analysis

In contrast to most of the methods discussed in this chapter so far, there are also more structured and formal approaches applicable to identifying and extracting paradata. Conversation Analysis (CA) is an approach for studying social interaction that focuses on the details of action. It originates from the work of Sacks, Schegloff and Jefferson in the 1960s on studying casual conversations in everyday-life situations (Goodwin and Heritage, 1990). As a naturalistic approach to studying social interactions, CA focuses on analysing naturally occurring activities as they unfold in human interactions by recording and analysing actual situated activities (Mondada, 2012). In CA, recordings of naturally occurring activities, such as telephone calls, family dinner talks and doctor–patient communication, are intensively analysed to shed light on the social rules underpinning communication.

Based on this premise, several comprehensive data transcription schemes have been developed within CA to capture the details of social interaction, including nuances of speech, turn-taking and non-verbal actions. The following short conversation exemplifies some features of the popular Jefferson Transcription System (Jefferson, 2004):

A. Which one of the spectrometers did he use to take the measurements?

B. Did use what?

A. SPECTROMETERS?

B. I don’t know, really. We probably have to check_

A. I’ll go and see if there’s something in the notebook_

B. O::k:, sounds good. We’ll probably have to talk about it at the meeting tomorrow morning,

A. Alright (.) I can write it down on the agenda

To draw attention to some features of the transcription system: CAPITAL LETTERS signify loudly spoken passages, an underscore (_) unchanging pitch, a comma (,) a slightly rising pitch, colons (:) prolonged sounds, and a full stop in parentheses (.) a noticeable pause.

Conversation analysts emphasise not only the content of what is said but also the manner in which it is said, including the visible verbal and non-verbal behaviours of the participants, such as the temporal and sequential relationships and aspects of speech delivery like changes in pitch, loudness and tempo (Hepburn and Bolden, 2012). As such, CA research requires a deep engagement with recorded data, highlighting the importance of the researcher’s participation in the manual transcription process and the close integration of transcription and analysis (Bolden, 2015). Nonetheless, the interactional details captured by CA transcription practices rely on the overhearer’s perspective to piece together a plausible version of the participants’ actual experiences (ten Have, 2002).

As a qualitative backtracking approach, CA is a highly specific method for understanding practices and processes in naturalistic settings. When analysing the recorded data and transcripts, CA does not assume that such aspects of context as social categories (race, gender, power, class, etc.) have inherent relevance (Joyce et al., 2023). The starting point is to record the naturally occurring activities based on the specific aspects of practices or processes of interest and on what in the analysed material indicates specific types of social interactions. For example, conversation analysts have examined turn-taking as a fundamental structure in everyday conversations, together with adjacency pairs as a basic element of sequence organisation. Adjacency pairs are sets of actions where, if one speaker performs an initial action of a certain type, the recipient is expected to respond with a corresponding action (Drew, 2004). The analysis of the organisation of sequences in conversations can facilitate our understanding of the social rules enacted in specific contexts of everyday interactions. As conversation analysts turn to the study of talk in specific institutional contexts, also known as institutional talk, one of the key objectives is to inquire into ‘what kinds of institutional practices, actions, stances, ideologies and identities are being enacted in the talk, and to what ends?’ (Heritage, 2004, p. 109).

When approached as a form of qualitative trace analysis, CA can be especially useful in identifying conversational practices related to data creation, management and use. The approach could be especially fruitful in the analysis of diverse types of recordings of practices and processes, including video and audio transcripts. Earlier CA-based studies have, for example, examined responses in survey studies from a conversational perspective. CA could similarly be utilised, for example, to analyse interviewers’ conversational practices, or standardisation of, and deviations from, the standardised ‘talk’ prescribed by a database schema when data is entered into a database system. Arminen and colleagues have investigated the practices of enacting and utilising practical know-how and institutionalised expertise in social interactions (Arminen, 2017; Arminen and Simonen, 2021), providing an example that could be transposed to studying how these play out in data-related practices and processes.

Overall, CA provides a potentially powerful lens for understanding the specifics of human interaction in data creation and reuse processes and practices by meticulously analysing moment-to-moment interactions in conversations. Its apparent drawback is that it requires detailed recordings of actual conversations, which are not always available. Another downside is that, like most qualitative methods, it is very time-consuming. CA also comes with specific theoretical and practical commitments that differ from many other approaches developed for analysing discourse and conversations (ten Have, 2006). These include its focus on individual conversations, which can limit its applicability for identifying and accounting for the impact of broader societal discourses and the sociocultural underpinnings of the conversations. It is also emphatically data-driven down to the minutest detail, and its interest lies in explicating interaction and how it is organised rather than what drives the described practices or processes. However, as such it can – as the brief example above demonstrates – help to track minute details of practices and processes, and how they are talked about.

Key References and Further Reading

5.2.2 Quantitative Backtracking

Quantitative backtracking refers to methods that can be used for quantitative analysis of data and diverse forms of secondary documentation and evidence for extracting paradata. Similarly to qualitative backtracking, we use the concept to describe a broad variety of approaches that apply quantitative analysis to identify, summarise and excerpt paradata, including both statistical and machine learning techniques.

Quantitative Trace Analysis

A variety of quantitative methods can be used to analyse datasets for identifying patterns in how they have come into being. The work of Börjesson et al. (2022) on close reading an archaeological fieldwork dataset provides multiple cues to how this can be done in practice with structured data. For example, using information on the point of time when specific data points have been entered in a database, it is possible to create a sequence of actions of how a dataset came into being. Depending on the routines of database creators, the sequence is likely to relate in one way or another to the procedures of creating, managing and using the data.

In cases where the data is entered directly at the moment of creation (cf. Huvila, 2012), the sequence extractable from a database corresponds well with the work process. However, in other cases, when, for example, data entry is done in batches after a certain amount of time has passed or at a particular time of the day or week, the sequence is a less accurate and detailed representation of the data practice. In addition to identifying temporal sequences of actions, it is possible to identify patterns and follow changes in how vocabulary, descriptors or documentation of measurements evolve from the inception of a dataset to the point when it is finalised. If a dataset incorporates fields for preliminary and final interpretations, like the one analysed by Börjesson et al. (2022), it is possible to trace the progress of the work of interpreting data points either as an individual or a collaborative undertaking.
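The timestamp-based sequencing described above can be sketched in a few lines of Python. The record identifiers, timestamps and the thirty-minute session gap below are hypothetical choices for illustration; a real analysis would read them from the database export. A short session holding many records hints at batch entry, while longer, evenly spaced sessions suggest data entered at the moment of creation.

```python
from datetime import datetime, timedelta

# Hypothetical database export: (record_id, entry_timestamp) pairs.
entries = [
    ("F004", "2021-05-03 16:00:00"),
    ("F001", "2021-05-03 09:12:00"),
    ("F002", "2021-05-03 09:14:30"),
    ("F005", "2021-05-03 16:00:05"),
    ("F003", "2021-05-03 09:17:10"),
    ("F006", "2021-05-03 16:00:09"),
]

def entry_sessions(entries, gap=timedelta(minutes=30)):
    """Order records by entry time and split them into sessions wherever
    the pause between consecutive entries exceeds `gap`."""
    stamped = sorted((datetime.fromisoformat(ts), rid) for rid, ts in entries)
    sessions = [[stamped[0]]]
    for previous, current in zip(stamped, stamped[1:]):
        if current[0] - previous[0] > gap:
            sessions.append([])
        sessions[-1].append(current)
    return sessions

sessions = entry_sessions(entries)
for session in sessions:
    duration = (session[-1][0] - session[0][0]).total_seconds()
    print(f"{len(session)} records entered within {duration:.0f} seconds")
```

Here the morning records span several minutes, whereas the afternoon records were entered within seconds of each other, which is a typical signature of batch entry.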

The surge of digital data collection in both social research and the sciences has brought new opportunities to collect and backtrack trace data. Many devices used by data creators log a lot of data that can be purposefully collected as paradata, as discussed in Chapter 4. However, as exemplified in Chapter 3, in cases where such data has not been collected purposefully and remains as residue rather than as a part of formal documentation, it still provides opportunities for post hoc analyses and paradata generation.

Many digital cameras stamp photographs not only with information on the device and its technical characteristics and calibration but also with the time when a photograph was taken and the geographical coordinates of the place where it was taken. This information can be used for reconstructing the spatial and temporal sequences of data generation. Comparable information can also be extracted from other measurement devices, including 3D laser scanners and various types of laboratory equipment, or collected separately using a GPS device.
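As an illustration of reconstructing movement from photograph metadata, the following sketch assumes that timestamps and GPS coordinates have already been extracted from the image files (the values below are invented) and computes the great-circle distance between consecutive shooting locations with the standard haversine formula:

```python
from math import radians, sin, cos, asin, sqrt

# Hypothetical (timestamp, latitude, longitude) triples from photo metadata.
photos = [
    ("2020-07-01T09:00", 59.8586, 17.6389),
    ("2020-07-01T09:40", 59.8601, 17.6450),
    ("2020-07-01T10:30", 59.8630, 17.6495),
]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    h = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

# Order shots chronologically and compute each leg of the route.
photos.sort()
legs = [
    haversine_km(la1, lo1, la2, lo2)
    for (_, la1, lo1), (_, la2, lo2) in zip(photos, photos[1:])
]
for (t1, *_), (t2, *_), leg in zip(photos, photos[1:], legs):
    print(f"{t1} -> {t2}: {leg:.2f} km")
```

The resulting leg distances and times give a coarse reconstruction of how the photographer moved through the field site during data collection.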

The automatic generation of potential paradata applies also to many software packages used in data collection and analysis. In social science survey research such trace data is explicitly termed paradata and occasionally collected and preserved on purpose for future use. Computer-assisted survey software programs collect a lot of secondary documentation, a part of which qualifies as formal, intentionally generated and collected paradata and is termed as such in the context of survey research (Durrant and Kreuter, 2013). Other parts may require more processing to explicate the understanding of data creation and management procedures. The latter applies especially to data embedded in the survey data itself.

In interview research, interviewers’ movements in the field, collected using a GPS device, have been analysed to determine to what degree sampling protocols have been followed and what has happened during the data collection process (Choumert-Nkolo et al., 2019). The analysis of response times from web-based surveys has similarly been used as an indication of whether respondents have difficulties understanding individual questions or of the amount of effort they invest (Kunz et al., 2024). Much of the focus of the auxiliary data captured from survey and interview studies has been on enhancing the validity of collected data by addressing the issues of non-response biases in data sampling. However, diverse forms of (potential) paradata can also have many other uses in informing the understanding of data collection procedures. They can also facilitate the study of participant and researcher behaviour for greater understanding and more diverse transparency of data creation, management and use practices and processes.
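A minimal, hypothetical sketch of such response-time screening: the per-question timings and thresholds below are invented, but the logic of flagging unusually slow or fast questions is the same as in the studies cited above.

```python
from statistics import median

# Hypothetical per-question response times (seconds) from a web survey.
response_times = {
    "Q1": [12, 15, 11, 14, 13],
    "Q2": [45, 52, 48, 60, 41],  # consistently slow: possible comprehension trouble
    "Q3": [3, 2, 4, 2, 3],       # consistently fast: possible low respondent effort
}

def flag_questions(times, slow=30.0, fast=5.0):
    """Flag questions whose median response time suggests either
    comprehension difficulty ('slow') or low respondent effort ('fast')."""
    flags = {}
    for question, samples in times.items():
        m = median(samples)
        if m >= slow:
            flags[question] = "slow"
        elif m <= fast:
            flags[question] = "fast"
    return flags

flags = flag_questions(response_times)
print(flags)
```

In practice the thresholds would be calibrated against the survey instrument itself, for example relative to question length or pilot-study timings, rather than fixed in advance.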

Depending on the desired granularity and type of insights into practices and processes, different quantitative analysis methods can be applicable for making sense of quantitative trace data. Trend analyses, regression and straightforward correlation analyses can offer valuable insights into trace data, such as changes in vocabulary or patterns of documenting data points. Examining the terms used to describe observations or the frequency of measurements taken at different times during a field study can help identify changes in data collection practices and the types of decisions made throughout the process.

Visualisation of spatial or temporal movements and changes can also be helpful in the interpretation of trace data. To this end, Ekström (2022) developed a tailored application to visualise the spatial and temporal aspects of citizen scientists’ data collection practices on a map. In another study, Pentland et al. (2020) developed a method for extracting contextual information from the trace data of the audit trails of electronic medical records and used the open-source ThreadNet tool to visualise the process data. Visualisation can make especially quantitative paradata easier to understand, reveal patterns and help to obtain an overview of larger sets of traces.

Meta-analysis provides another potentially useful method for trace analysis, specifically as a framework for comparative analysis of practices or processes. It has been extensively used in the reuse of clinical trials data and involves defining the criteria for including studies, searching for and selecting studies, collecting data about a study (e.g., details of methods, participants, setting, context, interventions, outcomes and results), extracting data from reports, and then statistically combining findings from multiple distinct studies (Deeks et al., 2023). Techniques used in meta-analysis can be helpful in aggregating paradata from parallel practices and processes for comparison and broader understanding of wider constellations of how data is created, managed and used.
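The statistical core of ‘statistically combining findings’ can be illustrated with inverse-variance (fixed-effect) pooling, one of the basic building blocks of meta-analysis. The effect estimates and standard errors below are hypothetical:

```python
from math import sqrt

# Hypothetical effect estimates and standard errors from three studies.
studies = [
    {"effect": 0.30, "se": 0.10},
    {"effect": 0.10, "se": 0.20},
    {"effect": 0.25, "se": 0.15},
]

def fixed_effect_pool(studies):
    """Inverse-variance weighted (fixed-effect) pooled estimate:
    each study is weighted by 1/SE^2, so precise studies count more."""
    weights = [1 / s["se"] ** 2 for s in studies]
    pooled = sum(w * s["effect"] for w, s in zip(weights, studies)) / sum(weights)
    se = sqrt(1 / sum(weights))
    return pooled, se

pooled, se = fixed_effect_pool(studies)
print(f"pooled effect = {pooled:.3f} +/- {1.96 * se:.3f}")
```

A full meta-analysis adds heterogeneity assessment and, where needed, random-effects models; the Cochrane Handbook cited below covers these extensions.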

There are also other methods for revisiting and assessing earlier data that can be applied for quantitative backtracking. Evidence review, including integrity checks, data extraction, transformation and sense-making, has been used to identify the common outcome measures in data harmonisation (the process of integrating data from different sources for comparative purposes) tasks (Deeks et al., 2023; Liu et al., 2023). For example, similar to how Goldsmith and colleagues reviewed patient-reported and expert-identified scales of measuring pain using evidence mapping, identifying research gaps and multiple challenges in synthesising data (Goldsmith et al., 2018), the method can be utilised in comparing and synthesising parallel sets of quantitative traces of practices and processes.

Despite the increasing interest in collecting and analysing additional documentation on survey procedures, acquiring and integrating such data presents challenges (Sakshaug and Struminskaya, 2023), many of which are also relevant to other contexts and traces of data creation, management and use. Diverse behavioural cues, including temporal sequences and movements, are not always straightforward to link to a particular practice or process. Their meaning and implications for generated data can be difficult to interpret, especially in secondary data analysis when the data was collected by other researchers.

In spite of the downsides of increased normalisation of paradata discussed earlier in this chapter, quantitative trace analysis would undoubtedly benefit from increased standardisation of trace data. Doing so might be feasible in such contexts as structured survey research but less so in other contexts of data generation, which lack standardised procedures and shared data structures. This applies to many branches of qualitative research but also elsewhere in domains where data practices are highly contextual and difficult to standardise due to local circumstances.

Many of the apparent problems and limitations can be mitigated by adjusting the procedures of how trace data is sourced, following appropriate sampling strategies. A crucial step in this direction is to try to ensure that the trace data covers the relevant participants and aspects of the practice of interest. For example, if there are traces of the decisions made by only one member of a research team, the understanding of the work of the team as a whole remains limited. Some potential problems can also be managed by selecting robust analysis methods that work for the specific types of trace data with the potential to shed light on the specific data creation, management and use procedures at hand. To this end, there is a plethora of statistical methods that help to mitigate problems, for example, with skewed distributions of samples and missing data points. The key point is that identifying and selecting trace data, as well as determining workable approaches, remains complex. This process requires a combination of methodological and domain expertise, which is essential for the successful quantitative analysis of trace data.

Key References and Further Reading

  • Deeks J. J., Higgins J. P., Altman D. G. and the Cochrane Statistical Methods Group (2023) Chapter 10: Analysing data and undertaking meta-analyses. Cochrane Handbook for Systematic Reviews of Interventions. https://training.cochrane.org/handbook. An authoritative handbook that introduces the principles and methods of conducting meta-analysis in healthcare.

  • Kocar S. and Biddle N. (2023) The power of online panel paradata to predict unit nonresponse and voluntary attrition in a longitudinal design. Quality & Quantity 57(2), 1055–1078. https://doi.org/10.1007/s11135-022-01385-x. This journal article demonstrates how to analyse trace data for identifying the predictors of panel participation in survey research.

  • Venturini T., Bounegru L., Gray J. and Rogers R. (2018) A reality check(list) for digital methods. New Media & Society 20(11), 4195–4217. https://doi.org/10.1177/1461444818769236. This journal article reviews conundrums relating to the use of online trace data for the analysis of collective action and provides a checklist of major issues to take into consideration.

Natural Language Processing

As a method of quantitative backtracking, natural language processing (NLP) techniques can be used to identify the process and practice information in human language data. NLP is a field of research that focuses on computational analysis and manipulation of human language.

Different types of NLP techniques exist. Symbolic NLP is based on processing textual or speech data using a set of rules. An example of a rule-based approach to identifying paradata in a research report is to generate a list of conditions under which a particular phrase is interpreted as a description of a process. If the phrase ‘was measured’ appears in the section ‘Methods’ of a research report, it is considered paradata on research data creation, whereas if the same phrase appears in the historical background, it is assumed to relate to a historical practice.
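A rule of this kind is straightforward to express in code. The following sketch, with invented report text and a deliberately small rule set, classifies occurrences of process phrases by the section in which they appear:

```python
import re

# Toy research report split into sections (hypothetical text).
report = {
    "Methods": "Soil acidity was measured with a portable pH meter.",
    "Historical background": "Field size was measured in medieval surveys.",
}

# Rule: a process phrase counts as paradata on data creation only
# when it occurs in a methods-like section.
PROCESS_PATTERN = re.compile(r"\bwas (measured|sampled|recorded)\b")
PARADATA_SECTIONS = {"Methods", "Data collection"}

def find_paradata(report):
    hits = []
    for section, text in report.items():
        for match in PROCESS_PATTERN.finditer(text):
            hits.append({
                "section": section,
                "phrase": match.group(0),
                "paradata": section in PARADATA_SECTIONS,
            })
    return hits

hits = find_paradata(report)
for hit in hits:
    print(hit)
```

The same phrase is thus flagged as potential paradata in the ‘Methods’ section but not in the historical background, mirroring the rule described above.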

Statistical NLP is based on finding patterns in large masses of text. In the previous example, a statistical NLP approach could be used to find patterns in how methods sections in research reports are written and, by searching for similar patterns in other texts, to figure out whether they contain methods descriptions, or at least passages that resemble methods descriptions.

Since the 2010s, NLP has increasingly been based on the use of neural networks. In contrast to rule-based systems, which rely on rules crafted by the researcher, and statistical NLP, which requires the researcher to engineer relevant features, neural networks can be trained to learn features of human language automatically, provided that large enough quantities of text are available as input. More recently, large language models (LLMs) have been applied to the task of process and procedure extraction for business process models beyond the existing rule-based approaches (e.g. Bellan et al., 2024; Neuberger et al., 2024). Nonetheless, to enhance the transparency and fairness of the developed systems, it is necessary to address language biases in the internal knowledge of LLMs (Salinas et al., 2023), as these biases can impact downstream applications, such as the task of process and procedure extraction for paradata generation.

Several commonly used NLP techniques are relevant to the extraction of process and procedure information. For example, Named Entity Recognition (NER) can be employed to detect all instances of the named entities (such as persons, organisations, locations, dates and times, and events) in the text as part of the information extraction task (Bird et al., 2009). Identifying, linking and tracing, for example, persons or organisations in language data can provide insights into practices and processes they have been engaged in. Tracking dates, times and events can help (re)construct temporal sequences and spatial locations.
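As a toy illustration of tracking dates to (re)construct temporal sequences, the following sketch recognises ISO-formatted dates in invented field notes and orders the statements chronologically; a production system would use a proper NER or temporal tagger rather than a single regular expression:

```python
import re
from datetime import datetime

# Hypothetical field-diary entries mixing narrative and dates.
notes = [
    "Trench 2 was opened on 2019-06-12 by the survey team.",
    "Samples were sent to the lab on 2019-06-10.",
    "Photogrammetry of trench 2 completed on 2019-06-15.",
]

DATE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")

def temporal_sequence(notes):
    """Pull ISO dates out of free text and order the statements
    chronologically to reconstruct the sequence of events."""
    events = []
    for note in notes:
        match = DATE.search(note)
        if match:
            events.append((datetime.strptime(match.group(1), "%Y-%m-%d"), note))
    return [note for _, note in sorted(events)]

ordered = temporal_sequence(notes)
for note in ordered:
    print(note)
```

Even this crude extraction reveals that the lab samples were dispatched before the trench was formally opened, exactly the kind of sequencing anomaly worth following up when reconstructing a data creation process.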

Relation Extraction (RE) can be employed to identify and classify the relationships between entities within a text. In the sentence ‘The committee approved the proposal,’ Relation Extraction identifies the entities ‘committee’ and ‘proposal’ with the relationship ‘approved’. In addition, Entity Resolution (ER) is able to identify semantically equivalent entities that refer to the same information object across different data sources. For instance, ER can identify that ‘IBM’ and ‘International Business Machines Corporation’ refer to the same company across different documents. To address the problem that rule-based methods are optimised for a specific domain, one approach involves extracting the text and location of process elements (NER), resolving them into collections of unique entities (ER), and extracting entity arguments and relation types (RE) in order to extract business process information (Neuberger et al., 2024). Extracting process elements and their interrelations can facilitate the automated generation of research process models.
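The committee example can be handled by a deliberately minimal pattern-based extractor; real relation extraction systems use far more general grammars or learned models, but the sketch shows the basic idea of mapping a sentence to a (subject, relation, object) triple:

```python
import re

# A minimal pattern for simple "The X <verb>ed the Y." sentences.
# This is an illustration only, not a general relation extractor.
SVO = re.compile(
    r"^The (?P<subject>\w+) (?P<relation>\w+ed) the (?P<object>\w+)\.$"
)

def extract_relation(sentence):
    """Return a (subject, relation, object) triple, or None if the
    sentence does not match the toy pattern."""
    match = SVO.match(sentence)
    if match:
        return (match.group("subject"),
                match.group("relation"),
                match.group("object"))
    return None

relation = extract_relation("The committee approved the proposal.")
print(relation)
```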

For the application of NLP to paradata identification, a paradata extraction approach that involves close iterative reading can be assisted by NLP techniques (Börjesson et al., 2022). The CAPTURE project tested this in a short pilot project with promising results (Huvila et al., 2022). To promote the semantic integration of datasets, different textual patterns for temporal expressions in archaeological datasets have been identified as part of a standardisation process, noting the importance of keeping the original context (and provenance) of the dating information (Binding and Tudhope, 2023). As an application of NLP techniques to assess document similarity, Sakahira et al. (2023) analysed excavation report texts concerning buried cultural artefacts, demonstrating that the similarities of texts based on sentence embedding (transforming sentences into numerical vectors) of excavation reports can reflect the similarities among archaeological sites. However, the application of NLP techniques to large amounts of data, involving various steps of data standardisation and processing, requires a high level of technical skill.
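The idea of measuring document similarity through vector representations can be illustrated without a trained embedding model by using raw term-frequency vectors and cosine similarity. This is a crude stand-in for the sentence embeddings used by Sakahira et al., and the report snippets below are invented:

```python
from collections import Counter
from math import sqrt

def tf_vector(text):
    """Represent a text as a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = sqrt(sum(count * count for count in a.values()))
    norm_b = sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b)

# Hypothetical snippets from three excavation reports.
r1 = tf_vector("postholes and hearth features in trench two")
r2 = tf_vector("hearth and postholes recorded in trench five")
r3 = tf_vector("ceramic assemblage dated by thermoluminescence")

sim_12 = cosine(r1, r2)
sim_13 = cosine(r1, r3)
print(round(sim_12, 2), round(sim_13, 2))
```

The first two snippets, describing similar features, score much higher than the unrelated third, which is the same intuition that sentence embeddings capture with far richer semantics.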

For developing applications that perform simple NLP tasks, such as counting the number of words, creating a list of words, tracking word positions and counting word frequencies in a text, it is possible to use an out-of-the-box toolkit. The Natural Language Toolkit (NLTK) is an example of a popular programming library (www.nltk.org) that can be used directly to perform simple NLP work and to develop one’s own complex NLP applications using the Python programming language (Bird et al., 2009). Many other toolkits exist for multiple programming languages and platforms for developing NLP applications.
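Several of these simple tasks need nothing beyond Python’s standard library; NLTK’s `FreqDist` offers equivalent functionality with extras such as plotting. For example, with an invented sentence:

```python
import re
from collections import Counter

text = (
    "The samples were measured twice. The first measurement was "
    "discarded and the samples were measured again."
)

# Tokenise, lowercase and count word frequencies; also track word
# positions via the token list's indices.
tokens = re.findall(r"[a-z]+", text.lower())
freq = Counter(tokens)

print(freq.most_common(3))
print("position of 'discarded':", tokens.index("discarded"))
```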

The Clinical Trial Risk Tool (https://app.clinicaltrialrisk.org) exemplifies how NLP toolkits can be used to develop user-friendly tools to extract process information from textual data. It takes as input a clinical trial protocol in PDF format, extracts information on the key facets of the reported trial, compares it to quality norms, and produces a report with an assessment of the risk that the reported trial is uninformative.

The major drawback of NLP approaches to quantitative backtracking is that an NLP algorithm never understands its input as a human being does. The generated outputs are potential paradata rather than a definite list of all relevant information. Both false positive and false negative results pose a risk, making the results useful only as an initial step towards more in-depth analysis.

Another obstacle to scaling up the application is the scarcity of extensive training datasets for extracting process information (Bellan et al., 2024; Neuberger et al., 2024). In spite of the shortcomings, NLP techniques can provide strong support for identifying potential paradata in text corpora that would otherwise be impractical to analyse by hand. Moreover, when combined with related techniques for analysing, for example, static and moving images (object analysis), NLP-based backtracking can be extended from text and speech to trace data in other media formats and their combinations.

Key References and Further Reading

  • Bach R. L., Kern C., Bonnay D. and Kalaora L. (2022) Understanding political news media consumption with digital trace data and natural language processing. Journal of the Royal Statistical Society Series A: Statistics in Society 185(Supplement_2), S246–S269. https://doi.org/10.1111/rssa.12846. An article that exemplifies how NLP and statistical techniques can be used to elicit information on news media consumption practices from web browsing data.

  • Bird S., Klein E. and Loper E. (2009) Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O’Reilly Media. An approachable book-length hands-on introduction to NLP techniques, with detailed documentation available online at www.nltk.org.

  • Neuberger J., Ackermann L. and Jablonski S. (2024) Beyond rule-based named entity recognition and relation extraction for process model generation from natural language text. In Sellami M., Vidal M.-E., van Dongen B., Gaaloul W., and Panetto H. (eds.), Cooperative Information Systems. Cham: Springer Nature Switzerland, 179–197. https://doi.org/10.1007/978-3-031-46846-9_10. This conference paper proposes an approach to process information extraction combining the tasks of named entity recognition, entity resolution and relation extraction.

Data Forensics and Diplomatics

In addition to discrete qualitative and quantitative data analysis methods that can be applied for backtracking practices and processes in primary and secondary material, there are also broader methodological frameworks developed for inquiring into data and its contexts, including paradata. In this section we briefly discuss the two parallel approaches of data forensics and diplomatics. They represent two distinct but prospectively complementary approaches to analysing documents and their characteristics (Duranti, 2009a). The focus of both forensic and diplomatic analysis is on assessing the authenticity, reliability and completeness of records, and their ‘ability to proof facts at issue’ (Duranti, 2009a, p. 64), albeit from two different methodological outsets.

Data forensics refers to the analysis of digital data and how it is created and used (Pandey et al., 2020). It is sometimes categorised as a branch of digital forensics (sometimes computer forensics), that is, the forensic study of digital information and records. Its roots are in the study of digital information to support investigations of crimes committed with the help of computers (Pollitt, 2010). Much of the work in this area focuses on analysing digital data in legal contexts (Arshad et al., 2018), for example, collecting electronic evidence to support criminal investigations and law enforcement. It is also guided by principles derived from forensic science, including the crucial importance of not relying on a single source of evidence and of corroborating and consolidating findings from multiple sources (Ries, 2018). Sub-branches of data forensics focus on forensic data analysis in specific contexts. For example, educational data forensics investigates what can be termed potential paradata on test takers’ response data to detect indications of test fraud (De Klerk et al., 2019).

Forensic techniques can be used in diverse digital contexts. Forensic analysis of media content shared via social media or web platforms has been used for verifying sources and integrity of media on social networks by analysing platform origins for shared content, and assessing the credibility of digital objects consisting of both text and audiovisual media (Pasquini et al., 2021). Content sharing on social networks leaves digital traces that enable the identification of processing platforms, reconstruction of sharing history, and extraction of upload system details (Pasquini et al., 2021), that is, information that effectively functions as paradata. Hodges (2021) demonstrates in a study of biomedical device maintenance work how forensic analysis can ‘constitute a valuable approach to recovering knowledge about behaviors that have already taken place, or that have taken place in contexts where efforts at observation could encounter problems related to access, intellectual property, privacy, or safety’ (Hodges, 2021, p. 1404).

Many tasks in digital forensics are based on the use of technical methods for recovering data and of scientific and computational, often quantitative, approaches to analysing it. The forensic analysis procedure consists of identifying and recovering digital evidence, prioritising the most promising data for closer inspection, analysis and, finally, evaluation and interpretation of the findings (Duranti, 2009a). Computational analysis can help especially in forensic analyses of large data resources in the context of what has been termed ‘big data forensics’ (Zawoad and Hasan, 2015).

Hodges’ (2021) work and forensic analyses in media studies (e.g., Kirschenbaum, 2008; 2014; Ries, 2018) and digital preservation exemplify how forensics can also benefit from the use of qualitative methods, including what can be described as the close reading of data files. Hodges applies an analysis method that draws on digital forensics and trace ethnography, a method developed for identifying and tracing actors and events that often remain invisible in digital data (Geiger, 2016; Geiger and Ribes, 2011). The approach follows the ethnographic logic of developing rich descriptions of activities, not necessarily by participating in them in the same physical location but rather through being present in the networks where activities take place, gathering and analysing documentary evidence.

The earlier use of qualitative forensic analysis exemplifies the use of the approach. Geiger and Ribes’s (2011) study illustrates how the method can be used to inquire into the practices of vandals on Wikipedia by tracing their activities on the Mediawiki software platform running the encyclopaedia and on external software tools. Hodges (2021) analyses traces of labour in a corpus of repair manual files in PDF format. While only a handful of the analysed files contain formal metadata, the manuals contain handwritten page numbers indicating their users’ need to refer and go back to specific pages in the document, evidence that they have been digitised from non-digital originals, wear of the original documents before their digitisation, diverse marginalia (including underlining and circling of content), and added pages. All such traces evince how the documents have been managed and used during their lifetime.

A typical problem for data forensics is that data is stored in a datafile of an unknown format or there are no readily available tools for opening it. Ries’ (2018) study exemplifies how all, even unknown, types of binary data files (i.e. files not coded in plain text) can be read for close analysis and compared using generic hex editors (a type of file editor capable of showing the contents of binary files). In the legal domain, data forensics is also complicated by diverse anti-forensic measures used by criminals to hinder forensic analyses.
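One basic, widely used forensic technique for files of unknown format is checking the leading ‘magic bytes’ against published file signatures. The sketch below uses a small table of well-known signatures and in-memory byte strings in place of real files:

```python
# Identify a file's real format from its leading "magic bytes", a basic
# data-forensics check that works even when the file extension is
# missing or wrong. The signatures below are standard, published values.
SIGNATURES = {
    b"%PDF": "PDF document",
    b"\x89PNG": "PNG image",
    b"PK\x03\x04": "ZIP container (also DOCX/ODT, etc.)",
    b"\xff\xd8\xff": "JPEG image",
}

def sniff(data: bytes) -> str:
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"

# In practice the bytes would come from open(path, "rb").read(8);
# here we use in-memory examples.
print(sniff(b"%PDF-1.7 ..."))
print(sniff(b"\x89PNG\r\n\x1a\n"))
print(sniff(b"\x00\x00\x00\x00"))
```

Bytes that match no known signature are a cue to open the file in a hex editor for the kind of close reading Ries describes.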

Diplomatics is a methodology developed in the seventeenth century for verifying the authenticity of current and archival records (Duranti, 2014). Classic diplomatics is based heavily on the analysis of the physical characteristics of records, that is, the form and format of documents written on, for example, parchment or paper. It aims to shed light on the contexts and reasons of record creation, the persons and other agents involved in the process, and the relation of records to other documents. Diplomatics of digital records, or digital diplomatics, refers to applying the approach in the digital realm, utilising and benefitting from methods developed within digital forensic practice (Duranti, 2009a).

Duranti (2009a) has proposed that an amalgam of digital diplomatics and digital forensics could be termed digital records forensics. In contrast to the diplomatic analysis of human-readable aspects of physical documents, digital records forensics and computational digital forensics involve analysing trace data in machine-readable formats and documenting both the output data and the derivation method (Niu, 2013).

A digital diplomatic analysis starts with a description of the digital environment where the analysed data exist and of their digital and logical structure and form. Applying the methodology for extracting paradata does not necessarily require that the analysed data or documents fulfil all the criteria of formal (digital) records (including an identifiable context of creation, originator, action, links to other records, fixed form and stable content). However, many of these details are clearly informative of the practices and processes relating to the record and, as such, are useful as paradata. Moreover, the focus of diplomatics on establishing the trustworthiness and authenticity of evidence can give direction to the work of identifying and extracting paradata: it allows the researcher to consider whether, and to what extent, the extracted paradata is authentic and trustworthy enough for the planned purposes.

The major difference between data forensics and diplomatics lies in their underpinnings. Diplomatics builds on a long tradition of historical and linguistic research, whereas many forensic techniques borrow methods from the sciences, medicine and engineering (Duranti, 2009a). Conducting digital forensic analysis requires a degree of technical skill, whereas diplomatics requires an in-depth understanding of the analysed materials, their context and mechanisms of creation, and of diplomatic criticism, a method related to historical source criticism.

A comprehensive forensic or diplomatic analysis can be time-consuming compared with many other approaches to paradata extraction. The focus of both data forensics and diplomatics is to assess the trustworthiness of digital records and ensure their authenticity rather than to produce complete accounts of any particular practice or process. Both methodologies, alone and combined, do, however, provide a practical framework to guide paradata extraction. Diplomatics offers a model and guidance for identifying how documents and records relate to their originating practices and processes, whereas forensics offers a systematic framework for the technical work of identifying and recovering, prioritising and analysing, and evaluating and interpreting evidence.

Key References and Further Reading

  • Duranti, L. (2009a). From digital diplomatics to digital records forensics. Archivaria, 68, 39–66. An approachable introduction to diplomatics, digital forensics and digital records forensics.

  • Hamouda, H. A. (2023). Authenticating citizen journalism videos by incorporating the view of archival diplomatics into the verification processes of open-source investigations (OSINT). In 2023 IEEE International Conference on Big Data (BigData). Sorrento, Italy: IEEE, 2036–2046. https://doi.org/10.1109/BigData59044.2023.10386935. This conference paper demonstrates how archival diplomatics can be applied to analysing citizen journalism videos and their authenticity by explicating their processual underpinnings.

  • Pasquini, C., Amerini, I. and Boato, G. (2021). Media forensics on social media platforms: A survey. EURASIP Journal on Information Security, 2021(1), 4. https://doi.org/10.1186/s13635-021-00117-2. This journal article provides an extensive review of digital forensic methods for analysing media content shared via social networks.

  • Rogers, R. (2023). Tracker analysis: Detection techniques for data journalism research. In Doing Digital Methods, 2nd ed. SAGE, 239–258. This book chapter introduces digital forensics techniques for media and social research projects, applicable as guidance for paradata extraction.

5.3 Discussion

Many types of methods can be useful for extracting paradata retrospectively from secondary information relating to practices and processes, even when the data itself does not necessarily qualify as paradata. The approaches differ in the level of detail of the analysis, in their aims regarding what types of information and insights are produced, and in how practices and/or processes are represented. They also have diverging theoretical underpinnings. Some, including formal metadata, are based on objectivist representations of practices and processes, whereas others, like close reading, are firmly grounded in interpretivist theorising.

The key practical difference between qualitative and quantitative approaches lies in their respective focus: close, interpretative, in-depth analysis of typically small quantities of information versus the development of explanations or predictions based on the analysis of relatively large amounts of data. Both general approaches require time and effort, but the craft-like nature of qualitative analysis means that it does not scale as well as quantitative methods.

In practice, this means that qualitative methods work better when the aim is to develop an in-depth understanding of particular practices or processes using a finite amount of material, whereas quantitative analysis is better suited to identifying broader patterns of activity in larger quantities of data. This is not, however, the only difference between the methods discussed above and others applicable for extracting paradata.

The epistemological and ontological underpinnings of the approach used have implications for what kind of information the method generates and, correspondingly, for how the identified activity stands out, for example, as a practice, process, sequence of steps or flow of action. Using chaîne opératoire to understand practices or processes frames them as operational sequences, with the ontological consequence that the described activity essentially becomes a sequence of discrete steps. Narrative inquiry leads to a very different outcome, where a practice or process is both framed as and turned into a story.

Qualitative backtracking methods can be useful for discerning the practices and processes used to produce and process data. One of the key steps in secondary data analysis involves data interpretation based on contextual information about the data. There are guidelines available for writing and analysing fieldnotes in ethnographic studies (Copland, 2018; Emerson et al., 2011) that are useful for extracting information both on the practices and processes of generating the notes and on those described in them.

In contrast, despite the long interest in paradata, there is still a lack of established traditions and consistent approaches in the social sciences for analysing comparative information relating to survey data (Goodwin et al., 2017). A part of the differences may be traced back to the epistemological debate around the relationship between the researcher and the data in survey research, and whether or to what degree the research process and data are separable from each other (Joyce et al., 2023). Specifically, since fieldnotes and findings are considered inseparable from the observational process in ethnographic studies, ethnographic documentation and approaches to tracing practices and processes are based on the tenet that such documentation incorporates rich evidence of the multiple, situational realities of fieldwork (cf. Emerson et al., 2011).

In contrast, various branches of research, including survey studies, often treat research findings and evidence of the research process as distinct entities – a perspective frequently criticised by constructivist researchers and theorists. In such quantitative studies, identifying and analysing evidence linked to, rather than embedded in, findings can comparably enrich the understanding of the research process (e.g. Fahmy and Bell, 2017; Phoenix et al., 2017). Such differences underline the importance of reflecting on one’s own epistemological position and of choosing and using different paradata creation methods in alignment with each other.

Pairing methods is also possible, and using a combination of methods can help to generate more comprehensive and nuanced information on practices and processes. For instance, trace ethnography (discussed briefly above in conjunction with data forensics and diplomatics) combines participant observation with the analysis of extensive data found in computer logs to reconstruct user patterns and practices within online communities (Geiger and Ribes, 2011). Similarly, combining the computational analysis of digital traces with ethnographic observation in online ethnographic research can provide a more nuanced understanding of the investigated community (Barkhatova, 2023).

Further, some of the prospective and in-situ methods of paradata generation discussed in Chapter 4 can also be applied to retrospective data on practices and processes. The presence of core paradata (cf. Chapter 6) is helpful not only for directly conveying an adequate understanding of practices and processes for data reuse but also as a starting point for closer examination of secondary sources. With rudimentary core paradata in place, it becomes easier to start knitting diverse forms of secondary descriptions and traces together into a richer account of how a dataset was created and how it is managed and used. Moreover, core paradata can also help to assess possible constraints on the secondary use of data as informative of practices and processes (Johns et al., 2023).

While formal metadata, data modelling and ontologies are typically used prospectively to prescribe data generation, they can also be used retrospectively. The work of Thomer and colleagues on geobiology fieldwork (Thomer et al., 2018), discussed in Chapter 4, demonstrates some of these possibilities.

It is also possible to combine different prospective and retrospective methods. For example, the CIDOC CRM (a formal ontology for the documentation of cultural heritage), PROV-DM (the W3C provenance data model) and named graphs can be employed in combination to represent objects and their related practices and processes (Shoilee et al., 2023).
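To give a flavour of how such a combination fits together, the following sketch models PROV-DM-style statements grouped into named graphs using plain Python data structures. It is illustrative only, not the implementation of Shoilee and colleagues; all identifiers (ex:scan42, ex:digitisation and so on) are invented, and a real application would use an RDF library and CIDOC CRM classes:

```python
from collections import defaultdict

# A triple: (subject, predicate, object), in the spirit of RDF/PROV-DM.
Triple = tuple[str, str, str]

# Named graph IRI -> set of triples. Grouping statements by graph lets
# each documentation event be attributed and dated separately.
dataset: dict[str, set[Triple]] = defaultdict(set)

g = "ex:graph/digitisation-2021"
dataset[g].update({
    ("ex:scan42", "rdf:type", "prov:Entity"),
    ("ex:digitisation", "rdf:type", "prov:Activity"),
    ("ex:scan42", "prov:wasGeneratedBy", "ex:digitisation"),
    ("ex:digitisation", "prov:wasAssociatedWith", "ex:museumLab"),
    ("ex:scan42", "prov:wasDerivedFrom", "ex:manuscript7"),
})

# Statements *about the graph itself* (who recorded it) live in a
# separate graph -- exactly what named graphs make possible.
dataset["ex:graph/meta"].add((g, "prov:wasAttributedTo", "ex:curator"))

def generated_by(dataset: dict[str, set[Triple]], entity: str):
    """Trace which activity generated an entity, across all named graphs."""
    for triples in dataset.values():
        for s, p, o in triples:
            if s == entity and p == "prov:wasGeneratedBy":
                return o
    return None

print(generated_by(dataset, "ex:scan42"))  # ex:digitisation
```

The design point is that the provenance of an object (PROV-DM) and the provenance of the documentation about it (the named graph’s own attribution) can be kept apart yet queried together.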

All methods discussed in this chapter aim at what Lund (2024) calls a diachronic analysis of materials with a potential to function or be appropriated (cf. Chapter 7) as paradata. They aim at explicating and understanding, as for Lund, the different phases of a particular process in a given situation or, if framed in terms of practices, the enactment of the unfolding of a practice. By using different methods, it is possible not only to extract different kinds of information on diverse practices and processes but, in effect, to extract different practices and processes out of the available primary and secondary traces.

In this sense, the choice of methods for identifying and extracting paradata goes beyond the simple question of choosing a method applicable to analysing a smaller or larger corpus of traces consisting of specific types of data. It is also, in a very fundamental sense, a question of choosing a method applicable for extracting, or more correctly constructing and enacting, a specific kind of practice or process. A chaîne opératoire enacts an operational chain, whereas narrative inquiry constructs a story, much as following GPS coordinates enacts a journey in space rather than a rich description of a complex practice in its entirety.

Finally, the present brief review of a small sample of methods applicable for identifying and extracting paradata also shows how paradata not only adds to our understanding of data creation, management and use practices and processes to enable reuse (Goodwin et al., 2017) but can also generate new perspectives on datasets and how they can be used (Carpentieri et al., 2023). When collecting data for meta-analysis in biomedical research, paradata associated with clinical trials data can be useful for ensuring the integrity of datasets (Li et al., 2023) and for reducing the publication-bias effects of not including unpublished, difficult-to-find studies (Borenstein et al., 2009). Forensic analysis of digital footprints or traces from activities on social networks can similarly help to establish the trustworthiness and authenticity of digital records in, for example, specific legal contexts (Duranti, 2009a; Pasquini et al., 2021). The multiple uses and usabilities of the different methods underline their diversity. They also demonstrate the malleability of paradata discussed throughout this volume and how it can be bent to diverse uses.

5.4 Conclusions

The effective identification of paradata during data creation processes is important for enabling and guiding data reuse. Retrospective methods of extracting paradata, including qualitative and quantitative backtracking, data forensics and diplomatics, provide clues not only for discerning past activities but also for ensuring the integrity, authenticity and trustworthiness of data. Since contextual information about data (including data descriptions, attributes and research methods) significantly influences data reuse across disciplines, data reusers can mitigate the risk of data misinterpretation by familiarising themselves with methods for identifying paradata related to data creation practices and processes.

The selection of methods introduced in this chapter provides researchers with guidance on effectively identifying and extracting paradata for secondary data analysis. Such analysis not only enriches our understanding of the research process but also generates new perspectives on datasets independent of research discipline and domain of practice. Qualitative backtracking methods enable the analysis of data to discern practices and processes, offering valuable insights, for example, into fieldwork dynamics, such as interviewer–participant interaction in survey studies and data generation in the field sciences. Quantitative backtracking methods, including meta-analysis and natural language processing techniques, offer means to identify and extract practice and process information across large sets of data and secondary documentation. Data forensics and diplomatics are examples of methodologies that extend beyond individual methods. Both provide guidance on how to think and act regarding evidence of practices and processes and the extraction of potential paradata, and both exemplify the benefits of systematicity in the work of identifying and extracting paradata-like information. Data forensics provides a tentative template for how to proceed with paradata analysis and extraction, and digital diplomatics a lens for directing attention to specific aspects of documentation as records pertaining to practices and processes.

References

Arminen, I. (2017). Institutional Interaction: Studies of Talk at Work. Routledge. https://doi.org/10.4324/9781315252209
Arminen, I. and Simonen, M. (2021). Expertise as a domain in interaction. Discourse Studies, 23(5), 577–596. https://doi.org/10.1177/14614456211016797
Arshad, H., Jantan, A. B. and Abiodun, O. I. (2018). Digital forensics: Review of issues in scientific validation of digital evidence. Journal of Information Processing Systems, 14(2), 346–376.
Audouze, F. and Karlin, C. (2017). La chaîne opératoire a 70 ans : qu’en ont fait les préhistoriens français. Journal of Lithic Studies, 4(2), 5–73. https://doi.org/10.2218/jls.v4i2.2539
Bach, R. L., Kern, C., Bonnay, D. and Kalaora, L. (2022). Understanding political news media consumption with digital trace data and natural language processing. Journal of the Royal Statistical Society Series A: Statistics in Society, 185(Supplement_2), S246–S269. https://doi.org/10.1111/rssa.12846
Barkhatova, L. A. (2023). The computational analysis of digital traces in ethnographic studies of online communities. Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique, 160(1), 30–56. https://doi.org/10.1177/07591063231196161
Bauer, A. A. (2019). Itinerant objects. Annual Review of Anthropology, 48(1), 335–352. https://doi.org/10.1146/annurev-anthro-102218-011111
Bellan, P., Dragoni, M. and Ghidini, C. (2024). Process knowledge extraction and knowledge graph construction through prompting: A quantitative analysis. In Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing. New York: Association for Computing Machinery, 1634–1641. https://doi.org/10.1145/3605098.3635957
Binding, C. and Tudhope, D. (2023). Automatic normalization of temporal expressions. Journal of Computer Applications in Archaeology, 6(1), 24–39. https://doi.org/10.5334/jcaa.105
Bird, S., Klein, E. and Loper, E. (2009). Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O’Reilly Media.
Bolden, G. B. (2015). Transcribing as research: ‘Manual’ transcription and conversation analysis. Research on Language and Social Interaction, 48(3), 276–280. https://doi.org/10.1080/08351813.2015.1058603
Borenstein, M., Hedges, L. V., Higgins, J. P. T. and Rothstein, H. R. (2009). Publication bias. In Introduction to Meta-Analysis. Wiley. https://doi.org/10.1002/9780470743386
Börjesson, L. (2021). Legacy in the making: A knowledge infrastructural perspective on systems for archeological information sharing. Open Archaeology, 7(1), 1636–1647. https://doi.org/10.1515/opar-2020-0213
Börjesson, L., Sköld, O., Friberg, Z., Löwenborg, D., Pálsson, G. and Huvila, I. (2022). Re-purposing excavation database content as paradata: An explorative analysis of paradata identification challenges and opportunities. KULA: Knowledge Creation, Dissemination, and Preservation Studies, 6(3), 1–18. https://doi.org/10.18357/kula.221
Brysbaert, A. (2012). People and their things: Integrating archaeological theory into prehistoric Aegean museum displays. In Narrating Objects, Collecting Stories. Routledge.
Carpentieri, J., Carter, L. and Jeppesen, C. (2023). Between life course research and social history: New approaches to qualitative data in the British birth cohort studies. International Journal of Social Research Methodology, 1–28. https://doi.org/10.1080/13645579.2023.2218234
Choumert-Nkolo, J., Cust, H. and Taylor, C. (2019). Using paradata to collect better survey data: Evidence from a household survey in Tanzania. Review of Development Economics, 23(2), 598–618. https://doi.org/10.1111/rode.12583
Copland, F. (2018). Observation and fieldnotes. In Phakiti, A., De Costa, P., Plonsky, L. and Starfield, S. (eds.), The Palgrave Handbook of Applied Linguistics Research Methodology. Palgrave Macmillan, 249–268. https://doi.org/10.1057/978-1-137-59900-1_12
Coupaye, L. (2022). Making ‘technology’ visible: Technical activities and the chaîne opératoire. In Bruun, M. H., Wahlberg, A., Douglas-Jones, R., Hasse, C., Hoeyer, K., Kristensen, D. B. and Winthereik, B. R. (eds.), Palgrave Handbook of the Anthropology of Technology. Basingstoke: Palgrave Macmillan, 37–60. https://doi.org/10.1007/978-981-16-7084-8_2
Dannehl, K. (2017). Object biographies: From production to consumption. In History and Material Culture, 2nd ed. Routledge.
De Klerk, S., Van Noord, S. and Van Ommering, C. J. (2019). The theory and practice of educational data forensics. In Veldkamp, B. P. and Sluijter, C. (eds.), Theoretical and Practical Advances in Computer-based Educational Measurement. Cham: Springer International Publishing, 381–399. https://doi.org/10.1007/978-3-030-18480-3_20
Deeks, J. J., Higgins, J. P., Altman, D. G. and Group, C. S. M. (2023). Analysing data and undertaking meta-analyses. In Cochrane Handbook for Systematic Reviews of Interventions. Wiley Online Library. https://training.cochrane.org/handbook/current/chapter-10
Drew, P. (2004). Conversation analysis. In Handbook of Language and Social Interaction. Psychology Press, 71–102.
DuBois, A. (2003). Close reading: An introduction. In Lentricchia, F. and DuBois, A. (eds.), Close Reading: A Reader. Durham, NC: Duke University Press, 1–40.
Duranti, L. (2009a). From digital diplomatics to digital records forensics. Archivaria, 68, 39–66.
Duranti, L. (2009b). Diplomatics. In Bates, M. J. and Maack, M. N. (eds.), Encyclopedia of Library and Information Sciences, 3rd ed. CRC Press, 1593–1601.
Duranti, L. (2014). The return of diplomatics as a forensic discipline. In Ambrosio, A., Barret, S. and Vogeler, G. (eds.), Digital Diplomatics: The Computer as a Tool for the Diplomatist? Köln/Wien: Böhlau Verlag, 89–98. https://doi.org/10.7788/boehlau.9783412217020.89
Durrant, G. and Kreuter, F. (2013). Editorial: The use of paradata in social survey research. Journal of the Royal Statistical Society. Series A (Statistics in Society), 176(1), 1–3. https://doi.org/10.1111/j.1467-985X.2012.01082.x
Edwards, R., Goodwin, J., O’Connor, H. and Phoenix, A. (2017). Working with Paradata, Marginalia and Fieldnotes: The Centrality of By-products of Social Research. Edward Elgar Publishing.
Ekström, B. (2022). Trace data visualisation enquiry: A methodological coupling for studying information practices in relation to information systems. Journal of Documentation, 78(7), 141–159. https://doi.org/10.1108/JD-04-2021-0082
Emerson, R. M., Fretz, R. I. and Shaw, L. L. (2011). Writing Ethnographic Fieldnotes, 2nd ed. University of Chicago Press. https://doi.org/10.7208/chicago/9780226206868.001.0001
Fahmy, E. and Bell, K. (2017). Using paradata to evaluate survey quality: Behaviour coding the 2012 PSE-UK survey. In Edwards, R., Goodwin, J., O’Connor, H. and Phoenix, A. (eds.), Working with Paradata, Marginalia and Fieldnotes. Edward Elgar Publishing. https://doi.org/10.4337/9781784715250.00009
Faniel, I. M., Frank, R. D. and Yakel, E. (2019). Context from the data reuser’s point of view. Journal of Documentation, 75(6), 1274–1297. https://doi.org/10.1108/JD-08-2018-0133
Fontijn, D. (2013). Epilogue: Cultural biographies and itineraries of things: Second thoughts. In Hahn, H. P. and Weiss, H. (eds.), Mobility, Meaning and Transformations of Things. Oxbow Books, 183–196. https://doi.org/10.2307/j.ctvh1dn08.16
Friberg, Z. and Huvila, I. (2019). Using object biographies to understand the curation crisis: Lessons learned from the museum life of an archaeological collection. Museum Management and Curatorship, 34(4), 362–382. https://doi.org/10.1080/09647775.2019.1612270
Geiger, S. (2016). Trace ethnography: A retrospective. Ethnography Matters (blog). http://ethnographymatters.net/blog/2016/03/23/trace-ethnography-a-retrospective/
Geiger, R. S. and Ribes, D. (2011). Trace ethnography: Following coordination through documentary practices. In Sprague, R. H. (ed.), System Sciences (HICSS), 2011 44th Hawaii International Conference, 1–10.
Goldsmith, E. S., Taylor, B. C., Greer, N., Murdoch, M., MacDonald, R., McKenzie, L., Rosebush, C. E. and Wilt, T. J. (2018). Focused evidence review: Psychometric properties of patient-reported outcome measures for chronic musculoskeletal pain. Journal of General Internal Medicine, 33(1), 61–70. https://doi.org/10.1007/s11606-018-4327-8
Goodwin, C. and Heritage, J. (1990). Conversation analysis. Annual Review of Anthropology, 19, 283–307. https://doi.org/10.1146/annurev.an.19.100190.001435
Goodwin, J., O’Connor, H., Phoenix, A. and Edwards, R. (2017). Introduction: Working with paradata, marginalia and fieldnotes. In Edwards, R., Goodwin, J., O’Connor, H. and Phoenix, A. (eds.), Working with Paradata, Marginalia and Fieldnotes. Edward Elgar Publishing, 1–19.
Gosden, C. and Marshall, Y. (1999). The cultural biography of objects. World Archaeology, 31(2), 169–178. https://doi.org/10.1080/00438243.1999.9980439
Gregory, K., Groth, P., Scharnhorst, A. and Wyatt, S. (2020). Lost or found? Discovering data needed for research. Harvard Data Science Review. https://doi.org/10.1162/99608f92.e38165eb
Gregory, K. and Koesten, L. (2022). Data needs. In Human-Centered Data Discovery. Cham: Springer International Publishing, 19–32. https://doi.org/10.1007/978-3-031-18223-5_3
Haddington, P., Eilittä, T., Kamunen, A., Kohonen-Aho, L., Oittinen, T., Rautiainen, I. and Vatanen, A. (eds.) (2023). Ethnomethodological Conversation Analysis in Motion: Emerging Methods and New Technologies. Oxford: Taylor & Francis Group.
Hamouda, H. A. (2023). Authenticating citizen journalism videos by incorporating the view of archival diplomatics into the verification processes of open-source investigations (OSINT). In 2023 IEEE International Conference on Big Data (BigData). Sorrento, Italy: IEEE, 2036–2046. https://doi.org/10.1109/BigData59044.2023.10386935
Have, P. ten (2002). Reflections on transcription. Cahiers de Praxématique, 39, 21–43. https://doi.org/10.4000/praxematique.1833
Have, P. ten (2006). Review essay: Conversation analysis versus other approaches to discourse. Forum Qualitative Sozialforschung/Forum: Qualitative Social Research, 7(2). https://doi.org/10.17169/fqs-7.2.100
Hepburn, A. and Bolden, G. B. (2012). The conversation analytic approach to transcription. In The Handbook of Conversation Analysis, 57–76. https://doi.org/10.1002/9781118325001.ch4
Heritage, J. (2004). Conversation analysis and institutional talk. In Handbook of Language and Social Interaction. Psychology Press, 103–147.
Hodges, J. A. (2021). Forensically reconstructing biomedical maintenance labor: PDF metadata under the epistemic conditions of COVID-19. Journal of the Association for Information Science and Technology, 72, 1400–1414. https://doi.org/10.1002/asi.24484
Huvila, I. (2012). Being formal and flexible: Semantic Wiki as an archaeological e-science infrastructure. In Zhou, M., Romanowska, I., Wu, Z., Xu, P. and Verhagen, P. (eds.), Revive the Past: Proceedings of the 39th Conference on Computer Applications and Quantitative Methods in Archaeology, Beijing, 12–16 April 2011. Amsterdam: Amsterdam University Press, 186–197.
Huvila, I. (2022). Improving the usefulness of research data with better paradata. Open Information Science, 6(1), 28–48. https://doi.org/10.1515/opis-2022-0129
Huvila, I., Andersson, L., Sköld, O. and Liu, Y.-H. (2025). Data makers’ and users’ views on useful paradata: Priorities in documenting data creation, curation, manipulation and use in archaeology. International Journal of Digital Curation, 15(1). https://doi.org/10.2218/ijdc.v19i1.892
Huvila, I., Börjesson, L. and Sköld, O. (2022). Citing methods literature: Citations to field manuals as paradata on archaeological fieldwork. Information Research, 27(3). https://doi.org/10.47989/irpaper941
Huvila, I. and Sinnamon, L. (2022). Sharing research design, methods and process information in and out of academia. Proceedings of the Association for Information Science and Technology, 59(1), 132–144. https://doi.org/10.1002/pra2.611
Huvila, I., Sköld, O. and Börjesson, L. (2021b). Documenting information making in archaeological field reports. Journal of Documentation, 77(5), 1107–1127. https://doi.org/10.1108/JD-11-2020-0188
Huvila, I. and Sköld, O. (2023). A fieldwork manual as a regulatory device: Instructing, prescribing and describing documentation work. Journal of Information Science. https://doi.org/10.1177/01655515231203506
Huvila, I., Sköld, O. and Andersson, L. (2023). Knowing-in-practice, its traces and ingredients. In Cozza, M. and Gherardi, S. (eds.), The Posthumanist Epistemology of Practice Theory: Re-imagining Method in Organization Studies and Beyond. Cham: Palgrave Macmillan, 37–69. https://doi.org/10.1007/978-3-031-42276-8_2
Huvila, I., Vats, E., Friberg, Z., Börjesson, L., Kaiser, J. and Sköld, O. (2022). Extracting process information from archival records. Digital Archives, Big Data and Memory. Copenhagen: Springer.
Jefferson, G. (2004). Glossary of transcript symbols with an introduction. In Lerner, G. H. (ed.), Conversation Analysis: Studies from the First Generation. Amsterdam: John Benjamins, 13–31. https://doi.org/10.1075/pbns.125.02jef
Johns, M., Meurers, T., Wirth, F. N., Haber, A. C., Müller, A., Halilovic, M., Balzer, F. and Prasser, F. (2023). Data provenance in biomedical research: Scoping review. Journal of Medical Internet Research, 25, e42289. https://doi.org/10.2196/42289
Joy, J. (2009). Reinvigorating object biography: Reproducing the drama of object lives. World Archaeology, 41(4), 540–556. https://doi.org/10.1080/00438240903345530
Joyce, J. B., Douglass, T., Benwell, B., Rhys, C. S., Parry, R., Simmons, R. and Kerrison, A. (2023). Should we share qualitative data? Epistemological and practical insights from conversation analysis. International Journal of Social Research Methodology, 26(6), 645–659.
Kirschenbaum, M. G. (2008). Mechanisms: New Media and the Forensic Imagination. Cambridge, MA: MIT Press.
Kirschenbaum, M. G. (2014). Operating systems of the mind: Bibliography after word processing (the example of Updike). Papers of the Bibliographical Society of America, 108(4), 380–412. https://doi.org/10.1086/681565
Kocar, S. and Biddle, N. (2023). The power of online panel paradata to predict unit nonresponse and voluntary attrition in a longitudinal design. Quality & Quantity, 57(2), 1055–1078. https://doi.org/10.1007/s11135-022-01385-x
Kopytoff, I. (1986). The cultural biography of things: Commoditization as process. In Appadurai, A. (ed.), The Social Life of Things: Commodities in Cultural Perspective. Cambridge: Cambridge University Press, 64–91. https://doi.org/10.1017/CBO9780511819582.004
Kunz, T., Daikeler, J. and Ackermann-Piek, D. (2024). Interviewer-observed paradata in mixed-mode and innovative data collection. International Journal of Market Research, 66(1), 14–26. https://doi.org/10.1177/14707853231184742
Li, T., Higgins, J., Deeks, J., et al. (2023). Collecting data. In Cochrane Handbook for Systematic Reviews of Interventions, version 6.4 (updated August 2023). Cochrane. www.training.cochrane.org/handbook
Liu, Y.-H., Wu, M., Power, M. and Burton, A. (2023). Elicitation of Contexts for Discovering Clinical Trials and Related Health Data: An Interview Study. Zenodo. https://zenodo.org/records/7839282
Lund, N. W. (2024). Introduction to Documentation Studies. London: Facet. https://doi.org/10.29085/9781783302536
McIlvenny, P. and Davidsen, J. (2023). Beyond video: Using practice-based VolCap analysis to understand analytical practices volumetrically. In Haddington, P., Eilittä, T., Kamunen, A., Kohonen-Aho, L., Oittinen, T., Rautiainen, I. and Vatanen, A. (eds.), Ethnomethodological Conversation Analysis in Motion. London: Routledge, 221–244. https://doi.org/10.4324/9781003424888-15
Mondada, L. (2012). The conversation analytic approach to data collection. In The Handbook of Conversation Analysis, 32–56. https://doi.org/10.1002/9781118325001.ch3
Murillo, A. P. (2022). Data matters: How earth and environmental scientists determine data relevance and reusability. Collection and Curation, 41(3), 77–86. https://doi.org/10.1108/CC-11-2018-0023
Neuberger, J., Ackermann, L. and Jablonski, S. (2024). Beyond rule-based named entity recognition and relation extraction for process model generation from natural language text. In Sellami, M., Vidal, M.-E., van Dongen, B., Gaaloul, W. and Panetto, H. (eds.), Cooperative Information Systems. Cham: Springer Nature Switzerland, 179–197. https://doi.org/10.1007/978-3-031-46846-9_10
Niu, J. (2013). Provenance: Crossing boundaries. Archives and Manuscripts, 41(2), 105115. https://doi.org/10.1080/01576895.2013.811426.CrossRefGoogle Scholar
Opgenhaffen, L. (2022). Archives in action. The impact of digital technology on archaeological recording strategies and ensuing open research archives. Digital Applications in Archaeology and Cultural Heritage 27, e00231. https://doi.org/10.1016/j.daach.2022.e00231.CrossRefGoogle Scholar
Pandey, A. K. et al. (2020). Current challenges of digital forensics in cyber security: In Husain, M. S. and Khan, M. Z. (eds.), Advances in Digital Crime, Forensics, and Cyber Terrorism, IGI Global, 3146.Google Scholar
Pasquini, C. (2021). Amerini I and Boato G (2021) Media forensics on social media platforms: a survey. EURASIP Journal on Information Security (1), 4. https://doi.org/10.1186/s13635-021-00117-2.CrossRefGoogle Scholar
Pentland, B., Recker, J., Wolf, J. and Wyner, G. (2020). Bringing context inside process research with digital trace data. Journal of the Association for Information Systems, 21(5).10.17705/1jais.00635CrossRefGoogle Scholar
Phoenix, A, Boddy, J, Edwards, R and Elliott, H (2017). ‘Another long and involved story’: Narrative themes in the marginalia of the Poverty in the UK survey. In Edwards, R, Goodwin, J, O’Connor, H, and Phoenix, A (eds.), Working with Paradata, Marginalia and Fieldnotes. Edward Elgar Publishing. https://doi.org/10.4337/9781784715250.00010.Google Scholar
Pickering, A. (1995). The Mangle of Practice: Time, Agency, and Science, Chicago: University of Chicago Press.10.7208/chicago/9780226668253.001.0001CrossRefGoogle Scholar
Polkinghorne, D. E. (1995). Narrative configuration in qualitative analysis. International Journal of Qualitative Studies in Education, 8(1), 523.10.1080/0951839950080103CrossRefGoogle Scholar
Pollitt, M. (2010). A History of Digital Forensics. In Chow, K.-P. and Shenoi, S. (eds.), Advances in Digital Forensics VI, Berlin, Heidelberg: Springer, 315.10.1007/978-3-642-15506-2_1CrossRefGoogle Scholar
Rheinberger, H.-J. (2023). Split and Splice: A Phenomenology of Experimentation, Chicago, IL: University of Chicago Press.10.7208/chicago/9780226825311.001.0001CrossRefGoogle Scholar
Ries, T. (2018). The rationale of the born-digital dossier génétique: Digital forensics and the writing process: With examples from the Thomas Kling Archive. Digital Scholarship in the Humanities, 33(2), 391424.10.1093/llc/fqx049CrossRefGoogle Scholar
Rogers, R. (2023). Tracker analysis: Detection techniques for data journalism research. In Doing Digital Methods, 2nd ed. SAGE, 239258.Google Scholar
Rösch, F. (2021). From drawing into digital: On the transformation of knowledge production in postexcavation processing. Open Archaeology, 7(1), 15061528. https://doi.org/10.1515/opar-2020-0211.CrossRefGoogle Scholar
Sakahira, F., Yamaguchi, Y. and Terano, T. (2023). Understanding cultural similarities of archaeological sites from excavation reports using natural language processing technique. Journal of Advanced Computational Intelligence and Intelligent Informatics, 27(3), 394403. https://doi.org/10.20965/jaciii.2023.p0394.CrossRefGoogle Scholar
Sakshaug, J. W. and Struminskaya, B. (2023). Augmenting surveys with paradata, administrative data, and contextual data. Public Opinion Quarterly , 87(S1), 475479. https://doi.org/10.1093/poq/nfad026.CrossRefGoogle ScholarPubMed
Salinas, A., Penafiel, L., McCormack, R. and Morstatter, F. (2023). ‘I’m not racist but…’: Discovering bias in the internal knowledge of Large Language Models. arXiv. http://arxiv.org/abs/2310.08780 (accessed 18 June 2024)Google Scholar
Schofield, J., Wyles, K. J., Doherty, S., Donnelly, A., Jones, J., and Porter, A. (2020). Object narratives as a methodology for mitigating marine plastic pollution: Multidisciplinary investigations in Galápagos. Antiquity, 94(373), 228244.10.15184/aqy.2019.232CrossRefGoogle Scholar
Sharp, N. L., Bye, R. A. and Cusick, A. (2018). Narrative analysis. In Liamputtong, P. (ed.), Handbook of Research Methods in Health Social Sciences, Singapore: Springer, 121.Google Scholar
Shoilee, S. B. A., de Boer, V. and van Ossenbruggen, J. (2023). Polyvocal knowledge modelling for ethnographic heritage object provenance. In Knowledge Graphs: Semantics, Machine Learning, and Languages. IOS Press, 127143.Google Scholar
Thomer, A. K., Wickett, K. M., Baker, K. S., Fouke, B. W. and Palmer, C. L. (2018) Documenting provenance in noncomputational workflows: Research process models based on geobiology fieldwork in Yellowstone National Park. Journal of the Association for Information Science and Technology, 69(10), 12341245. https://doi.org/10.1002/asi.24039.CrossRefGoogle Scholar
Venturini, T, Bounegru, L, Gray, J and Rogers, R (2018) A reality check(list) for digital methods. New Media & Society, 20(11), 41954217.10.1177/1461444818769236CrossRefGoogle Scholar
Zawoad, S., and Hasan, R. (2015). Digital forensics in the age of big data: Challenges, approaches, and opportunities. In 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems, 1320–1325.10.1109/HPCC-CSS-ICESS.2015.305CrossRefGoogle Scholar
Zimmerman, A. S. (2008) New knowledge from old data: The role of standards in the sharing and reuse of ecological data. Science, Technology, & Human Values, 33(5), 631652. https://doi.org/10.1177/0162243907306704.CrossRefGoogle Scholar
Figure 5.1 A simple chaîne opératoire representing a data collection, research and data archiving process, with major operations and actors represented.