Searching for Anglo-American Digital Legal History


As the fields of digital humanities and digital history have grown in scale and visibility since the 1990s, legal history has largely remained on the margins of those fields. The move to make material available online in the first decade of the web featured only a small number of legal history projects: Famous Trials; Anglo-American Legal Tradition; The Proceedings of the Old Bailey Online, 1674–1913. Early efforts to construct hypertext narratives and scholarship also included some works of legal history: “Hearsay of the Sun: Photography, Identity and the Law of Evidence in Nineteenth-Century Courts,” in Hypertext Scholarship in American Studies; Who Killed William Robinson?; and Gilded Age Plains City: The Great Sheedy Murder Trial and the Booster Ethos of Lincoln, Nebraska. In the second decade of the web, the focus shifted from distributing material to exploring it using digital tools. The presence of digital history grew at the meetings of organizations of historians ranging from the American Historical Association to the Urban History Association, but not at the American Society for Legal History conferences, the annual meetings of the Law and Society Association, or the British Legal History Conference. Only a few Anglo-American legal historians took up computational tools for sorting and visualizing sources, such as data mining, text mining, and topic modeling; network analysis; and mapping. Paul Craven and Douglas Hay's Master and Servant project text mined a comprehensive database of 2,000 statutes and 1,200,000 words to explore similarities and influence among statutes. Data Mining with Criminal Intent mined and visualized the words in trial records using structured data from The Proceedings of the Old Bailey Online, 1674–1913. Locating London's Past, a project that mapped resources relating to the early modern and eighteenth-century city, also made use of the Old Bailey records. Digital Harlem mapped crime in the context of everyday life in the 1920s.
Only in the past few years has more digital legal history using computational tools begun to appear, and like many of the projects discussed in this special issue, most remain at a preliminary stage. This article seeks to bring into focus the constraints, possibilities, and choices that shape digital legal history, in order to create a context for the work in this special issue, and to promote discussion of what it means to do legal history in the digital age.

1. I make no claim to have comprehensively surveyed digital legal history. There is no register or compilation of work in digital history, let alone of digital legal history; therefore, this overview by necessity is focused on the area of legal history in which I work and with which I am familiar: Anglo-American legal history. In looking for digital legal history, I drew on the list of projects compiled in 2013 by Kaci Nash and William G. Thomas III for Thomas’ “The Promise of the Digital Humanities and the Contested Nature of Digital Scholarship,” in A New Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, and John Unsworth (Chichester: Wiley Blackwell, 2016), 603–17. Their Digitalhistory Zotero Library is available online. For an overview of digital history, see Stephen Robertson, “The Differences between Digital Humanities and Digital History,” in Debates in the Digital Humanities 2016, ed. Matt Gold and Lauren Klein (Minneapolis: University of Minnesota Press, 2016). Accessed 24 July 2016.

2. Anglo-American Legal Tradition: Documents from Medieval and Early Modern England from the National Archives in London; Famous Trials; and The Proceedings of the Old Bailey. Accessed 24 July 2016.

3. Thomas Thurston, “Hearsay of the Sun: Photography, Identity and the Law of Evidence in Nineteenth-Century Courts,” Hypertext Scholarship in American Studies; Who Killed William Robinson? (now part of a collection of Great Unsolved Mysteries in Canadian History); Gilded Age Plains City: The Great Sheedy Murder Trial and the Booster Ethos of Lincoln, Nebraska. Accessed 24 July 2016.

4. Daniel Cohen nicely captures this change as involving a conceptual shift from discussing the web using nouns such as “web pages” and “web sites,” to using verbs such as “searching,” “sorting,” “gathering,” and “communicating.” See Daniel J. Cohen, “History and the Second Decade of the Web,” Rethinking History 8 (2004): 295.

5. For digital history at the meetings of American historical organizations, see Robertson, “Differences,” para. 2.

6. These tools are defined and discussed later. For an introduction to text mining and topic modeling for historians, see Shawn Graham, Ian Milligan, and Scott Weingart, The Historian's Macroscope: Exploring Big Historical Data (London: Imperial College Press, 2015).

7. Paul Craven and William Traves, “A General-Purpose Hierarchical Coding Engine and its Application to Comparative Analysis of Statutes,” Literary and Linguistic Computing 8 (1993): 27–32; Paul Craven and Douglas Hay, “Computer Applications in Comparative History: The Master & Servant Project at York University (Canada),” History and Computing 7 (1995): 69–80; and Douglas Hay and Paul Craven, “Introduction,” in Masters, Servants, and Magistrates in Britain and the Empire, 1562–1955, ed. Hay and Craven (Chapel Hill: University of North Carolina Press, 2004). This database has not been made available online.

8. The Proceedings of the Old Bailey Online, 1674–1913; Locating London's Past; Data Mining with Criminal Intent. Accessed 24 July 2016.

9. Digital Harlem: Everyday Life, 1915–1930. Accessed 24 July 2016.

10. William Thomas, O Say Can You See: Early Washington, D.C., Law and Family; Kellen Funk and Lincoln Mullen, “A Servile Copy: Text Reuse and Medium Data in American Civil Procedure,” in Forum: Die geisteswissenschaftliche Perspektive: Welche Forschungsergebnisse lassen Digital Humanities erwarten? [Forum: With the Eyes of a Humanities Scholar: What Results Can We Expect from Digital Humanities?], Rechtsgeschichte 24 [Legal History] (forthcoming, 2016); Adam Badawi and Rens Bod, “Legal Structures,” 2013 Digging Into Data Challenge; Lea VanderVelde, The Law of the Antebellum Frontier; Stephen Berry, CSI: Dixie; and John Blanton, Micki Kaufman, and Nora Slonimsky, “An Analysis of Three Editions of the Blackstone Legal Commentaries Using Computational Text Analysis.” Accessed 24 July 2016.

11. The Making of Modern Law; HeinOnline; HathiTrust Digital Library; and Google Books. For additional online primary sources, see the list on Legal History on the Web. Accessed 24 July 2016.

For an older survey, see Morris Cohen, “Researching Legal History in the Digital Age,” Law Library Journal 99 (2007): 377–93.

12. For a recent summary overview, see Alfred L. Brophy and Stefan Vogenauer, “Introducing the Future of Legal History: On Re-launching the American Journal of Legal History,” American Journal of Legal History 56 (2016): 1–5.

13. Lara Putnam, “The Transnational and the Text-Searchable: Digitized Sources and the Shadows They Cast,” American Historical Review 121 (2016): 390.

14. A common rough estimate is that at most 5% of archival material has been digitized.

15. Jennifer Rutner and Roger Schonfeld, Supporting the Changing Research Practices of Historians, ITHAKA S+R, 2012; Max Kemman, Martijn Kleppe, and Stef Scagliola, “Just Google It,” in Proceedings of the Digital Humanities Congress 2012, ed. Clare Mills, Michael Pidd, and Esther Ward (Sheffield: HRI Online Publications, 2014) (accessed 24 July 2016); and Alexandra Chassanoff, “Historians and the Use of Primary Sources in the Digital Age,” The American Archivist 76 (2013): 458–80.

16. Putnam, “Transnational and the Text-Searchable”; and Ted Underwood, “Theorizing Research Practices We Forgot to Theorize Twenty Years Ago,” Representations 127 (2014): 65. On OCR, see Simon Tanner, “Deciding Whether Optical Character Recognition is Feasible,” 2004. Accessed 24 July 2016.

17. By contrast, the impact of searching and computerized databases on legal research has provoked extensive and ongoing discussion and debate. See, for example, Carol Bast and Ransford Pyle, “Legal Research in the Computer Age: A Paradigm Shift?” Law Library Journal 93 (2001): 285–302; F. Allan Hanson, “From Key Numbers to Keywords: How Automation Has Transformed the Law,” Law Library Journal 94 (2002): 563–600; and John McGinnis and Steven Wasick, “Law's Algorithm,” Florida Law Review 66 (2014): 991–1050. It is hoped that the appearance of Lara Putnam's article on the impact of searchable databases on transnational history in the American Historical Review, the field's leading journal, will provoke an overdue discussion of searching (Putnam).

18. For example, Drew VandeCreek noted that the corrected text of the Congressional Record available in Proquest Congressional “contained a very small amount of scanning errors, significantly fewer than those found in the portion of the [University of North Texas Libraries uncorrected] data that I reviewed, and about the same as the Hein materials.” See VandeCreek, “Text Mining at an Institution with Limited Financial Resources,” D-Lib Magazine 22 (2016). Accessed 24 July 2016. Crucially, commercial vendors do not provide information on the OCR accuracy of their products, or on how they correct the text. This situation is complicated by the fact that, as Matthew Jockers and Ted Underwood note, “since different kinds of errors have radically different effects, there is no single accuracy percentage that proves a text is good enough to support analysis.” Jockers and Underwood, “Text-Mining the Humanities,” in A New Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, and John Unsworth (Chichester: Wiley Blackwell, 2016), 359. The Beyond Citation project is attempting to address scholars’ need for more information about the proprietary databases of digitized material on which humanities scholars rely.

19. Caleb McDaniel, “The Digital Early Republic,” Offprints, April 7, 2011. Accessed 24 July 2016.

20. Underwood, “Theorizing Research Practices,” 66.

21. Putnam, “Transnational and the Text-Searchable,” 400.

22. Eric Nystrom and David Tanenhaus, “The Future of Digital Legal History: No Magic, No Silver Bullets,” American Journal of Legal History 56 (2016): 158.

23. VandeCreek, “Text Mining”; Andrew Prescott, “What Price Gale Cengage?” Digital Riffs, July 15, 2016. Accessed 24 July 2016.

24. Andrew Prescott, “Beyond the Digital Humanities Center: The Administrative Landscapes of Digital Humanities,” in A New Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, and John Unsworth (Chichester: Wiley Blackwell, 2016), 536. The early collaborative digital history projects undertaken by the Center for History and New Media and the American Social History Project used the language of film production to describe the roles of project members. See Stephen Robertson, “CHNM's Histories: Collaboration in Digital History,” October 14, 2014. Accessed 24 July 2016.

25. Jennifer Dixon, “Harvard Launches ‘Free the Law’ Digitization Project,” Library Journal, December 12, 2015. Accessed 24 July 2016. The agreement does allow Harvard Law Library to provide bulk access to researchers, if it so chooses. This model of partnerships involving periods of restricted access also characterizes arrangements between and its related entities and the United States National Archives and various state archives.

26. Roy Rosenzweig, “Scarcity or Abundance? Preserving the Past in a Digital Era,” American Historical Review 108 (2003): 760.

27. American Historical Association, Guidelines for the Evaluation of Digital Scholarship in History. Accessed 24 July 2016. See also Thomas, “Promise of the Digital Humanities.”

28. Toby Burrows, “Sharing Humanities Data for E-Research: Conceptual and Technical Issues,” in Sustainable Data from Digital Research: Humanities Perspectives on Digital Scholarship. Proceedings of the Conference Held at the University of Melbourne, December 12–14, 2011. Accessed 24 July 2016.

29. Documents Collection Center, Yale Law School, Lillian Goldman Law Library. Accessed 24 July 2016.

30. Eisman does make brief mention of the “possibility” of crowdsourcing transcription of all the text; that is, building an online platform and recruiting volunteers to transcribe the documents using that platform. For examples of crowdsourced transcription projects, see Sharon Leon, “Build, Analyse and Generalise: Community Transcription of the Papers of the War Department and the Development of Scripto,” in Crowdsourcing Our Cultural Heritage, ed. Mia Ridge (Farnham, UK: Ashgate, 2014).

31. Fred Gibbs and Trevor Owens, “The Hermeneutics of Data and Historical Writing,” in Writing History in the Digital Age, ed. Kristen Nawrotzki and Jack Dougherty (Ann Arbor: University of Michigan Press, 2013). Accessed 24 July 2016. See also Jockers and Underwood, “Text-Mining the Humanities,” and Stephen Robertson, “Finding Questions As Well As Answers: Conceptualizing Digital Humanities Research,” May 2, 2016. Accessed 24 July 2016.

32. Finnane also mentions plans to enrich the Prosecution Project database by linking the data from registers to information from other sources, including “semi-automated linking” to the wealth of digitized newspapers available in Trove.

33. “Adventures with Data Linkage,” The Digital Panopticon; “What's in a Name?: Details and Data Linkage,” The Digital Panopticon; “Record Linkage Workshop Report, Part 2,” The Digital Panopticon; “James Littleton and the Problems of Automatic Record Linkage,” The Digital Panopticon. Accessed 24 July 2016.

34. Tim Hitchcock, “Voices of Authority: Toward a History from Below in Patchwork,” Historyonics, April 27, 2015. Accessed 24 July 2016.

35. Jockers and Underwood, “Text-Mining the Humanities,” 351.

36. Stefan Sinclair and Geoffrey Rockwell, “Text Analysis and Visualization: Making Meaning Count,” in A New Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, and John Unsworth (Chichester: Wiley Blackwell, 2016), 339–40.

37. This work originated in Data Mining with Criminal Intent.

38. Jockers and Underwood, “Text-Mining the Humanities,” 352. The other approach in digital humanities to analyzing word frequencies is visualization, particularly using Voyant, a tool developed by Stefan Sinclair and Geoffrey Rockwell that offers a variety of charts and graphs. See Sinclair and Rockwell, “Text Analysis and Visualization.”

39. Tanenhaus and Nystrom offer only a brief summary of their research in the article in this issue. The details of their use of computational tools can be found in Nystrom and Tanenhaus, “The Future of Digital Legal History,” at 161.

40. Funk and Mullen, “A Servile Copy.”

41. For an explanation of this form of vector space modeling, see Michael Gavin, “The Arithmetic of Concepts: A Response to Peter de Bolla,” Modeling Literary History, September 18, 2015. Accessed 24 July 2016.

42. A different approach to measuring similarity has been employed by Tim Hitchcock and his collaborators to explore whether the treatment of crime in the Proceedings of the Old Bailey Online changed in line with the argument that a civilizing process, by changing cultural norms, produced a dramatic decline in violence. Like Funk and Mullen, they curated two corpora, violent and nonviolent crimes, from the Proceedings, using the offense category tags. Rather than using word frequency as the basis for meaning, they coarse-grained the words in trials into named categories based on similarity of meaning, using the nineteenth-century Roget's Thesaurus. That process produced 1,040 synonym sets, which nested inside 116 categories. They found “an increasingly clear distinction, within the record of spoken language, between trials associated with violent and nonviolent indictments.” Sara Klingenstein, Tim Hitchcock, and Simon DeDeo, “The Civilizing Process in London's Old Bailey,” Proceedings of the National Academy of Sciences 111 (2014): 9419–24, quote at 9419.

43. The most closely related work I could find is an analysis of popular constitutional discourse in United States newspapers in the years 1866–84. See Daniel Taylor Young, “How Do You Measure a Constitutional Moment? Using Algorithmic Topic Modeling to Evaluate Bruce Ackerman's Theory of Constitutional Change,” Yale Law Journal 122 (2013): 1990–2054.

44. Lauren Klein, Jacob Eisenstein, and Iris Sun, “Exploratory Thematic Analysis for Digitized Archival Collections,” Digital Scholarship in the Humanities 30, Supplement 1 (2015): 131.

45. Topic Explorer. Accessed 24 July 2016.

46. Sharon Block, “Doing More with Digitization: An Introduction to Topic Modeling of Early American Sources,” Common-Place 6 (2006); Robert Nelson, Mining the Dispatch; Micki Kaufman, “Everything on Paper Will Be Used Against Me”: Quantifying Kissinger; Andrew Torget and Jon Christensen, Mapping Texts; E. Thomas Ewing, Samah Gad, Bernice L. Hausman, Kathleen Kerr, Bruce Pencek, and Naren Ramakrishnan, An Epidemiology of Information: Datamining the 1918 Flu Pandemic, 2014. Accessed 24 July 2016.

47. Robertson, “Differences.”

48. Stephen Robertson, “Putting Harlem on the Map,” in Writing History in the Digital Age, ed. Kristen Nawrotzki and Jack Dougherty (Ann Arbor: University of Michigan Press, 2013) (accessed 24 July 2016); and Stephen Robertson, “Digital Mapping as a Research Tool: Digital Harlem: Everyday Life, 1915–1930,” American Historical Review 121 (2016): 156–66.

49. Shane White, Stephen Garton, Stephen Robertson, and Graham White, Playing the Numbers: Gambling in Harlem Between the Wars (Cambridge, MA: Harvard University Press, 2010). See also Stephen Robertson, “Arrests for Numbers Gambling,” Digital Harlem Blog, April 17, 2009; Stephen Robertson, “Numbers on Harlem's Streets,” Digital Harlem Blog, December 1, 2011. Accessed 24 July 2016.

50. Hitchcock, “Voices of Authority”; Tim Hitchcock, “Re-imagining the Voice of the Defendant at the Old Bailey,” The History of Crime and the Courts in Three Dimensions: a Half-Day Workshop, October 20, 2015. Accessed 24 July 2016. An example of a project that reconstructs a historical soundscape in this way is the Virtual Paul's Cross Project, a digital re-creation of John Donne's Gunpowder Day Sermon in 1622.

51. Todd Presner and David Shepard, “Mapping the Geospatial Turn,” in A New Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, and John Unsworth (Chichester: Wiley Blackwell, 2016), 247.

52. Robertson, “Finding Questions”; Robertson, “Differences”; and Presner and Shepard, “Mapping the Geospatial Turn,” 247, 251.

53. Presner and Shepard, “Mapping the Geospatial Turn,” 247.

54. O Say Can You See. Accessed 24 July 2016.

55. Fred Gibbs, “New Forms of History: Critiquing Data and Its Representations,” The American Historian 7 (2016).

56. See Funk and Mullen, “A Servile Copy,” for a network graph showing code-to-code borrowings.

57. Ravel, Data Driven Research.

58. Robert Ambrogi, “Visual Law Services Are Worth a Thousand Words––and Big Money,” ABA Journal (2014). See also CODEX, the Stanford Center for Legal Informatics. Accessed 24 July 2016.

59. Court Listener, Supreme Court Citation Networks. Accessed 24 July 2016.

60. See Cameron Blevins, “Space, Nation, and the Triumph of Region: A View of the World from Houston,” Journal of American History 101 (2014): 122–47; and Cameron Blevins, “Mining and Mapping the Production of Space: A View of the World from Houston,” 2014. Accessed 24 July 2016.

61. Cited in Thomas, “Promise of the Digital Humanities,” 606–7.

62. Stanford University Press is developing the system and framework for publishing digital-born scholarship. West Virginia University is developing Cairn, an online, free, and open-source system that will help editors of scholarly multimedia journals, books, and data sets engage in building and reading multimedia-rich, peer-reviewed content. The University of Minnesota Press and the GC Digital Scholarship Lab at the Graduate Center of the City University of New York (CUNY) are developing Manifold Scholarship, a platform for publishing iterative, networked, electronic versions of scholarly monographs alongside the print edition of the book. The presses at Indiana, Michigan, Minnesota, Northwestern, and Penn State are developing a new platform using Hydra/Fedora that will enable the publication and preservation of digitally enriched humanities monographs. The University of California Press and the California Digital Library are developing a system to support the publication of open-access monographs.

63. Stephen Robertson, “Tropy – Digital Image Management for the Humanities Research Community,” October 8, 2015. The development of this software can be followed at @tropychnm.

He is the author of Crimes against Children: Sexual Violence and Legal Culture in New York City, 1880–1960 (Chapel Hill: University of North Carolina Press, 2005) and, with Shane White, Stephen Garton, and Graham White, Playing the Numbers: Gambling in Harlem Between the Wars (Cambridge, MA: Harvard University Press, 2010), as well as of Digital Harlem: Everyday Life, 1915–1930. He thanks Lincoln Mullen for his comments on earlier versions of this article.

