Ensuring a safe(r) harbor: Excising personally identifiable information from structured electronic health record data

Published online by Cambridge University Press:  09 December 2021

Emily R. Pfaff* (Department of Medicine, UNC Chapel Hill School of Medicine, Chapel Hill, North Carolina, USA)
Melissa A. Haendel (University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA)
Kristin Kostka (The OHDSI Center at the Roux Institute, Northeastern University, Portland, Maine, USA)
Adam Lee (TraCS Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA)
Emily Niehaus (Palantir Technologies, Denver, Colorado, USA)
Matvey B. Palchuk (TriNetX LLC, Cambridge, Massachusetts, USA)
Kellie Walters (TraCS Institute, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA)
Christopher G. Chute (Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, Maryland, USA)

* Address for correspondence: E. R. Pfaff, PhD, MS, Department of Medicine, UNC Chapel Hill School of Medicine, 160 N Medical Drive, Chapel Hill, NC 27599, USA. Email: epfaff@email.unc.edu

Abstract

Recent findings have shown that the continued expansion of the scope and scale of data collected in electronic health records is making the protection of personally identifiable information (PII) more challenging and, if not addressed, may inadvertently put our institutions and patients at risk. As clinical terminologies expand to include new terms that may capture PII (e.g., Patient First Name, Patient Phone Number), institutions may start using them in clinical data capture (and in some cases, they already have). Once in use, PII-containing values associated with these terms may find their way into laboratory or observation data tables via extract-transform-load jobs intended to process structured data, putting institutions at risk of unintended disclosure. Here we aim to inform the informatics community of these findings and to issue a call to action for remediation by the community.

Information

Type
Special Communications
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2021. Published by Cambridge University Press on behalf of The Association for Clinical and Translational Science

Table 1. Example identifier-containing Logical Observation Identifiers Names and Codes (LOINC) codes

Fig. 1. Removing an entire column known to contain personally identifiable information (PII) (a) is significantly easier than identifying PII-containing rows (b) that exist among nonidentifying records.
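
The contrast in Fig. 1 can be illustrated with a minimal Python sketch. It is not the authors' method, only an illustration under stated assumptions: the table layout, the `mrn` column, the helper names `drop_column` and `drop_pii_rows`, and every code in `PII_CODES` are hypothetical placeholders (they are not real LOINC codes, and the contents of Table 1 are not reproduced here).

```python
from typing import Dict, List, Set

Row = Dict[str, str]

# Hypothetical placeholder codes standing in for identifier-containing
# terminology codes. These are NOT real LOINC codes.
PII_CODES: Set[str] = {"EXAMPLE-PATIENT-FIRST-NAME", "EXAMPLE-PATIENT-PHONE"}

# Toy observation table; all values are made up for illustration.
observations: List[Row] = [
    {"person_id": "1", "code": "EXAMPLE-GLUCOSE", "value": "98", "mrn": "A001"},
    {"person_id": "1", "code": "EXAMPLE-PATIENT-FIRST-NAME", "value": "Jane", "mrn": "A001"},
    {"person_id": "2", "code": "EXAMPLE-HEART-RATE", "value": "72", "mrn": "B002"},
    {"person_id": "2", "code": "EXAMPLE-PATIENT-PHONE", "value": "555-0100", "mrn": "B002"},
]


def drop_column(rows: List[Row], column: str) -> List[Row]:
    """Case (a): the PII lives in one known column, so drop it from every row."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


def drop_pii_rows(rows: List[Row], pii_codes: Set[str], code_field: str = "code") -> List[Row]:
    """Case (b): the PII hides in the value of specific rows, so each row's
    code must be checked against a maintained list of PII-capturing codes."""
    return [row for row in rows if row[code_field] not in pii_codes]


# Apply both steps; the original list is left unmodified.
cleaned = drop_pii_rows(drop_column(observations, "mrn"), PII_CODES)
for row in cleaned:
    print(row)
```

Note that the row-level filter is only as complete as the code list it is given: a PII-capturing code that is missing from `PII_CODES` passes through untouched, which is why case (b) is the harder problem.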