Biomedical Informatics/Health Informatics
2182: Developing a corpus for natural language processing to identify bleeding complications among intensive care unit patients
- Rashmee Shah, Benjamin Steinberg, Brian Bucher, Alec Chapman, Donald Lloyd-Jones, Matthew Rondina, Wendy Chapman
- Published online by Cambridge University Press:
- 10 May 2018, p. 12
OBJECTIVES/SPECIFIC AIMS: An accurate method to identify bleeding in large populations does not exist. Our goal was to explore bleeding representation in clinical text in order to develop a natural language processing (NLP) approach to automatically identify bleeding events from clinical notes. METHODS/STUDY POPULATION: We used publicly available notes for ICU patients at high risk of bleeding (n=98,586 notes). Two physicians reviewed randomly selected notes and annotated all direct references to bleeding as “bleeding present” (BP) or “bleeding absent” (BA). Annotations were made at the mention level (if 1 specific sentence/phrase indicated BP or BA) and at the note level (if the overall note indicated BP or BA). A third physician adjudicated discordant annotations. RESULTS/ANTICIPATED RESULTS: In 120 randomly selected notes, bleeding was mentioned 406 times with 76 distinct words. Inter-annotator agreement was 89% by the last batch of 30 notes. In total, 10 terms accounted for 65% of all bleeding mentions. We aggregated these results into 16 common stems (eg, “hemorr” for hemorrhagic and hemorrhage), which accounted for 90% of all 406 mentions. Of all 120 notes, 60% were classified as BP. The median number of stems was 5 (IQR 2, 9) in BP versus 0 (IQR 0, 1) in BA notes. Zero bleeding mentions in a note was associated with BA (OR 28, 95% CI 6.5, 127). With 40 true negatives and 2 false negatives, the negative predictive value (NPV) of zero bleeding mentions was 95%. DISCUSSION/SIGNIFICANCE OF IMPACT: Few bleeding-related terms are used in clinical practice. Absence of these terms has a high NPV for the absence of bleeding. These results suggest that a high-throughput, rules-based NLP tool to identify bleeding is feasible.
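The stem-counting idea and the NPV arithmetic in this abstract can be illustrated with a minimal Python sketch. It is not the authors' tool. Only the stem "hemorr" comes from the abstract; the other stems, the function names, and the sample notes are hypothetical. The counter tallies mentions only and does not handle negation, so it would not by itself separate BP from BA.

```python
import re

# Hypothetical stem list for illustration. Only "hemorr" is named in the
# abstract; "bleed" and "hematoma" are placeholders, not the study's 16 stems.
BLEEDING_STEMS = ["hemorr", "bleed", "hematoma"]
_STEM_PATTERN = re.compile(
    "|".join(re.escape(stem) for stem in BLEEDING_STEMS), re.IGNORECASE
)


def count_stem_mentions(note: str) -> int:
    """Count case-insensitive occurrences of any stem in a note.

    This counts mentions only; negated phrases are counted too.
    """
    return len(_STEM_PATTERN.findall(note))


def negative_predictive_value(true_negatives: int, false_negatives: int) -> float:
    """NPV = TN / (TN + FN)."""
    return true_negatives / (true_negatives + false_negatives)


# Counts reported in the abstract: 40 true negatives, 2 false negatives.
npv = negative_predictive_value(40, 2)
print(f"NPV of zero bleeding mentions: {npv:.1%}")
```

A note with a count of zero would be treated as "bleeding absent" under the rule the abstract evaluates; any nonzero count would need further review.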
2204: Evaluations of physiologic perturbations and their relationship with length of stay in neonatal hypoxic-ischemic encephalopathy
- Susan Slattery, Lei Liu, Haitao Chai, William Grobman, Jennie Duggan, Doug Downey, Karna Murthy
- Published online by Cambridge University Press:
- 10 May 2018, pp. 12-13
OBJECTIVES/SPECIFIC AIMS: Neonatal hypoxic-ischemic encephalopathy (HIE) is frequently accompanied by physiologic perturbations and organ dysfunction. Markers of these perturbations and their associations with length of stay (LOS) are uncertain. Our objective was to estimate the association between changes in selected physiologic and/or laboratory values and LOS in newborns with HIE. METHODS/STUDY POPULATION: Using the Children’s Hospitals Neonatal Database (CHND), we identified neonates with HIE at our center born at ≥36 weeks’ gestation from 2010 to 2016. Those with major congenital anomalies were omitted. Infants uniformly received therapeutic hypothermia for 72 hours unless death occurred sooner. Inpatient vital signs and selected laboratory markers were collected from our institution’s health informatics, electronic data warehouse (EDW) and then matched to records in CHND. Adjusting for severity of HIE, gender, and confirmed seizures, each marker’s association with LOS was calculated using multivariable Cox proportional hazards regression equations. These analyses were stratified by mortality. Candidate markers were vital signs, pulse oximetry, creatinine, acidosis (pH), international normalized ratio (INR), and supplemental oxygen (FiO2). RESULTS/ANTICIPATED RESULTS: There were 66 eligible infants (38 males) and 1741 patient-days identified; severe HIE (48%) and mortality (n=21, 32%) were common. Overall, the median length of stay (mLOS) was 20.5 days (25th–75th centile: 10–31 days), although it was shorter for nonsurvivors [nonsurvivors mLOS=8 days (5, 20); survivors mLOS=24 days (14, 31), p<0.001]. Median birthweight and gestational age were 3.3 kg and 39.4 weeks, respectively. In survivors (n=45, 1290 days), regression analyses demonstrated that none of the selected parameters were associated with LOS.
Among nonsurvivors (n=21, 451 days), diastolic blood pressure changes [hazard ratio (HR)=0.93, 95% confidence interval (CI)=0.88, 0.97, p=0.04] were related to longer time of survival; conversely, temperature (HR=2.0, 95% CI=1.24, 3.26, p=0.005) was related to shorter survival. Creatinine, pH, INR, FiO2, and other vital signs were unrelated to time-to-death in nonsurvivors. DISCUSSION/SIGNIFICANCE OF IMPACT: In a pilot study of neonatal HIE, changes in physiologic values were related to duration of survival in nonsurvivors, while neither physiologic nor laboratory values were related to survivors’ mLOS. These results both exemplify novel uses of EDWs for studying disease-specific exposure-outcome relationships and incorporate the functionality of the software patches required to extract, clean, and report clinical information captured in electronic health records. We anticipate that text mining with techniques such as natural language processing will augment associations and/or predictions of short-term outcomes.
2240: High-throughput phenotyping and the increased risk of OSA in rosacea patients
- Peter Elkin, Sarah Mullin, Sanjay Sethi, Shyamashree Sinha, Animesh Sinha
- Published online by Cambridge University Press:
- 10 May 2018, p. 13
OBJECTIVES/SPECIFIC AIMS: To create a new semantically correct high-throughput phenotyping (HTP) platform. To demonstrate the utility of the HTP platform for observational research, allowing clinical investigators to perform studies in 5 minutes. To demonstrate the improved accuracy of observational research using this platform when compared with traditional observational research methods. To demonstrate that patients who have rosacea are at increased risk of having obstructive sleep apnea (OSA). METHODS/STUDY POPULATION: This population is a set of 212,343 patients in the outpatient setting cared for in the Buffalo area over a 6-year period. All records for these patients were included in the study. Structured data were imported into an OMOP (OHDSI) database, and all of the notes and reports were parsed by our HTP system, which produces SNOMED CT codes. Each code is designated as a positive, negative, or uncertain assertion, and compositional expressions are automatically generated. We store the codified data (750,000,000 codes) in Berkeley DB, a NoSQL database, and we keep the compositional graphs in both Neo4j and GraphDB (a triple store). Labs are coded in LOINC and drugs in RxNorm. We have developed a Web interface in .NET named BMI Search, which allows real-time query by subject matter experts. We analyzed the accuracy of structured versus unstructured data by identifying nonvalvular atrial fibrillation (NVAF) cases with ICD-9 codes and then looking for any additional cases based on the SNOMED CT encodings of the clinical record. This was validated through human review, by 2 clinicians, of a set of 300 randomly selected cases. Separately, we ran a study to determine the relative risk of OSA with and without rosacea using the data set described above. We compared the rates using a Pearson χ2 test. RESULTS/ANTICIPATED RESULTS: We are able to parse 7,000,000 records in an hour and a half on 1 node with 4 CPUs. This yielded 750,000,000 SNOMED CT codes.
The HTP data set yielded 1849 cases using ICD-9 codes and another 873 using the HTP-NLU data, leading to a final data set of 2722 cases from our population of 212,343 patients. In total, 580 patients had rosacea; 5443 patients had OSA without rosacea and 51 patients had OSA with rosacea. Patients with rosacea had an 8.8% risk of OSA, whereas patients without rosacea had only a 2.6% risk of OSA. This difference was highly statistically significant (p<0.0001, Pearson χ2 test). The number needed to test was only 12. DISCUSSION/SIGNIFICANCE OF IMPACT: HTP can change how we do observational research and can lead to more accurate and more prolific investigation. This rapid turnaround is part of what is necessary both for precision medicine and for creating a learning health system. Patients with rosacea are at increased risk of OSA and should be screened for it.
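The reported risks and the Pearson χ2 comparison can be checked from the counts in the abstract with a short Python sketch. It assumes the comparison group is the full population minus the 580 rosacea patients; that denominator is derived here rather than stated by the authors. The variable names are illustrative, and no continuity correction is applied.

```python
import math

# Counts taken from the abstract.
TOTAL_PATIENTS = 212_343
ROSACEA = 580
OSA_WITH_ROSACEA = 51
OSA_WITHOUT_ROSACEA = 5443

# 2x2 table. Assumption: everyone not counted as rosacea forms the comparison group.
a = OSA_WITH_ROSACEA                              # rosacea, OSA
b = ROSACEA - a                                   # rosacea, no OSA
c = OSA_WITHOUT_ROSACEA                           # no rosacea, OSA
d = (TOTAL_PATIENTS - ROSACEA) - c                # no rosacea, no OSA

risk_with = a / (a + b)
risk_without = c / (c + d)
relative_risk = risk_with / risk_without

# Pearson chi-square statistic for a 2x2 table (no continuity correction).
n = a + b + c + d
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"risk with rosacea:    {risk_with:.1%}")
print(f"risk without rosacea: {risk_without:.1%}")
print(f"relative risk:        {relative_risk:.2f}")
print(f"chi-square (1 df):    {chi2:.1f}, p = {p_value:.2e}")
```

Under this assumption the two risks round to the 8.8% and 2.6% reported above, and the p-value falls well below 0.0001.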
2246: Characterization of resistant hypertension in a statewide electronic health record-based database (OneFlorida)
- Caitrin W. McDonough, William R. Hogan, Betsy Shenkman, Rhonda M. Cooper-DeHoff
- Published online by Cambridge University Press:
- 10 May 2018, p. 13
OBJECTIVES/SPECIFIC AIMS: Our objective is to create a Resistant Hypertension (RHTN) computable phenotype from electronic health record (EHR)-based data, and to determine the characteristics associated with RHTN within a large, diverse, EHR-based database. METHODS/STUDY POPULATION: The OneFlorida Clinical Research Consortium includes 10 unique health care systems providing care for approximately half of the state (48%, ~10 million). OneFlorida houses a Data Trust, which contains longitudinal EHR data and claims data from these providers in a common format, the PCORnet common data model v3.0. For the current project, data from 5 health care systems were considered. All adult patients with a hypertension (HTN) diagnosis from an outpatient encounter were extracted from the OneFlorida Data Trust. Additional data such as demographics, prescribing, and vitals information were also extracted. The RHTN computable phenotype was created by constructing a drug exposure variable that took into consideration the number of antihypertensive medications an individual was prescribed at any point in time over the course of the OneFlorida dataset. RHTN was defined as any blood pressure requiring 4 or more antihypertensive drugs, or uncontrolled blood pressure (≥140/90) on 3 antihypertensive drugs. RHTN cases had to meet the definition criteria twice during the data period, at least 30 days apart. All data extraction, computable phenotype coding, and statistical analyses were conducted using SQL or SAS. RESULTS/ANTICIPATED RESULTS: Our preliminary results show that there were n=342,026 adults with a HTN diagnosis from an outpatient visit in the data set. After the RHTN computable phenotype was constructed, n=11,670 RHTN cases were identified from the n=130,901 HTN individuals with all of the required variables in the data set (8.9% RHTN prevalence).
In all, 55% of RHTN cases were Black or African American, compared with 25% of the total HTN population. RHTN cases also had more prescriptions for loop diuretics, centrally acting agents, α-blockers, and vasodilators compared with the total HTN population. Not surprisingly, the RHTN cases had 26% of the antihypertensive prescriptions in the data set, and the RHTN cases had fewer blood pressure readings that were in control (only 49.4% of readings <140/90). DISCUSSION/SIGNIFICANCE OF IMPACT: Overall, our preliminary data show that it is possible to create the very complicated computable phenotype of RHTN within the OneFlorida Data Trust. We found that the RHTN prevalence in OneFlorida is 8.9%, which is consistent with previous studies from NHANES. Although promising, these results require further validation of the computable phenotype and replication in other similar data sets in order to ascertain their true meaning. Once validated, the experience gained from this computable phenotype can be applied to many other phenotypes.
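The RHTN rule described in the methods can be sketched in Python as follows. The authors worked in SQL or SAS, so this is an illustration of the stated definition, not their code. The `Observation` record and function names are hypothetical. Reading "≥140/90" as systolic ≥140 or diastolic ≥90 is an assumption, as is requiring only that the first and last qualifying dates be at least 30 days apart.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable


@dataclass
class Observation:
    """Hypothetical per-date summary for one patient (not the PCORnet schema)."""
    when: date
    n_antihypertensives: int  # distinct antihypertensive drugs active on this date
    systolic: int
    diastolic: int


def meets_rhtn_definition(obs: Observation) -> bool:
    """4 or more drugs at any blood pressure, or uncontrolled BP on 3 drugs."""
    if obs.n_antihypertensives >= 4:
        return True
    # Assumption: "uncontrolled (>=140/90)" means either component is elevated.
    uncontrolled = obs.systolic >= 140 or obs.diastolic >= 90
    return obs.n_antihypertensives == 3 and uncontrolled


def is_rhtn_case(observations: Iterable[Observation], min_gap_days: int = 30) -> bool:
    """True when the definition is met on two dates at least min_gap_days apart."""
    dates = sorted(o.when for o in observations if meets_rhtn_definition(o))
    return len(dates) >= 2 and dates[-1] - dates[0] >= timedelta(days=min_gap_days)


history = [
    Observation(date(2015, 1, 1), 4, 130, 80),   # qualifies: 4 drugs
    Observation(date(2015, 3, 1), 3, 150, 85),   # qualifies: uncontrolled on 3 drugs
]
print(is_rhtn_case(history))
```

Comparing the earliest and latest qualifying dates is enough here, because if any two qualifying dates are 30 days apart then the extremes are as well.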
2278: Identifying causative mutations in Treacher Collins syndrome using iobio
- Alistair N. Ward, Matt Velinder, Chase Miller, Tony Di Sera, Yi Qiao, Dave Viskochil, Gabor Marth
- Published online by Cambridge University Press:
- 10 May 2018, pp. 13-14
OBJECTIVES/SPECIFIC AIMS: The objective of the study was 2-fold: to identify potentially deleterious alleles in a child with Treacher Collins syndrome, and to demonstrate the value of the iobio analysis platform for intuitively and rapidly analyzing genomic data. METHODS/STUDY POPULATION: We used the iobio suite of web-based applications to analyze quality metrics for the sequencing data and called variants for the proband and his parents. We then visually interrogated variants in genes potentially associated with the syndrome in real time, using the intuitive gene.iobio application. We sought high-impact variants that had a predicted impact on protein function and were simultaneously at low allele frequency in the general human population. Variants were also compared against the ClinVar database of known mutations to identify variants that have already been associated with this or related syndromes in the literature or clinical studies. Finally, the gene.iobio tool allows users to interrogate the primary sequencing data to ensure that no variants have been missed by the primary variant calling pipeline. This analysis pipeline was performed using intuitive web-based apps in real time, and consequently represents a system that is available to users who traditionally are excluded from these analyses. RESULTS/ANTICIPATED RESULTS: The iobio suite was used to rapidly assess data quality and interrogate genetic variants for a child with Treacher Collins syndrome. A compound heterozygote consisting of 2 missense alleles in the TCOF1 gene was identified as a compelling pathogenic allele, necessitating further functional investigation. The study helped validate the use of the intuitive iobio tools in such analyses, strengthening the case for greater involvement of medical professionals in data analysis.
DISCUSSION/SIGNIFICANCE OF IMPACT: The performed analyses demonstrated that the whole genome sequencing data for the family being studied were of very high quality, although 1 gene demonstrated a local region of almost zero coverage. This ensured that study conclusions can be presented with confidence. A variant associated with Treacher Collins syndrome 1 in ClinVar was uncovered in the TCOF1 gene; however, given its benign rating, this variant was not considered further. The most interesting candidate was a compound heterozygote, consisting of 2 missense mutations, also in the TCOF1 gene. These mutations occurred with allele frequencies of 22% and 8% in the general population, and additional molecular and functional studies are currently being pursued.
2286: HOME Cell 2.0. Extending i2b2 to support community health outcome monitoring and evaluation via web-accessible software
- William G. Adams, Michael Mendis, Shiby Thomas, David Center, Sara Curran
- Published online by Cambridge University Press:
- 10 May 2018, p. 14
OBJECTIVES/SPECIFIC AIMS: The primary objective of this effort is to develop and distribute an easy-to-use i2b2 component that is capable of evaluating diverse complex relationships for a wide variety of exposures and outcomes over time. In this manner we are able to leverage the unique design of the i2b2 database to support health services research, comparative effectiveness, and quality improvement using a single tool. Furthermore, our novel database redesign has the potential to provide user-friendly access to individual and group Community Health Center (CHC) data for comparative effectiveness research (CER). METHODS/STUDY POPULATION: For this project we used software experts, clinical informatics specialists, and the existing i2b2 open-source software to convert our legacy HOME Cell into a web-client version. The tool will be used to study health outcomes within a network of Boston-based Community Health Centers and the largest safety-net hospital in New England, Boston Medical Center. RESULTS/ANTICIPATED RESULTS: The new web-client HOME Cell will allow i2b2 users to model virtually any exposure (including therapeutic interventions such as medications or tests) in i2b2 against any outcome, accounting for complex temporal relationships and other factors. In addition, we plan to use our new Community Health Center views to enhance our community engagement activities by allowing our partners direct access to their data. DISCUSSION/SIGNIFICANCE OF IMPACT: Our project addresses multiple national priorities related to data sharing, clinical research informatics, and comparative effectiveness. The web-client version of the HOME Cell substantially improves our community’s access to HOME Cell functionality and is a novel, sharable resource for use within the CTSA/NCATS community. Our approach provides a new way to perform large-scale collaborative research without the need to actually move patient-level data and has demonstrated that CER, health services research, and quality measurement can share a common framework.
In addition, and as demonstrated in our earlier pilot work, the HOME Cell also has the potential to support large-scale multivariate analyses in a distributed manner that does not require sharing of patient-level data. We believe our approach has great promise for supporting the reuse of clinical data for rapid, transparent, health outcome assessments on a national scale. Our efforts support multiple strategic goals including: (1) support for building national clinical and translational research capacity by enhancing a broadly adopted informatics tool (i2b2); (2) enhanced consortium-wide collaborations by offering a tool that can be easily shared within the CTSA network to support multi-institutional collaboration; and (3) improving the health of our communities by offering a tool that has the potential to provide new insights into health care processes and outcomes that could drive innovation and improvement activities.
2289: Will the Veteran Affairs (VA) electronic medical records (EMR) database reveal a signal that angiotensin II inhibiting medications ameliorate depression?
- David D. Maron, Marc Blackman, Richard Amdur, Thomas Mellman, Kathryn Sandberg
- Published online by Cambridge University Press:
- 10 May 2018, p. 14
OBJECTIVES/SPECIFIC AIMS: Angiotensin type 1 receptor blockers (ARBs) and angiotensin-converting enzyme inhibitors (ACEIs) are frequently prescribed for hypertension and associated cardiovascular and renal complications. In animal models, these drugs also reduce anxiety and depression. Our objective was to determine whether Veteran Affairs (VA) clinical pharmacy data indicate a protective effect of ARBs and/or ACEIs against major depression. METHODS/STUDY POPULATION: Pharmacy records from nationwide VA electronic medical records (EMR) were extracted for patients prescribed ARBs, ACEIs, α-blockers, β-blockers, calcium channel blockers, or diuretics (n=4,081,359). Patients were excluded if they had not received medications for 6 months with >70% coverage; had been diagnosed with substance/alcohol abuse, dementia, psychosis, or schizophrenia; or had been prescribed insulin. The study population was categorized as “ARB/ACEI” (A/A) or “Never ARB/ACEI” (NA/A). Using the Greedy Matching Algorithm, subjects were matched in a 1:1 ratio for sex and race over a 5-year age range, resulting in 2 equal groups of n=1,350,236 each. Subjects were older (M=71.6, SD=12) and mostly men (97%). RESULTS/ANTICIPATED RESULTS: In the A/A versus NA/A groups, respectively, the incidence of antidepressant use was greater during (9.9% vs. 8.9%) but lower after (11.8% vs. 12.2%) the study period. PHQ-2 scores (mean±SD) were statistically lower, albeit similar, during (0.79±1.56 vs. 0.85±1.63) and after (1.00±1.73 vs. 1.07±1.79) the study period. DISCUSSION/SIGNIFICANCE OF IMPACT: These preliminary data suggest that inhibiting angiotensin II action does not provide a protective effect against major depression when compared with other classes of antihypertensive drugs. This study illustrates how “Big Data” may inform the design of, or obviate the need for, large-scale randomized controlled trials.
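A greedy 1:1 matching step of the kind named in the methods might look like the Python sketch below. It is a generic illustration, not the authors' implementation. It assumes exact matching on sex and race, and reads "a 5-year age range" as an age difference of at most 5 years. The record fields and the `greedy_match` function are hypothetical.

```python
from typing import Dict, List, Tuple

Subject = Dict[str, object]  # hypothetical record: id, sex, race, age


def greedy_match(
    treated: List[Subject], controls: List[Subject], max_age_gap: int = 5
) -> List[Tuple[str, str]]:
    """Pair each treated subject with the closest-age unused control.

    Matching is exact on sex and race. Each control is used at most once,
    and earlier pairings are never revisited (hence "greedy").
    """
    # Group the controls by the exact-match keys.
    pools: Dict[Tuple[object, object], List[Subject]] = {}
    for control in controls:
        pools.setdefault((control["sex"], control["race"]), []).append(control)

    pairs: List[Tuple[str, str]] = []
    for subject in treated:
        pool = pools.get((subject["sex"], subject["race"]), [])
        best_index = None
        best_gap = None
        for index, control in enumerate(pool):
            gap = abs(control["age"] - subject["age"])
            if gap <= max_age_gap and (best_gap is None or gap < best_gap):
                best_index, best_gap = index, gap
        if best_index is not None:
            match = pool.pop(best_index)  # remove so the control is not reused
            pairs.append((subject["id"], match["id"]))
    return pairs


treated_group = [
    {"id": "t1", "sex": "M", "race": "W", "age": 70},
    {"id": "t2", "sex": "F", "race": "B", "age": 60},
]
control_group = [
    {"id": "c1", "sex": "M", "race": "W", "age": 80},
    {"id": "c2", "sex": "M", "race": "W", "age": 72},
    {"id": "c3", "sex": "F", "race": "B", "age": 66},
]
print(greedy_match(treated_group, control_group))
```

Subjects with no eligible control are left unmatched, which is one way a matched cohort ends up smaller than the extracted population.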
2293: Passive intracranial EEG-based localization of the central sulcus during sleep
- Rafeed Alkawadri
- Published online by Cambridge University Press:
- 10 May 2018, pp. 14-15
OBJECTIVES/SPECIFIC AIMS: To investigate the performance of a metric for passive localization of the central sulcus. METHODS/STUDY POPULATION: We studied 7 patients with intractable epilepsy undergoing intracranial EEG (icEEG) monitoring at Yale, in whom central sulcus (CS) localization was obtained by standard methods. Our method takes advantage of inherent properties of the primary motor cortex (MC), which exhibits enhanced icEEG band-power and coherence across the CS. For each contact x we calculated the z-score of a composite power and synchrony value log10(p_x)×c_x, where p_x is the sum of the root mean square of the icEEG in the high gamma band (80–115 Hz) for contact x over the 6–10 minutes of NREM sleep studied, and c_x is the mean magnitude squared coherence in the same band, using a 500-ms Hamming window, between contact x and all other contacts. z-score values lower than the threshold (th) were set to 0. Finally, we calculated a metric m=z/d, where d is the mean Euclidean distance of each contact from contacts with z-scores greater than 0. The last step was implemented to emphasize local network activity. RESULTS/ANTICIPATED RESULTS: We report the results of a pilot study to test the performance of a new operator-independent method for passive identification of the CS in patients with intractable epilepsy undergoing icEEG monitoring at Yale, in whom CS localization was obtained by standard methods. The sensori-motor (SM) cortex exhibited higher EEG-gamma power compared with non-SM cortex (p<0.0002). There was no significant difference between the motor/premotor and sensory cortex (p<0.47). The CS was successfully localized in all patients with thresholds between 0.4 and 0.6. In 2 patients, knowledge of anatomy was needed to distinguish the MC from adjacent epileptic foci. The primary hand and leg motor areas consistently exhibited the highest metric values, followed by the tongue motor area. Higher threshold values were very specific (94%) for the anterior bank of the CS but not sensitive.
Intermediate threshold values achieved a reasonable trade-off (0.4: 89% specific and 70% sensitive). DISCUSSION/SIGNIFICANCE OF IMPACT: We present and successfully implement a rapid procedure for task-free and stimulation-free localization of the central sulcus during sleep, based on intrinsic electrophysiological properties of the primary motor strip, which exhibits increased power and enhanced local connectivity. We successfully localized the central sulcus in all patients. With appropriate thresholds, our proposed metric m is very specific for the anterior lip of the central sulcus, which may make it ideal for identifying this important anatomical landmark. Our method is sensitive for epileptogenic regions as well; therefore, basic knowledge of central sulcus anatomy may be needed in cases where there is an epileptogenic lesion in the vicinity of the central sulcus. Our method makes a few a priori assumptions: the regions around the central sulcus are adequately sampled, and the occipital or parieto-occipital regions are not included in the analysis. For the method to function properly, non-SM cortex should be sampled adequately as well. In the future, normative data could be generated for the composite product of connectivity×power, which may replace within-patient z-scoring. Our method is rapid and can be implemented on short segments of ECoG data. The proposed method may potentially be used for identification of seeds in the motor cortex for subsequent network analysis, and further studies may delineate its potential use in the operating room.
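The thresholding and distance-weighting steps behind the metric m=z/d can be sketched in Python. The sketch takes the per-contact composite value (the connectivity×power product named in the discussion) as given and does not compute band power or coherence. Several choices are assumptions not stated in the abstract: the z-score uses the population standard deviation, a contact's own zero distance is excluded from d, and m is set to 0 when no other suprathreshold contact exists. The function name and sample numbers are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def localization_metric(
    composite: Sequence[float],
    coords: Sequence[Tuple[float, float, float]],
    th: float = 0.5,
) -> List[float]:
    """Return m = z / d for each contact.

    composite: precomputed composite power-and-synchrony value per contact.
    coords:    3-D contact positions in any consistent unit.
    th:        z-score threshold; values below it are set to 0.
    """
    mean = statistics.mean(composite)
    sd = statistics.pstdev(composite)  # assumption: population standard deviation
    z = [(value - mean) / sd for value in composite]
    z = [value if value >= th else 0.0 for value in z]

    # Contacts that survive thresholding.
    active = [i for i, value in enumerate(z) if value > 0]

    metric = []
    for i, value in enumerate(z):
        others = [j for j in active if j != i]  # assumption: exclude the contact itself
        if value == 0 or not others:
            metric.append(0.0)
            continue
        d = statistics.mean(math.dist(coords[i], coords[j]) for j in others)
        metric.append(value / d)
    return metric


# Hypothetical data: two quiet contacts and two neighbouring high-value contacts.
values = [1.0, 1.0, 5.0, 5.0]
positions = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (21.0, 0.0, 0.0)]
print(localization_metric(values, positions, th=0.5))
```

Dividing by the mean distance to the other suprathreshold contacts raises the score of contacts that sit inside a tight cluster, which is how the last step emphasizes local network activity.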
2296: Functional analysis of the cutaneous microbiome in psoriatic disease
- Di Yan, Hsin-Wen Chang, Rasnik Singh, Kevin Lai, Kristina Lee, Ladan Afifi, Xueyan Lu, Derya Ucmak, Susan Lynch
- Published online by Cambridge University Press:
- 10 May 2018, p. 15
OBJECTIVES/SPECIFIC AIMS: Psoriasis is one of the most common inflammatory diseases of the skin, affecting about 2%–3% of the US population. Despite its high prevalence, its pathogenesis remains poorly understood. The ability of the microbiome to modify host immunity and metabolism suggests that it may contribute to the development of psoriasis and its cardiometabolic comorbidities. This study aims to characterize the psoriatic skin microbiome and understand the functional role that these bacteria may play. METHODS/STUDY POPULATION: 16S rRNA sequencing of site-matched skin swabs from 8 psoriasis patients and 8 healthy controls was used to identify bacteria and determine their relative abundance and microbial community diversity in the sample. PICRUSt was used to infer the functional roles of the bacteria from 16S rRNA amplicon data. RESULTS/ANTICIPATED RESULTS: Lesional psoriasis skin had lower α diversity (p=0.04), less Actinobacteria (p=0.0001), but higher Firmicutes (p=0.009) compared with controls. At the genus level, lesional skin had more Alloiococcus (p=0.01) and Aerococcus (p=0.01) and demonstrated a trend towards lower Propionibacterium (p=0.08) and higher Gallicola (p=0.09) compared with controls. Interestingly, Alloiococcus (p=0.003) and Gallicola (p=0.04) were also higher in nonlesional skin compared with controls. Furthermore, lesional and nonlesional skin shared an increased abundance of Acinetobacter sp., Staphylococcus pettenkoferi, and Streptococcus sp., relative to controls. Lesional and nonlesional psoriasis skin did not differ significantly in microbiome composition. Predictive functional analysis revealed that both the healthy and psoriatic skin microbiomes were enriched with bacteria capable of amino acid and carbohydrate metabolism, suggesting that these functions might have a general role in host-microbe interaction.
DISCUSSION/SIGNIFICANCE OF IMPACT: These data reveal intriguing differences in the cutaneous microbiome of psoriatic individuals and healthy controls and suggest that bacterial metabolism may play an important role in host-microbe interaction.
2327: Prescription opioid dependence in Western New York: Using data analytics to find an answer to the opioid epidemic
- Shyamashree Sinha, Gale Burstein, Kenneth E. Leonard, Timothy Murphy, Peter Elkin
- Published online by Cambridge University Press:
- 10 May 2018, p. 15
OBJECTIVES/SPECIFIC AIMS: Dependence and abuse of prescription opioid pain medication has substantially increased over the last decade. The consistent rise in opioid dependence contributes to the rising prescription drug overdose deaths over the last decade. The study of the distribution and determinants of opioid dependence among patients who are treated with chronic pain medications prescribed by their healthcare providers would aid in answering some key questions about potential abuse and overdose on opioids. The descriptive epidemiology of opioid dependence would help in identifying the vulnerable age group, race, ethnicity, and type of opioid pain medications that more commonly result in dependence. METHODS/STUDY POPULATION: We implemented an Observational Medical Outcomes Partnership/Observational Health Data Sciences and Informatics (OMOP/OHDSI) database, to hold structured EHR data from our Allscripts patient records. We also created a high-throughput phenotyping, natural language processing system that can parse 7,000,000 clinical notes in 1.5 hours. This runs as a web service and provides a modular component based NLP system. After the full semantic parse, we match the content against any number of ontologies. For each match we tag it as either a positive, negative, or uncertain assertion. We then perform automated compositional expressions. The codes are stored in a Berkley database (BDB) NOSQL database and the compositional expressions are stored in Neo4J (a graph database) and Graph DB (a triple store). This flexibility allows rapid retrieval of complex questions in real time. The High-Throughput Phenotyping (HTP) Natural Language Processing (NLP) Subsystem (HTP-NLP) is software that produces, given biomedical text, semantic annotations of the text. The semantic annotations identify conceptual entities—their attributes, the relations they have with other entities and the events they participate in, as expressed in the input text. 
The conceptual entities, relations, attributes, and events identified are specified by various knowledge representations (KRs) as documented in Coding Sources. Examples of coding sources are medical terminologies [eg, SNOMED CT, RxNorm, LOINC and open biomedical ontologies (OBO) foundry ontologies, eg, gene ontology (GO), functional model of anatomy, OBI, and others]. The annotation results may be displayed or output in formats suitable for further processing. Entity identified is assigned a truth value from 0 to 1. Values from the text are assigned to entities from ontologies such as SNOMED CT. The retrospective analysis of EHR data from local clinic patients was performed using queries on the problem list, demographic data, and medication list of all the patients in the database. The OMOP/OHDSI database was collected from Allscripts EHRs from 2010 to 2015. This common data model helps in the systematic analysis of disparate observational databases of clinic records from the primary care and family medicine clinics in Western New York region. The database contained 212,343 patient records that were parsed and deidentified. Specific research IDs were assigned to each of the patient records and stored in a secure firewall device for data analytics. The entire 212,343 records were queried for opioid dependence from the ICD-9 and 10 diagnostic codes and SNOMED CT codes mapped to both the clinical notes and the problem list for each patient based on the mapped ICD and SNOMED CT codes. In total, 1356 patients were identified as to having opioid dependence. The records were stratified into 7 age groups from age 18 to 28 and ending with age 79–89 years. RESULTS/ANTICIPATED RESULTS: Of the 212,343 patients in the database 1356 patients revealed opioid dependence on the problem list, ICD9-10 codes and prescription opioid pain medication with or without Buprenorphine and Naloxone (Suboxone) in the medication list. 
The prevalence of opioid dependence in the clinic population was 0.64% (95% CI: 0.61%–0.67%) over a 5-year period. The 7,000,000 patient records generated 750,000,000 SNOMED CT codes (on average 107 codes per record). The highest numbers of opioid dependence were seen in the 29 to 38 years’ age group. That comprised 39.38% (95% CI: 36.78%–41.98%) of the total opioid dependent population but accounted for only 2.03% of whole clinic population in this age group (95% CI: 1.86% to 2.2%). The subjects were then stratified by race and ethnicity. There were 1005 patients with opioid dependence, in the non-Hispanic population (total number 108,402). Among the White non-Hispanic or Latino population with opioid dependence, 41.33% (95% CI: 38.27%–44.39%) were 29–38 years old. The next common age group among the White Non-Hispanic opioid dependent subjects was 19–28 years, comprising of 22.48% (95% CI: 19.88%–25.08%) of the total number of White non-Hispanic or Latino opioid dependent population. Among the total clinic population Hispanics comprise 51.24%, but they comprise only 2.58% (95% CI: 1.74%–3.42%) of the total opioid dependent population. The non-Hispanic population comprise 51.05% of total clinic population while the percent of people who are opioid dependent is 83.26% (95% CI: 83.04%–83.48%) of the total 1356 opioid dependent population. DISCUSSION/SIGNIFICANCE OF IMPACT: The trends of opioid dependence among the clinic population in the study indicate that the prevalence is more in a certain section of the population. The predominance is among the non-Hispanic White population in the 19–38 years of age. The prevalence in younger age implies that the complications related to opioid dependence would be there for a longer duration of time. The prevalence of dependence in this clinic population would be rising if this trend continues. Interventions at curbing prescription opioid dependence is necessary for the vulnerable population. 
The findings suggest that a broad-based approach is necessary to address this problem. The distribution of opioid dependence in this patient population indicates the need for special attention to these specific age groups and racial and ethnic groups. The young age of many of the addicted patients demonstrates the risk that legitimate opioid prescriptions lead this age group toward addiction and implies the need for routine screening for substance abuse. The evidence of complications of opioid overdose among long-term opioid users, together with the risk of abuse with other agents including illicit agents, creates the need for an approach that uses real-time interventions in addition to measures that effect long-term improvement in addiction rates. A potentially cost-effective approach to implementing monitoring programs and clinical decision support tools would be to develop interoperable linkages from the EHRs to the prescription monitoring programs of state Departments of Health.
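The overall prevalence and its confidence interval can be reproduced approximately from the counts given in the abstract (1356 of 212,343 patients). The abstract does not say which interval method was used, so the sketch below shows two common choices, the Wald (normal approximation) and Wilson score intervals, purely as an illustration; the function names are hypothetical.

```python
import math

def wald_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval, often preferred for small proportions."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, centre - half_width, centre + half_width

# Counts reported in the abstract: 1356 opioid-dependent patients of 212,343.
for name, fn in (("Wald", wald_ci), ("Wilson", wilson_ci)):
    p, lower, upper = fn(1356, 212343)
    print(f"{name}: {100 * p:.2f}% (95% CI {100 * lower:.2f}%-{100 * upper:.2f}%)")
```

With a denominator this large the two methods differ only in the second decimal place of the percentage, so either is a reasonable check on the reported figures.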
2354: Pioneering the pathway with big data to eliminate hepatitis C viral infection (EHCV)
- Dawn A Fishbein, Ian Brooks, Emanuel Villa Baca, Ozgur Ozmen, Mallikarjun Shankar, Gil Weigand, Kristina Thiagarajan, Randy Estes, Alex Geboy, Hala Deeb, Mamta Jain, Lesley Miller
-
- Published online by Cambridge University Press:
- 10 May 2018, pp. 15-16
OBJECTIVES/SPECIFIC AIMS: Hepatitis C viral (HCV) infections are rising significantly both in young adults and as newly diagnosed cases in “baby boomers.” New HCV therapeutics cure over 95% of cases, and a call has been made for elimination of the epidemic by 2030; yet major HCV cascade of care (CoC) barriers exist. We secured CTSA pilot funding to obtain preliminary data for an innovative clinical trial utilizing big data modeling toward HCV elimination. METHODS/STUDY POPULATION: Our pilot work has developed a coordinated, real-time clinical data management process across 3 major CTSA affiliated hospital systems (MedStar Health, Emory-Grady, and UT-Southwestern), and additional data will be obtained from a pragmatic clinical trial. Electronic medical records data will be mapped to the OHDSI model, securely transmitted to Oak Ridge National Laboratory, Knoxville, TN and exposed to integrated data, analytics, modeling and simulation (IDAMS). RESULTS/ANTICIPATED RESULTS: Our U01 CTSA application proposes that HCV-IDAMS will model modifications to the established HCV CoC at community and population levels and thus simulate future outcomes. As data volume increases, system knowledge will expand and recursive applications of IDAMS will increase the accuracy of our models. This will reveal real-world reactions contingent upon population dynamics and composition, geographies, and local applications of the HCV CoC. DISCUSSION/SIGNIFICANCE OF IMPACT: Only an innovative, integrated approach harnessing pragmatic clinical data, big data and supercomputing power can create a realistic model toward HCV elimination.
2356: openSESAME: a “search engine” for discovering drug-disease connections by leveraging publicly available high-throughput experimental data
- Adam C. Gower, Avrum Spira, Marc E. Lenburg
-
- Published online by Cambridge University Press:
- 10 May 2018, p. 16
OBJECTIVES/SPECIFIC AIMS: Microarray technology has produced large volumes of gene expression data profiling differences in gene expression in a vast array of conditions, much of which is publicly available. Methods to query these data for similarities in patterns of gene regulation are limited to comparisons between preannotated groups. In response, we developed openSESAME to find experiments where a set of genes is similarly coregulated without regard to experimental design. An important application of openSESAME is drug repositioning: if a pattern associated with disease is reversed by a given drug, the drug might target disease-related processes. METHODS/STUDY POPULATION: Experiments from the Gene Expression Omnibus (GEO) were normalized, signature-association (SA) scores computed for each sample, experiments assigned enrichment scores, and ANOVAs used to assign significance to experimental variables automatically extracted from GEO. SA scores were also generated for hundreds of publicly available signatures, and pairwise correlations used to create a relevance network. RESULTS/ANTICIPATED RESULTS: Using signatures of estrogen and p63, we recovered relevant experimental variables, and with the network approach, we recovered previously reported associations between disease states and/or drug treatments. DISCUSSION/SIGNIFICANCE OF IMPACT: openSESAME has the potential to illuminate “dark data” and discover novel relationships between drugs and diseases on the basis of common patterns of differential gene expression.
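The abstract does not define how signature-association (SA) scores are computed, so the following is only a toy sketch of the general idea under stated assumptions: the SA score of a sample is taken to be the mean standardized expression of a signature's up-regulated genes minus that of its down-regulated genes, and two signatures are linked in a relevance network when their SA scores are strongly correlated across samples. All names and data here are hypothetical and are not openSESAME's actual implementation.

```python
import math
from statistics import mean, pstdev

def zscores(values):
    """Standardize one gene's expression across samples."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]

def sa_scores(expression, up_genes, down_genes):
    """Assumed SA score per sample: mean z of up genes minus mean z of down genes."""
    z = {gene: zscores(vals) for gene, vals in expression.items()}
    n_samples = len(next(iter(expression.values())))
    return [
        mean(z[g][s] for g in up_genes) - mean(z[g][s] for g in down_genes)
        for s in range(n_samples)
    ]

def pearson(x, y):
    """Pearson correlation between two equal-length score vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Invented expression matrix: gene -> values across four samples.
expression = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [4.0, 3.0, 2.0, 1.0],
    "G3": [2.0, 4.0, 6.0, 8.0],
}
disease = sa_scores(expression, up_genes=["G1"], down_genes=["G2"])
similar = sa_scores(expression, up_genes=["G3"], down_genes=["G2"])
reversed_by_drug = sa_scores(expression, up_genes=["G2"], down_genes=["G1"])

print(pearson(disease, similar))           # strongly positive: similar coregulation
print(pearson(disease, reversed_by_drug))  # strongly negative: pattern is reversed
```

A strongly negative correlation between a disease signature and a drug signature is the kind of relationship the repositioning argument in the abstract relies on.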
2378: A scientometric analysis of CTSA collaboration and impact
- Kristi Holmes, Ehsan Mohammadi, Karen Gutzman, Pamela Shaw, Donald Lloyd-Jones
-
- Published online by Cambridge University Press:
- 10 May 2018, p. 16
OBJECTIVES/SPECIFIC AIMS: Translational science supports the continuum of activities from early-stage bench research to implementation of discoveries, with the aim of bringing better and faster treatments to more patients. Past studies have attempted to clarify our understanding of the spectrum of translational research by categorizing the activities into stages ranging from T0 to T4 using explanatory definitions. Unfortunately, this approach is often vague and relies on a process of manual classification and binning of research publications into predetermined categories. This study aims to provide a big-picture analysis of clinical and translational science (CTS) based on an in-depth analysis of the entire corpus of publications resulting from research funded by Clinical and Translational Science Awards (CTSA) U54 awards (through 2016). METHODS/STUDY POPULATION: We harvested bibliographic metadata from all papers that cited any of the U54 award numbers from the inception of the CTSA program to the most recent award announcement. Natural language processing techniques were used to create term co-occurrence networks based on English-language textual data. Relevant and nonrelevant terms were distinguished algorithmically and processed accordingly to provide the clustered visualization. RESULTS/ANTICIPATED RESULTS: With this approach, we uncovered 6 naturally clustered areas of emphasis in published CTS research, traced the evolution of specific concepts through time, and gained a better understanding of their relative impact as demonstrated by citations. We performed additional analyses including discipline-specific impact assessment; identification of categories of excellence relating to both productivity and citations; characteristics of collaborative networks such as organizational, industry, and international collaborations and network dynamics; and resulting global impact of the CTSA program.
DISCUSSION/SIGNIFICANCE OF IMPACT: Ultimately we gained a clearer understanding of the CTSA program, its evolution through scholarly publications, and key areas of impact of the program using computational, data-driven evaluation methods.
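As a rough illustration of what a term co-occurrence network is, the sketch below counts how often pairs of terms appear in the same title. The titles and the vocabulary are invented, and the vocabulary simply stands in for the algorithmically selected "relevant" terms; the abstract does not describe the authors' tooling, so this is not their pipeline.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(documents, vocabulary):
    """Count how often each pair of vocabulary terms shares a document."""
    pair_counts = Counter()
    for text in documents:
        tokens = set(text.lower().split())
        present = sorted(tokens & vocabulary)  # sorted so each pair has one key
        pair_counts.update(combinations(present, 2))
    return pair_counts

# Invented example titles.
titles = [
    "Community engagement in clinical trial recruitment",
    "Biomarker discovery for clinical trial design",
    "Community health and biomarker screening",
]
vocab = {"community", "clinical", "trial", "biomarker"}
edges = cooccurrence(titles, vocab)

for (term_a, term_b), weight in edges.most_common():
    print(term_a, term_b, weight)
```

Each counted pair becomes a weighted edge; clustering the resulting graph is what would yield grouped areas of emphasis like those reported above.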
2412: Predicting response to hemodynamic interventions in the ICU using recurrent neural networks
- Julian Genkins, Thomas A. Lasko
-
- Published online by Cambridge University Press:
- 10 May 2018, pp. 16-17
OBJECTIVES/SPECIFIC AIMS: Our goal is to explore the value of learning algorithms to improve both the efficiency and accuracy of a clinician undertaking the cognitive task of selecting the best resuscitative intervention for a hemodynamically unstable patient in the ICU. Machine learning is an ideal discipline to solve this problem. The ICU is a data-rich environment; however, there is significant uncertainty regarding the interdependency of these data. Experts consistently struggle to develop deterministic models of the underlying forces driving hemodynamic perturbations and intervention responsiveness. Machine learning, especially deep learning, makes no prior assumptions about correlations between inputs. Deep architectures disentangle these high-level relationships through exposure to abundant, diverse data sets such as those used in this project, obviating the need to manually explore confounding interactions. METHODS/STUDY POPULATION: We are using the “Medical Information Mart for Intensive Care” (MIMIC-III) database for this project. MIMIC-III is a large, single-center database comprising information relating to patients admitted to critical care units at Beth Israel Deaconess Medical Center, a large tertiary care hospital, from 2001 to 2012. It contains data associated with 38,597 distinct adult patients and 53,423 distinct hospital admissions for those patients, with a mean of 4579 charted observations and 380 laboratory measurements available for each hospital admission. Classes of data in MIMIC-III are varied and include billing, intervention, laboratory, medication, and physiologic data, among others. In addition to training an RNN on the task of predicting hemodynamic states, we will also attempt to train 2 additional models on the same data: a multidimensional linear regression and a nonsequence-oriented deep neural network.
For each of these models we will measure accuracy using root mean squared error (RMSE) and mean absolute error (MAE) to provide scale-dependent measurements of accuracy. RESULTS/ANTICIPATED RESULTS: Our results will be reported in 2 primary categories: numerical accuracy of the RNN model and applicability, utility, and accuracy in a live clinical setting. The use of RNNs in biomedical informatics, and in general, is a relatively new phenomenon. This means that the body of literature that could provide a basis for our expected results is limited. Because of this, we have chosen staged goals in assessing our model. First, we hope to achieve a model that reliably predicts the direction of response. Being able to answer only the question of how a patient will respond (will they move toward or away from our therapeutic goal?) is as good as existing prediction methods. It is well established in the literature that, by almost any metric, ~50% of hemodynamically unstable patients respond to a fluid challenge. If we are within 10% of this average (40%–60% respond), then we can be confident in the accuracy of our model in predicting direction. Upon achieving this, we will then measure accurate prediction of response magnitude. To this effect, we hope to achieve an RMSE <10% between our test data and corresponding predicted output before proceeding further. In addition to numeric accuracy, we acknowledge that a plan for practical, clinical validation is needed before utilizing this tool in a clinical environment. Such validation will require 3 separate components. First, numeric accuracy will need to be determined again as compared with prospective data on actual patients in the ICU. This step is critical to prove that no information leakage from target data back to input data occurred during training.
Second, there must be a comparison to existing prediction methods, such as the passive leg raise in combination with measurement of cardiac output to predict volume responsiveness. Finally, we must measure the cost to the clinician of implementing our model in an ICU, specifically how it impacts their time to accomplish the task of selecting an intervention for the hemodynamically unstable patient. However, these tasks are beyond the scope of this project and will be left for later investigations. DISCUSSION/SIGNIFICANCE OF IMPACT: If we are successful, this study will provide the first step toward a data-driven model for predicting patient responsiveness to a given hemodynamic intervention or collection of interventions. Compared with current best-practice maneuvers, this model will not require manipulation of the patient, will have less rigid criteria for reliable interpretation, and will not require as specific a technical skill set to interpret. Furthermore, it will include many common categories of resuscitative therapies (eg, vasopressors, inotropes, fluids) and will allow effects of a combination of interventions to be predicted while making no assumptions of interdependence between said interventions. This study will also contribute a novel process of sequence prediction using RNNs by incorporating an element of context on top of the sequential data in every training step. An RNN learning the sequence of hemodynamic data comprising a patient’s hemodynamic state would, alone, fit into the realm of sequence prediction. Our innovation is the addition of treatment information with each temporal division of the hemodynamic data. The result is an RNN that combines the task of sequence prediction with sequence translation, the 2 major use cases for RNN learning algorithms.
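The two scale-dependent error measures named in this abstract, RMSE and MAE, can be written out in a few lines. The values below are invented mean arterial pressures used only to show the arithmetic; they are not data from the study.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: penalizes large misses more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(actual, predicted):
    """Mean absolute error: average size of the miss, in the original units."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)

# Invented observed vs. predicted mean arterial pressure (mmHg).
observed = [65.0, 70.0, 72.0, 80.0]
forecast = [63.0, 74.0, 72.0, 77.0]

print(f"RMSE = {rmse(observed, forecast):.2f} mmHg")
print(f"MAE  = {mae(observed, forecast):.2f} mmHg")
```

RMSE is never smaller than MAE on the same data, and a large gap between the two suggests that a few predictions are far off rather than all being slightly off.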
2413: Immune stress biomarkers correlate to violence and internalization of violence in African American young adults
- Latifa Jackson, Max Shestov, Forough Saadatmand, Joseph Wright
-
- Published online by Cambridge University Press:
- 10 May 2018, p. 17
OBJECTIVES/SPECIFIC AIMS: Allostatic load, the chronic stress-induced wear and tear on the body, has a cumulative deleterious effect in individuals over their lifetime. Recent studies have suggested that socio-economic status, psychological determinants, and biomedical health cumulatively contribute to allostatic load in young adults. Although these findings individually suggest that African American children may be particularly susceptible to the effects of allostatic loading due to racially-based discrimination and economic instability, few studies have shown the effect of exposure to violence on the allostatic load carried by young African Americans. METHODS/STUDY POPULATION: The Biological and Social Correlates of Drug Use in African American Emerging Adults (BADU) data set is composed of young African Americans (n=557 individuals) living in the Washington, DC area, collected from 2010 to 2012. Male and female participants were recruited in approximately equal numbers (n=283 and n=274, respectively). This data set provides a rich source of information on the behavioral, mental, and physical health of African American young adults (18–25 year olds) living in the Washington, DC area. Six biomedical markers known to be markers of immune stress and allostatic load were measured in BADU study participants: C-reactive protein, cortisol, Epstein-Barr virus IgG, IgE, IgA, and IgM. Naive Bayes was used to identify participant responses that were correlated with elevated stress biomarker levels. RESULTS/ANTICIPATED RESULTS: Violence was most closely correlated with elevated EBV VCA IgM and IgE levels. Elevated IgE levels correlated with increased experience of familial violence and sexual abuse; familial drug abuse and depression; violence and community violence. Cortisol was positively correlated with reported emotional state (R=0.072) and perceived individual discrimination (R=0.059).
DISCUSSION/SIGNIFICANCE OF IMPACT: Allostatic load appears to be high in individuals who self-report exposure to violence. Both perceived mental health and violence were correlated with elevated stress biomarkers. When Epstein-Barr virus viral capsid antigen IgM was compared with violence features characterized in the data set, we found that internalization of environmental stressors was most strongly correlated with elevated allostatic load markers. This work suggests that internalization of experienced violence may be as important as the actual violence experience.
2416: A machine learning pipeline to predict acute kidney injury (AKI) in patients without AKI in their most recent hospitalization
- Samuel Weisenthal, Samuel J. Weisenthal, Caroline Quill, Jiebo Luo, Henry Kautz, Samir Farooq, Martin Zand
-
- Published online by Cambridge University Press:
- 10 May 2018, pp. 17-18
OBJECTIVES/SPECIFIC AIMS: Our objective was to develop and evaluate a machine learning pipeline that uses electronic health record (EHR) data to predict acute kidney injury (AKI) during rehospitalization for patients who did not have an AKI episode in their most recent hospitalization. METHODS/STUDY POPULATION: The protocol under which this study falls was given exempt status by our institutional review board. The fully deidentified data set, containing all adult hospital admissions during a 2-year period, is a combination of administrative, laboratory, and pharmaceutical information. The administrative data set includes International Classification of Diseases, 9th Revision (ICD-9) diagnosis and procedure codes, Current Procedural Terminology, 4th Edition (CPT-4) procedure codes, diagnosis-related grouping (DRG) codes, locations visited in the hospital, discharge disposition, insurance, marital status, gender, age, ethnicity, and total length of stay. The laboratory data set includes bicarbonate, chloride, calcium, anion gap, phosphate, glomerular filtration rate, creatinine, urea nitrogen, albumin, total protein, liver function enzymes, and hemoglobin A1c. The pharmacy data set includes, for each medication, a description, pharmacologic class and subclass, and therapeutic class. Data preprocessing was performed using the Python library pandas (McKinney, 2011). Top-level binary representation (Singh, 2015) was used for diagnosis and procedure codes. Categorical variables were transformed via one-hot encoding. Previous admissions were collapsed using rules informed by domain expertise (eg, the most recent age or sum of assigned diagnosis codes were retained as elements in the feature vector). We excluded any patient without at least 1 rehospitalization during the time window. We excluded any admission, with or without AKI, for which AKI was present in the most recent hospitalization.
For comparison, we did not exclude such admissions in an otherwise identical experiment in which we considered any AKI event as a positive sample (regardless of AKI presence in the most recent hospitalization). We defined an AKI event as an assignment of any of the acute kidney failure (AKF) ICD-9 codes [584.5, AKF with lesion of tubular necrosis; 584.6, AKF with lesion of renal cortical necrosis; 584.7, AKF with lesion of renal medullary (papillary) necrosis; 584.8, AKF with other specified pathological lesion in kidney; or 584.9, AKF, unspecified]. Since diagnosis codes are believed to be specific but not sensitive for AKI (Waikar, 2006), we supplemented them using creatinine for patients who had laboratory values. Diagnosis was made according to the Kidney Disease: Improving Global Outcomes (KDIGO) Practice Guidelines (AKI defined as a 1.5-fold or greater increase in serum creatinine from baseline within 7 d or 0.3 mg/dL or greater increase in serum creatinine within 48 h). We report preliminary model discrimination via area under the receiver operating characteristic curve (AUC) using k-fold cross-validation grouped by patient identifier (to ensure that admissions from the same patient would not appear in both the training and validation sets). It was confirmed that the prevalence of positive samples in the entire data set was maintained in each fold. The Python library scikit-learn (Pedregosa, 2011) was used for pipeline development, which consisted of imputation, scaling, and hyper-parameter tuning for penalized (l1 and l2 norm) logistic regression, random forest, and multilayer perceptron classifiers. All experiments were stored in IPython (Pérez, 2007) notebooks for easy viewing and result reproduction. RESULTS/ANTICIPATED RESULTS: There were 107,036 adult patients who accounted for 199,545 admissions during a 2-year window. Per admission, there were at most 54 ICD-9 diagnoses, 38 ICD-9 procedures, 314 CPT-4 procedures, and 25 hospital locations visited.
The admissions were 55% female, the average age was 46±20 years (mean±standard deviation), and the average length of stay was 2.5±8.0 days. We excluded 2360 admissions that involved an AKI event that directly followed an admission with an AKI event and 4130 admissions that did not involve an AKI event but directly followed an admission with an AKI event. In total, there were 4561 (5.3%) positive samples (AKI during rehospitalization without AKI in the previous stay) generated by 3699 unique patients and 81,458 negative samples (non-AKI during rehospitalization without AKI in the previous stay) generated by 31,831 unique patients. When using any AKI event as a positive sample (regardless of whether or not AKI was in the most recent stay), the prevalence was 7.3% (6921 positive samples generated by 4395 unique patients and 85,588 negative samples generated by 33,287 unique patients). Best results were achieved with a code precision of 3 digits, for which we had a total of 4556 features per patient. The fitted hyper-parameters for each classifier were as follows: logistic regression with l1 penalty, C=2×10⁻³; logistic regression with l2 penalty, C=1×10⁻⁶; random forest, 100 estimators, maximum depth 3, minimum samples per leaf 50, minimum samples per split 10, and entropy as the splitting criterion; and multilayer perceptron, l2 regularization parameter α=15, 1 hidden layer with 5 units, and learning rate 0.001. Five-fold stratified cross-validation on the development set yielded an average AUC of 0.830±0.006 for logistic regression with l1 penalty, 0.796±0.007 for logistic regression with l2 penalty, 0.828±0.007 for random forest, and 0.841±0.005 for multilayer perceptron. In an identical experiment for which an AKI event was considered a positive sample regardless of AKI presence in the most recent stay, we had 4592 features per sample with the same code precision.
Five-fold stratified cross-validation on the development set with identical settings for the hyper-parameters yielded an average AUC of 0.850±0.004 for logistic regression with l1 penalty, 0.819±0.006 for logistic regression with l2 penalty, 0.853±0.004 for random forest, and 0.853±0.006 for multilayer perceptron. DISCUSSION/SIGNIFICANCE OF IMPACT: Our objective was to investigate the feasibility of using machine learning methods on EHR data to provide a personalized risk assessment for “unexpected” AKI in rehospitalized patients. Preliminary model discrimination was good, suggesting that this approach is feasible. Such a model could help clinicians recognize AKI risk in patients who would not otherwise be suspected of being at risk. The authors recognize several limitations. Since our data set corresponds to a time-window sample, patients with a high frequency of hospital utilization are likely overrepresented. Similarly, our data set contains records from only 1 hospital network. Although we supplement with laboratory-based diagnosis, using diagnosis codes as labels is problematic, as numerous reports suggest low sensitivity of codes for AKI. Future work includes calibration analysis, incremental updating (“online learning”), and a representation learning-based (“deep learning”) extension of the model.
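The creatinine-based part of the AKI label described in this abstract (a 1.5-fold or greater rise from baseline within 7 d, or a rise of 0.3 mg/dL or more within 48 h) can be sketched as a small function. The abstract does not say how baseline was chosen, so this sketch assumes that any earlier measurement may serve as the reference value; the function name and input format are hypothetical.

```python
def meets_kdigo_creatinine(measurements, tol=1e-9):
    """Return True if serial creatinine values satisfy either KDIGO criterion.

    measurements: list of (hours_since_first_draw, serum_creatinine_mg_dl).
    Assumption: any earlier value can act as the baseline for a later one.
    tol guards against floating-point error at the exact thresholds.
    """
    ordered = sorted(measurements)
    for i, (t0, c0) in enumerate(ordered):
        for t1, c1 in ordered[i + 1:]:
            elapsed = t1 - t0
            # Absolute criterion: rise of >= 0.3 mg/dL within 48 h.
            if elapsed <= 48 and c1 - c0 >= 0.3 - tol:
                return True
            # Relative criterion: >= 1.5-fold rise within 7 days (168 h).
            if elapsed <= 7 * 24 and c1 >= 1.5 * c0 - tol:
                return True
    return False

# Invented examples.
print(meets_kdigo_creatinine([(0, 1.0), (24, 1.4)]))   # absolute rise within 48 h
print(meets_kdigo_creatinine([(0, 1.0), (100, 1.2)]))  # neither criterion met
print(meets_kdigo_creatinine([(0, 1.0), (100, 1.6)]))  # 1.5-fold rise within 7 d
```

In a labeling pipeline, a flag like this would be combined with the ICD-9 584.x codes, so an admission counts as positive if either source indicates AKI.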
2456: Genetic determinants of recovery after mild traumatic brain injury: Can study samples be identified from electronic medical records linked to DNA biobanks?
- Jessica Dennis, Scott Zuckerman, Aaron Yengo-Kahn, Nancy Cox, Gary Solomon
-
- Published online by Cambridge University Press:
- 10 May 2018, p. 18
OBJECTIVES/SPECIFIC AIMS: To develop an algorithm that identifies post-concussion syndrome (PCS) cases and controls from among patients with mild traumatic brain injury (mTBI) in a large academic biobank. METHODS/STUDY POPULATION: The Vanderbilt University Medical Center’s (VUMC) electronic medical record (EMR) research database includes longitudinal medical record data on 2.5 million people. DNA and genotype data were also available for >225,000 of these individuals. Our algorithm used a combination of billing codes and natural language processing to apply inclusion and exclusion criteria. We defined PCS cases as those with a PCS billing code (ICD-9 310.2 or ICD-10 F07.81) and/or symptoms of PCS within 1–6 months of a qualifying mTBI. We will compare the positive predictive value of our algorithm to that of 2 simpler case selection schemes: (1) 1 instance of the PCS billing code anywhere in the medical record; and (2) 2 or more instances of the PCS billing code anywhere in the medical record. RESULTS/ANTICIPATED RESULTS: An mTBI was diagnosed in 28,720 patients regularly attending VUMC, and 528 of these patients were classified as PCS cases by our algorithm. The characteristics of our EMR sample reflected known risk factors for PCS. Our cases were more likely than controls to be female (49.4% vs. 38.4%), to have sustained a previous TBI (31.0% vs. 12.0%) and to have comorbid mood disorders. Our PCS cases were also more likely than controls to be <18 years of age (42.4% vs. 33.6%) and to have a sports-related keyword associated with the mTBI (44.1% vs. 25.2%), emphasizing the relevance of PCS to young athletes. Nonetheless, the number of PCS cases identified by our algorithm was small, and within the VUMC EMR, there were 5039 patients with 1 PCS billing code, and 2457 patients with 2 or more PCS billing codes anywhere in their EMR. 
Our next step is to calculate the positive predictive values of each selection scheme by manually reviewing the EMR of a selection of cases. Ultimately, we will implement the selection scheme that maximizes both positive predictive value and sample size, and in future work, we will genotype the selected patients to better understand the genetic architecture of PCS. DISCUSSION/SIGNIFICANCE OF IMPACT: EMR and biobanks are the future of human health research, and we asked whether complex algorithms or simple billing codes were best for studying the genetics of recovery after mTBI within the VUMC EMR. Our results are relevant to other studies of brain injury phenotypes within biobanks, including recovery from moderate or severe TBI, recovery from stroke, or the occurrence of delirium after routine surgery, and will help transform biobanks into fruitful research tools.
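The billing-code arm of the case definition (a PCS code within 1–6 months of a qualifying mTBI) can be sketched as a date-window check. This covers only the billing codes named in the abstract and ignores the natural language processing of symptoms; approximating "1–6 months" as 30–183 days is an assumption, and the function name and input format are hypothetical.

```python
from datetime import date

# ICD-9 and ICD-10 PCS billing codes named in the abstract.
PCS_CODES = {"310.2", "F07.81"}

def is_pcs_case(mtbi_date, coded_events, min_days=30, max_days=183):
    """Return True if a PCS code falls inside the window after the mTBI.

    coded_events: list of (event_date, billing_code).
    min_days/max_days: assumed day equivalents of "1-6 months".
    """
    for event_date, code in coded_events:
        days_after = (event_date - mtbi_date).days
        if code in PCS_CODES and min_days <= days_after <= max_days:
            return True
    return False

# Invented examples.
injury = date(2015, 1, 1)
print(is_pcs_case(injury, [(date(2015, 3, 1), "F07.81")]))  # inside the window
print(is_pcs_case(injury, [(date(2015, 1, 10), "310.2")]))  # too soon after injury
```

The simpler comparison schemes in the abstract correspond to dropping the window and counting PCS codes anywhere in the record, which is why they capture many more patients.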
2465: The design of a patient-centered personal health record with patients as co-designers
- Arlene Chung, Haiwei Chen, Grace Shin, Ketan Mane, Hye-Chung Kum
-
- Published online by Cambridge University Press:
- 10 May 2018, p. 18
OBJECTIVES/SPECIFIC AIMS: The promise and potential of connected personal health records (PHRs) have not come to fruition. This may be, in part, due to the lack of user-centered design and of a patient-centric approach to curating personal health data for use by patients. Co-design with end-users could help mitigate these issues by ensuring that the software meets users’ needs and by engaging patients in informatics research. Our team partnered with patients with multiple chronic conditions to co-design a patient-centric PHR. This abstract will describe our experience with the co-design process, highlight functionalities desired by patients, and showcase the final prototype. METHODS/STUDY POPULATION: We conducted 3 design sessions (90 min per session) with patients as co-designers and employed an iterative process for software development. Patients were recruited from Chapel Hill and surrounding areas. The initial design session laid the foundation for future sessions, and began with brainstorming about what patients thought their ideal version of an engaging connected PHR would look like in terms of features and functionalities. After each software iteration, our entire design team, including our patient co-designers, was shown the prototype during a subsequent design session. Once the final prototype was developed, usability testing was conducted with patient participants. Our team then conducted a final design session to debrief about the final prototype. RESULTS/ANTICIPATED RESULTS: We started with an initial group of 12 patients (6 males) who all had diabetes and an additional comorbidity such as hypertension or hyperlipidemia. Age of participants ranged from 30 to 77 years with an average age of 56. The majority of participants were Caucasian, with 1 Asian and 2 African Americans. Hemoglobin A1c values ranged from 6.0% to 9.2%, with approximately half having A1c values less than the goal of 7.0%.
Half the patients were aware of PHRs, the majority had smartphones, and all participants had access to the Internet and used email. Two of the patients were retired engineers who had prior experience with software design. The other sessions had between 7 and 8 participants at each session, and 7 patients completed the 90-minute usability testing session. There was a core group of 7 patients who were engaged in the design and testing sessions throughout the entire 9-month study. Key features of the PHR that emerged from design sessions included the following: (1) allow for annotation of data by patients (particularly important for lab values like glucose or for physical activity); (2) calendars, to-do list, and reminder functions should be linked so that an entry in one of these allows for auto-population of this data within the other sections; (3) notifications whenever new data from the electronic health record or other sources are pushed to the PHR account; (4) allow for drag and drop of photos of pills/medications taken via smartphone or from other sources so that the medication list has a photo of the actual pills or pill bottle; (5) allow for patients to customize the order of sections in the PHR dashboard so that the sections most important to the individual patient can be displayed more prominently; (6) allow for notifications from pharmacies to be pushed to the PHR (eg, confirmation of receipt of prescription requests or alert that prescription is ready to pick up); and (7) graphical display of trends over time (patients would like to select the measures and time frames to plot for display). Patients cited the importance of data provenance so that patient-entered data versus provider or electronic health record data could be easily differentiated.
Patients also highlighted the importance of having this PHR be a “one-stop shop for all their health data” and of having meaningful data dashboards for the different types of information needed to comprehensively manage their health. Patients wished for a single PHR that could easily bring together data from multiple patient portal accounts to avoid having to manage multiple accounts and passwords. They felt that heat map displays such as those used on popular fitness tracking websites were not intuitive and that the color-coding made interpretation challenging. Participants noted that engagement in the design process made them feel that they contributed towards developing software that could positively impact not only them individually but others as well. Every patient indicated the desire to participate in future design projects. Of the 19 tasks evaluated during usability testing, only 5 tasks could not be completed (eg, adding exercise to the calendar, opening the heat map). Patients felt that the overall PHR design was clean and aesthetically pleasing. Most patients felt that the site was “pretty easy to use” (6 out of 7). The majority of participants would like to use this PHR in the future (5) and would recommend this PHR to their friends/family to use (6). DISCUSSION/SIGNIFICANCE OF IMPACT: Involving patients directly in the design process for creating a patient-centric connected PHR was essential to sustaining engagement throughout the software life cycle and to informing the design of features and functionalities desired by patients with chronic conditions.
2469: Streamlining study design and statistical analysis for quality improvement and research reproducibility
- Ram Gouripeddi, Mollie Cummins, Randy Madsen, Bernie LaSalle, Andrew Middleton Redd, Angela Paige Presson, Xiangyang Ye, Julio C. Facelli, Tom Green, Steve Harper
-
- Published online by Cambridge University Press:
- 10 May 2018, pp. 18-19
-
- Article
-
- Open access
-
OBJECTIVES/SPECIFIC AIMS: Key factors causing irreproducibility of research include those related to inappropriate study design methodologies and statistical analysis. In modern statistical practice, irreproducibility can arise from statistical (false discoveries, p-hacking, overuse/misuse of p-values, low power, poor experimental design) and computational (data, code, and software management) issues. Addressing these requires understanding the processes and workflows practiced by an organization, and developing and using metrics to quantify reproducibility. METHODS/STUDY POPULATION: Within the Foundation of Discovery – Population Health Research, Center for Clinical and Translational Science, University of Utah, we are undertaking a project to streamline the study design and statistical analysis workflows and processes. As a first step, we met with key stakeholders to understand current practices by eliciting example statistical projects, and then developed process information models for different types of statistical needs using Lucidchart. We then reviewed these with the Foundation’s leadership and the Standards Committee to arrive at ideal workflows and models, and defined key measurement points (such as those around study design, analysis plan, final report, requirements for quality checks, and double coding) for assessing reproducibility. As next steps, we are using our findings to embed analytical and infrastructural approaches within the statisticians’ workflows. This will include data and code dissemination platforms such as Box, Bitbucket, and GitHub, documentation platforms such as Confluence, and workflow tracking platforms such as Jira. These tools will simplify and automate the capture of communications as a statistician works through a project. Data-intensive processes will use process-workflow management platforms such as Activiti, Pegasus, and Taverna. 
RESULTS/ANTICIPATED RESULTS: These strategies include sharing and publishing study protocols, data, code, and results across the spectrum; active collaboration with the research team; and automation of key steps, along with decision support. DISCUSSION/SIGNIFICANCE OF IMPACT: This analysis of statistical methods and processes, together with computational methods to automate them, ensures the quality of statistical methods and the reproducibility of research.
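The abstract lists double coding among its measurement points but does not describe how it would be checked. As a rough illustration only, one way such a check could be automated is to have two analysts code the same summary independently and compare the results within a numerical tolerance. The function names, the toy values, and the tolerance below are all hypothetical and are not taken from the project.

```python
import math
import statistics


def analyst_a_summary(values):
    """Hypothetical first coding: relies on the standard statistics module."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),
    }


def analyst_b_summary(values):
    """Hypothetical second, independent coding: computed by hand."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return {"n": n, "mean": mean, "sd": math.sqrt(variance)}


def double_coding_check(result_a, result_b, rel_tol=1e-9):
    """Return the sorted names of results that are missing from one
    coding or that differ by more than the relative tolerance."""
    keys = set(result_a) | set(result_b)
    return sorted(
        k
        for k in keys
        if k not in result_a
        or k not in result_b
        or not math.isclose(result_a[k], result_b[k], rel_tol=rel_tol)
    )


# Toy data, invented for illustration only.
values = [5.1, 4.8, 6.3, 5.9, 5.5]
mismatches = double_coding_check(analyst_a_summary(values), analyst_b_summary(values))
print("Disagreeing results:", mismatches)
```

An empty list means the two codings agree on every reported quantity. A non-empty list names the results that would need to be reconciled before the final report.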
2476: Identifying strangulated small bowel obstruction with machine learning
- Samuel David Zetumer, Hobart Harris
-
- Published online by Cambridge University Press:
- 10 May 2018, p. 19
-
- Article
-
- Open access
-
OBJECTIVES/SPECIFIC AIMS: Historically, logistic regression algorithms (LRAs) have failed to differentiate strangulated small bowel obstructions (SBOs) from nonstrangulated SBOs. Our hypothesis is that a machine learning algorithm (MLA) can differentiate strangulated from simple SBOs better than an LRA can. METHODS/STUDY POPULATION: We used records of patients presenting with acute SBO and managed with exploratory laparotomy to train and test algorithms. We compared the MLA to the LRA via area under the receiver operating characteristic curve (AUROC) and cutoff points maximizing sensitivity and specificity. RESULTS/ANTICIPATED RESULTS: With 192 patient records, the AUROC of the MLA was 0.85. At the sensitivity cutoff, the MLA had 100% sensitivity and 55% specificity. At the specificity cutoff, the MLA had 45% sensitivity and 100% specificity. We anticipate improvements as more records are incorporated, and we expect the LRA to underperform the MLA across all measures. DISCUSSION/SIGNIFICANCE OF IMPACT: Our MLA represents a significant improvement over past LRAs, and may provide decision assistance to surgeons managing SBO. If this MLA maintains its high sensitivity, it may be used in the future to prevent unnecessary surgeries.
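The evaluation metrics named in the abstract can be illustrated with a small sketch. It makes two assumptions: that the "sensitivity cutoff" is the highest score threshold that still flags every strangulated case, and that the "specificity cutoff" is the lowest threshold that flags no nonstrangulated case. The abstract does not define the cutoffs, so this reading is a guess. The labels and scores are invented toy values, not study data, and the function names are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(labels, scores, threshold):
    """Sensitivity and specificity when score >= threshold predicts positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def sensitivity_cutoff(labels, scores):
    """Highest threshold that still catches every positive case."""
    return min(s for y, s in zip(labels, scores) if y == 1)


def specificity_cutoff(labels, scores):
    """Lowest observed score above every negative case's score;
    infinity if no score exceeds the highest negative score."""
    max_neg = max(s for y, s in zip(labels, scores) if y == 0)
    above = [s for s in scores if s > max_neg]
    return min(above) if above else float("inf")


# Toy data: 1 = strangulated, 0 = nonstrangulated (invented for illustration).
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2, 0.1, 0.05]

print("AUROC:", auroc(labels, scores))
print("At sensitivity cutoff:", sens_spec(labels, scores, sensitivity_cutoff(labels, scores)))
print("At specificity cutoff:", sens_spec(labels, scores, specificity_cutoff(labels, scores)))
```

On the toy data, the sensitivity cutoff keeps sensitivity at 100% at the cost of some false positives, and the specificity cutoff does the reverse. This is the same trade-off the abstract reports at its two operating points.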