Abstract
Background
Information exists as unstructured medical text in healthcare databases. Such information is not routinely considered in safety surveillance, which typically relies solely on structured (coded) data. Natural language processing (NLP) may allow the capture of concepts from unstructured data and thus enhance safety surveillance capability.
Objectives
We sought to assess the added contribution of unstructured data extracted from medical text by NLP for detecting acute liver dysfunction (ALD) in patients with inflammatory bowel disease (IBD).
Methods
Using a previously developed rule, we evaluated structured terms and NLP-extracted terms from unstructured text in a commercially available electronic medical record (EMR) system. The rule was intended to identify the diagnosis of ALD and the timing of its onset, and was the result of three iterations of rule development using 150 ALD candidate cases. We evaluated the performance of the rule with and without NLP among all candidate cases and among 50 new cases with clinical adjudication.
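The comparison of the rule with and without NLP terms against clinical adjudication can be illustrated with a minimal sketch. The names (`evaluate`, `Metrics`) are hypothetical, the toy labels are invented for illustration and are not study data, and the sketch only shows how sensitivity and specificity would be tallied; it does not reproduce the rule itself, which the abstract does not describe.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    sensitivity: float
    specificity: float


def evaluate(predicted: list[bool], adjudicated: list[bool]) -> Metrics:
    """Score rule output against clinical adjudication (True = ALD case)."""
    if len(predicted) != len(adjudicated):
        raise ValueError("predictions and adjudications must be aligned")
    pairs = list(zip(predicted, adjudicated))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    # NaN when there are no adjudicated positives (or negatives) to score against
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return Metrics(sensitivity, specificity)


# Toy data (hypothetical, not from the study)
adjudicated = [True, True, True, False, False]
structured_only = [True, False, True, True, False]  # misses one case, one false positive
with_nlp = [True, True, True, False, False]         # NLP terms confirm one case, rule out one

print(evaluate(structured_only, adjudicated))
print(evaluate(with_nlp, adjudicated))
```

Scoring both rule variants against the same adjudicated labels keeps the comparison paired, so any difference reflects only the added NLP evidence.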
Results
NLP terms were necessary for the diagnosis of 9% of cases and for ruling out 3% of false-positive cases. Inclusion of NLP terms identified an additional 9% of ALD-onset dates and consequently led to earlier recognition of onset in 5%.
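How NLP terms can add or move an onset date is sketched below, assuming (hypothetically) that onset is taken as the earliest dated piece of qualifying evidence. The actual onset logic of the rule is not specified in the abstract; the function names and dates are illustrative only.

```python
from datetime import date
from typing import Optional


def onset_date(structured: list[date], nlp: list[date], use_nlp: bool) -> Optional[date]:
    """Hypothetical onset definition: earliest dated evidence, or None if there is none."""
    evidence = structured + (nlp if use_nlp else [])
    return min(evidence) if evidence else None


def onset_change(structured: list[date], nlp: list[date]) -> str:
    """Classify how adding NLP-derived evidence changes the onset date."""
    without_nlp = onset_date(structured, nlp, use_nlp=False)
    with_nlp = onset_date(structured, nlp, use_nlp=True)
    if with_nlp is None:
        return "no onset date"
    if without_nlp is None:
        return "new onset date"  # NLP terms supply the only dated evidence
    if with_nlp < without_nlp:
        return "earlier onset"  # an NLP mention predates the first structured evidence
    return "unchanged"


print(onset_change([], [date(2010, 3, 1)]))                   # → new onset date
print(onset_change([date(2011, 5, 2)], [date(2011, 4, 20)]))  # → earlier onset
print(onset_change([date(2009, 1, 1)], [date(2009, 6, 1)]))   # → unchanged
```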
Conclusions
NLP-derived terms in one large commercially available EMR system modestly improved the sensitivity and specificity of ALD identification and identified earlier onset.
Ethics declarations
All patient and provider information was provided in the form of non-identifying study code numbers. The work did not require institutional review board approval.
Funding
This work was conducted using Pfizer, Inc., internal funds and under a research contract between Pfizer and World Health Information Science Consultants (AW and AA).
Conflicts of Interest
LSW, XZ, RS, RES, AB and RR are employees and may be shareholders of Pfizer, Inc. AW has worked under contract with Optum, which owns Humedica (whose data resource is being studied). AA has received consulting fees or honoraria for serving on scientific advisory boards for AbbVie, Takeda, and Merck. The views expressed herein are those of the authors and do not necessarily represent those of Pfizer, Inc.
Appendices
Appendix 1. Clinical Terms Relevant for Determination of ALD used for Guiding the Extraction from the Clinical Notes
Abdomen, abuse, alcoholic, addiction, alcohol, anorexia, appetite, ascites, asterixis, cirrhosis, confusion, dark urine, white stool, dependence, encephalopathy, fatigue, fetor, hepatitis, icterus, itch, jaundice, liver, malaise, mentation, nausea, pruritus, transaminases, and names of specific tests of liver function, infection or inflammation.
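A minimal sketch of how the Appendix 1 terms could be located in note text, using plain case-insensitive keyword matching with Python's `re` module. This is a swapped-in illustration, not the NLP system used with the EMR data; `ALD_TERMS` and `find_terms` are hypothetical names, liver-test names are left out because the appendix does not list them, and the matcher ignores negation (e.g. "no ascites" still matches), which a clinical NLP pipeline would need to handle.

```python
import re

# Terms from Appendix 1; names of specific liver, infection or inflammation
# tests are omitted because the appendix does not enumerate them
ALD_TERMS = [
    "abdomen", "abuse", "alcoholic", "addiction", "alcohol", "anorexia",
    "appetite", "ascites", "asterixis", "cirrhosis", "confusion", "dark urine",
    "white stool", "dependence", "encephalopathy", "fatigue", "fetor",
    "hepatitis", "icterus", "itch", "jaundice", "liver", "malaise",
    "mentation", "nausea", "pruritus", "transaminases",
]

# Longer terms are tried first; \b stops "itch" from matching inside "twitch"
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in sorted(ALD_TERMS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def find_terms(note: str) -> set[str]:
    """Return the lower-cased Appendix 1 terms mentioned in a note (no negation handling)."""
    return {m.group(1).lower() for m in _PATTERN.finditer(note)}


print(sorted(find_terms("Patient reports dark urine and itch; no ascites.")))
# → ['ascites', 'dark urine', 'itch']
```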
Appendix 2. Qualifying Criteria for the Surveillance Population
Inclusion: adults with IBD
Individuals aged ≥ 18 years with an ICD-9 code from the following list in the structured data on two different days between 1 January 2007 and 31 December 2012:
555 regional enteritis
556 ulcerative colitis
Exclusion: chronic liver disease
Remove any individuals who have an ICD-9 code from the following list in the structured data.
Infections:
070 viral hepatitis, except 070.1 viral hepatitis A
Chronic liver disease, cirrhosis and predisposing factors:
121.1 clonorchiasis
291x alcohol-induced mental disorders
303x alcohol dependence syndrome
304x drug dependence
456.0, 456.1, 456.2 esophageal varices
571.0, 571.1, 571.2, 571.3, 571.4, 571.5, 571.6, 571.7 chronic liver disease and cirrhosis (omits 571.8 and 571.9, which may refer to fatty liver without mention of alcohol)
572x liver abscess and sequelae of chronic liver disease
Malignant neoplasms that affect or commonly metastasize to the liver:
150x–159x digestive organs and peritoneum
160x–165x respiratory and intrathoracic organs
172x malignant melanoma of skin
174x–175x female and male breast
Utilization: evidence of continued care
Observation time. Patients must have an interval of at least 183 days from the first to the last recorded visit of any type.
Visits. Patients must have at least two outpatient visits per 365 days of observation time.
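The Appendix 2 criteria can be expressed as a filter over structured data. This is a sketch under stated assumptions: the `Patient` record layout and the names `code_matches` and `qualifies` are hypothetical, ICD-9 codes are assumed to be dotted strings such as "555.9", age is assumed to be a single value in years, and "two outpatient visits per 365 days" is read as an average rate over the observation time rather than a requirement for every 365-day window.

```python
from dataclasses import dataclass
from datetime import date

STUDY_START, STUDY_END = date(2007, 1, 1), date(2012, 12, 31)
IBD_CODES = ["555", "556"]
EXCLUSION_CODES = (
    ["070", "121.1", "291", "303", "304", "456.0", "456.1", "456.2"]
    + [f"571.{d}" for d in range(8)]      # 571.0-571.7; 571.8 and 571.9 deliberately omitted
    + ["572"]
    + [str(c) for c in range(150, 166)]   # 150x-159x and 160x-165x
    + ["172", "174", "175"]
)
EXCLUSION_EXCEPTIONS = {"070.1"}  # viral hepatitis A does not exclude


def code_matches(code: str, listed: str) -> bool:
    """True if a dotted ICD-9 code falls under a listed category or subcode (assumed format)."""
    if "." in listed:
        return code.startswith(listed)
    return code == listed or code.startswith(listed + ".")


@dataclass
class Patient:  # hypothetical record layout
    age: int                           # age in years (assumption: one value per patient)
    diagnoses: list[tuple[date, str]]  # (date, dotted ICD-9 code) from structured data
    visits: list[tuple[date, str]]     # (date, visit type), e.g. "outpatient", "inpatient"


def qualifies(p: Patient) -> bool:
    # Inclusion: adult with an IBD code on two different days in the study window
    ibd_days = {d for d, c in p.diagnoses
                if STUDY_START <= d <= STUDY_END and any(code_matches(c, x) for x in IBD_CODES)}
    if p.age < 18 or len(ibd_days) < 2:
        return False
    # Exclusion: any listed code anywhere in the structured data
    for _, c in p.diagnoses:
        if c not in EXCLUSION_EXCEPTIONS and any(code_matches(c, x) for x in EXCLUSION_CODES):
            return False
    # Utilization: at least 183 days from first to last visit of any type
    if not p.visits:
        return False
    dates = [d for d, _ in p.visits]
    observation_days = (max(dates) - min(dates)).days
    if observation_days < 183:
        return False
    # At least two outpatient visits per 365 days, read here as an average rate
    outpatient = sum(1 for _, kind in p.visits if kind == "outpatient")
    return outpatient * 365 >= 2 * observation_days


example = Patient(
    age=40,
    diagnoses=[(date(2008, 1, 5), "555.9"), (date(2008, 3, 1), "556.9")],
    visits=[(date(2008, 1, 5), "outpatient"), (date(2008, 3, 1), "outpatient"),
            (date(2008, 9, 1), "outpatient"), (date(2009, 1, 5), "outpatient")],
)
print(qualifies(example))  # → True
```

Encoding each criterion as an early `return False` keeps the filter in the same order as the appendix, which makes it easier to audit against the written definition.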
About this article
Cite this article
Weiss, L.S., Zhou, X., Walker, A.M. et al. A Case Study of the Incremental Utility for Disease Identification of Natural Language Processing in Electronic Medical Records. Pharm Med 32, 31–37 (2018). https://doi.org/10.1007/s40290-017-0216-4
DOI: https://doi.org/10.1007/s40290-017-0216-4