
A Case Study of the Incremental Utility for Disease Identification of Natural Language Processing in Electronic Medical Records

  • Short Communication
Pharmaceutical Medicine

Abstract

Background

Healthcare databases hold information in the form of unstructured medical text. Such information is not routinely considered in safety surveillance, which typically relies solely on structured (coded) data. Natural language processing (NLP) may allow the capture of concepts from unstructured data and thus enhance safety surveillance capability.

Objectives

We sought to assess the added contribution of unstructured data extracted from medical text by NLP for detecting acute liver dysfunction (ALD) in patients with inflammatory bowel disease (IBD).

Methods

Using a previously developed rule, we evaluated structured terms and unstructured NLP-extracted terms from a commercially available electronic medical record (EMR) system. The rule was intended to identify ALD diagnosis and timing of onset and was the result of three iterations of rule development using 150 ALD candidate cases. We evaluated the performance of the rule with and without NLP terms among all candidate cases and among 50 new cases with clinical adjudication.

Results

NLP terms were necessary for the diagnosis of 9% of cases and for ruling out 3% of false-positive cases. Inclusion of NLP terms identified an additional 9% of ALD-onset dates, with consequent earlier recognition in 5%.

Conclusions

NLP-derived terms in one large commercially available EMR system modestly improved the sensitivity and specificity in the identification of ALD and identified earlier onset.




Author information

Corresponding author

Correspondence to Lisa S. Weiss.

Ethics declarations

All patient and provider information was provided in the form of non-identifying study code numbers. The work did not require institutional review board approval.

Funding

This work was conducted using Pfizer, Inc., internal funds and under a research contract between Pfizer and World Health Information Science Consultants (AW and AA).

Conflicts of Interest

LSW, XZ, RS, RES, AB and RR are employees and may be shareholders of Pfizer, Inc. AW has worked under contract with Optum, which owns Humedica (whose data resource is being studied). AA has received consulting fees or honoraria for serving on scientific advisory boards for Abbvie, Takeda, and Merck. The views expressed herein are those of the authors and do not necessarily represent those of Pfizer, Inc.

Appendices

Appendix 1. Clinical Terms Relevant for Determination of ALD used for Guiding the Extraction from the Clinical Notes

Abdomen, abuse, alcoholic, addiction, alcohol, anorexia, appetite, ascites, asterixis, cirrhosis, confusion, dark urine, white stool, dependence, encephalopathy, fatigue, fetor, hepatitis, icterus, itch, jaundice, liver, malaise, mentation, nausea, pruritus, transaminases, and names of specific tests of liver function, infection or inflammation.

Appendix 2

Qualifying criteria for the surveillance population

Inclusion—adults with IBD

 Individuals aged ≥ 18 years with an ICD-9 code from the following list in the structured data on two different days between 1 January 2007 and 31 December 2012.

 555 Regional enteritis

 556 Ulcerative colitis

Exclusion—chronic liver disease

 Remove any individuals who have an ICD-9 code from the following list in the structured data.

 Infections

  070 viral hepatitis except 070.1 viral hepatitis A

 Chronic liver disease, cirrhosis and predisposing factors

  121.1 clonorchiasis

  291x alcohol-induced mental disorders

  303x alcohol dependence syndrome

  304x drug dependence

  456.0, 456.1, 456.2 esophageal varices

  571.0, 571.1, 571.2, 571.3, 571.4, 571.5, 571.6, 571.7 chronic liver disease and cirrhosis. (Omits 571.8 and 571.9, which may refer to fatty liver, without mention of alcohol.)

  572x liver abscess and sequelae of chronic liver disease

 Malignant neoplasms that affect or commonly metastasize to liver

  150x–159x digestive organs and peritoneum

  160x–165x respiratory and intrathoracic organs

  172x malignant melanoma of skin

  174x–175x female and male breast
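The inclusion and exclusion lists above can be read as a filter over each patient's coded diagnoses. The following is a minimal sketch of that reading, not the authors' implementation. It assumes dotted ICD-9 code strings (for example "571.2"), that any combination of 555 and 556 codes on two distinct days satisfies inclusion, and that age is supplied by the caller; the function and constant names are hypothetical.

```python
from datetime import date

# Study window and IBD categories taken from Appendix 2.
STUDY_START, STUDY_END = date(2007, 1, 1), date(2012, 12, 31)
INCLUSION_PREFIXES = ("555", "556")

# Exclusion codes from Appendix 2, matched as string prefixes.
# Assumption: codes are stored in dotted form, e.g. "456.20".
EXCLUSION_PREFIXES = (
    "121.1", "291", "303", "304",
    "456.0", "456.1", "456.2",
    "571.0", "571.1", "571.2", "571.3",
    "571.4", "571.5", "571.6", "571.7",
    "572", "172", "174", "175",
)
# Three-digit category ranges for the listed malignant neoplasms.
EXCLUSION_RANGES = ((150, 159), (160, 165))


def is_excluded(code: str) -> bool:
    """Return True if a single ICD-9 code is on the exclusion list."""
    code = code.strip()
    if code.startswith("070"):
        # Viral hepatitis, except 070.1 as stated in the appendix.
        return not code.startswith("070.1")
    if code.startswith(EXCLUSION_PREFIXES):
        return True
    head = code.split(".")[0]
    if head.isdigit() and len(head) == 3:
        return any(lo <= int(head) <= hi for lo, hi in EXCLUSION_RANGES)
    return False


def meets_inclusion(age_years: int, diagnoses) -> bool:
    """diagnoses: iterable of (icd9_code, service_date) pairs."""
    ibd_days = {
        day
        for code, day in diagnoses
        if code.startswith(INCLUSION_PREFIXES) and STUDY_START <= day <= STUDY_END
    }
    return age_years >= 18 and len(ibd_days) >= 2


def qualifies_on_codes(age_years: int, diagnoses) -> bool:
    """Inclusion met and no excluded code anywhere in the record."""
    diagnoses = list(diagnoses)
    return meets_inclusion(age_years, diagnoses) and not any(
        is_excluded(code) for code, _ in diagnoses
    )
```

Prefix matching keeps 571.8 and 571.9 in the cohort, as the appendix intends, because only 571.0 through 571.7 are listed.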

Utilization—evidence of continued care

 Observation time. Patients must have an interval from first recorded to last recorded visit of any type of at least 183 days.

 Visits. Patients must have at least two outpatient visits per 365 days of observation time.
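The two utilization criteria can be checked from a patient's visit history. The sketch below is one possible reading, not the authors' code: it assumes that "two outpatient visits per 365 days" means an average rate over the whole observation interval, and that visits arrive as (date, type) pairs with the type label "outpatient". The function name and the data layout are hypothetical.

```python
from datetime import date


def meets_utilization(visits) -> bool:
    """visits: iterable of (visit_date, visit_type) pairs for one patient."""
    visits = list(visits)
    if not visits:
        return False

    dates = [day for day, _ in visits]
    # Observation time: first recorded to last recorded visit of any type.
    observation_days = (max(dates) - min(dates)).days
    if observation_days < 183:
        return False

    outpatient_count = sum(1 for _, kind in visits if kind == "outpatient")
    # At least two outpatient visits per 365 days, compared as integers:
    # outpatient_count / (observation_days / 365) >= 2
    return outpatient_count * 365 >= 2 * observation_days
```

A stricter reading would require two outpatient visits within every 365-day window separately; the averaged form is used here only because it is the simplest interpretation of the text.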


About this article


Cite this article

Weiss, L.S., Zhou, X., Walker, A.M. et al. A Case Study of the Incremental Utility for Disease Identification of Natural Language Processing in Electronic Medical Records. Pharm Med 32, 31–37 (2018). https://doi.org/10.1007/s40290-017-0216-4

  • DOI: https://doi.org/10.1007/s40290-017-0216-4
