
MedSTS: a resource for clinical semantic textual similarity

  • Original Paper
Language Resources and Evaluation

Abstract

The adoption of electronic health records (EHRs) has enabled a wide range of applications that leverage EHR data. However, the meaningful use of EHR data depends largely on our ability to efficiently extract and consolidate the information embedded in clinical text, for which natural language processing (NLP) techniques are essential. Semantic textual similarity (STS), which measures the semantic similarity between text snippets, plays a significant role in many NLP applications. In the general NLP domain, STS shared tasks have made available a large collection of manually annotated text snippet pairs from various domains. In the clinical domain, STS can help detect and eliminate redundant information, which may reduce cognitive burden and improve clinical decision-making. This paper describes our efforts to assemble MedSTS, a resource for STS in the medical domain. It consists of 174,629 sentence pairs gathered from a clinical corpus at Mayo Clinic. A subset of MedSTS (MedSTS_ann) containing 1,068 sentence pairs was annotated by two medical experts with semantic similarity scores of 0–5 (low to high similarity). We further analyzed the medical concepts in the MedSTS corpus and tested four STS systems on the MedSTS_ann corpus. In the future, we will organize a shared task by releasing the MedSTS_ann corpus to motivate the community to tackle real-world clinical problems.
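To make the evaluation setup concrete, the following is a minimal sketch of how a system's output could be compared with expert scores on the 0–5 scale. The abstract does not say how the four systems were scored. The sketch assumes Pearson correlation, the customary metric in the SemEval STS shared tasks, and it assumes that the two annotators' scores are averaged into one gold score. The function names and the token-overlap baseline are hypothetical illustrations and are not among the systems tested in the paper.

```python
from math import sqrt


def gold_scores(annotator_a, annotator_b):
    """Average two annotators' 0-5 scores per pair (an assumed aggregation)."""
    if len(annotator_a) != len(annotator_b):
        raise ValueError("annotators must score the same pairs")
    return [(a + b) / 2 for a, b in zip(annotator_a, annotator_b)]


def token_overlap(sentence_1, sentence_2):
    """Hypothetical baseline: Jaccard overlap of lowercased tokens, scaled to 0-5."""
    tokens_1 = set(sentence_1.lower().split())
    tokens_2 = set(sentence_2.lower().split())
    if not tokens_1 and not tokens_2:
        return 5.0  # two empty snippets are treated as identical
    return 5.0 * len(tokens_1 & tokens_2) / len(tokens_1 | tokens_2)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up toy pairs and scores for illustration only; not MedSTS data.
    pairs = [
        ("patient denies chest pain", "patient denies chest pain today"),
        ("take one tablet daily", "follow up in two weeks"),
        ("no known drug allergies", "no known allergies to drugs"),
    ]
    gold = gold_scores([4, 0, 5], [5, 1, 4])
    predicted = [token_overlap(s1, s2) for s1, s2 in pairs]
    print(pearson(predicted, gold))
```

Averaging is only one way to combine two annotators; a real evaluation would follow whatever aggregation and metric the corpus release specifies.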




Acknowledgements

This work was made possible by National Institutes of Health (NIH) grants R01LM011934, R01GM102282, R01EB19403, R01LM11829 and U01TR002062.

Author information

Corresponding author

Correspondence to Hongfang Liu.

About this article

Cite this article

Wang, Y., Afzal, N., Fu, S. et al. MedSTS: a resource for clinical semantic textual similarity. Lang Resources & Evaluation 54, 57–72 (2020). https://doi.org/10.1007/s10579-018-9431-1

  • DOI: https://doi.org/10.1007/s10579-018-9431-1
