Operationalizing Large Language Models for Clinical Research Data Extraction: Methods, Quality Control, and Governance

Yoon D, Han C, Kim DW, Kim S, Bae S, Ryu JA, et al. Redefining Health Care Data Interoperability: Empirical Exploration of Large Language Models in Information Exchange. J Med Internet Res 2024;26:e56614. https://doi.org/10.2196/56614.

Roberts A. The Use of Natural Language Processing to Transform Health Records Information. Eur Psychiatry 2015;30:148. https://doi.org/10.1016/S0924-9338(15)30124-3.

Cascella M, Semeraro F, Montomoli J, Bellini V, Piazza O, Bignami E. The Breakthrough of Large Language Models Release for Medical Applications: 1-Year Timeline and Perspectives. J Med Syst 2024;48:22. https://doi.org/10.1007/s10916-024-02045-3.

Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc AMIA Symp 2001:17–21.

Savova GK, Masanz JJ, Ogren PV, Zheng J, Sohn S, Kipper-Schuler KC, et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc JAMIA 2010;17:507–13. https://doi.org/10.1136/jamia.2009.001560.

Xu H, Stenner SP, Doan S, Johnson KB, Waitman LR, Denny JC. MedEx: a medication information extraction system for clinical narratives. J Am Med Inform Assoc JAMIA 2010;17:19–24. https://doi.org/10.1197/jamia.M3378.

Sohn S, Clark C, Halgrim SR, Murphy SP, Chute CG, Liu H. MedXN: an open source medication extraction and normalization tool for clinical text. J Am Med Inform Assoc JAMIA 2014;21:858–65. https://doi.org/10.1136/amiajnl-2013-002190.

Nadeau D, Sekine S. A survey of named entity recognition and classification. Lingvisticæ Investig 2007;30:3–26. https://doi.org/10.1075/li.30.1.03nad.

Lafferty JD, McCallum A, Pereira FCN. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. Proc. Eighteenth Int. Conf. Mach. Learn., San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.; 2001, p. 282–9.

Sutton C, McCallum A. An Introduction to Conditional Random Fields 2010. https://doi.org/10.48550/arXiv.1011.4088.

Uzuner Ö, South BR, Shen S, DuVall SL. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. J Am Med Inform Assoc JAMIA 2011;18:552–6. https://doi.org/10.1136/amiajnl-2011-000203.

Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C. Neural Architectures for Named Entity Recognition. In: Knight K, Nenkova A, Rambow O, editors. Proc. 2016 Conf. North Am. Chapter Assoc. Comput. Linguist. Hum. Lang. Technol., San Diego, California: Association for Computational Linguistics; 2016, p. 260–70. https://doi.org/10.18653/v1/N16-1030.

Ma X, Hovy E. End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF. In: Erk K, Smith NA, editors. Proc. 54th Annu. Meet. Assoc. Comput. Linguist. Vol. 1 Long Pap., Berlin, Germany: Association for Computational Linguistics; 2016, p. 1064–74. https://doi.org/10.18653/v1/P16-1101.

Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Burstein J, Doran C, Solorio T, editors. Proc. 2019 Conf. North Am. Chapter Assoc. Comput. Linguist. Hum. Lang. Technol. Vol. 1 Long Short Pap., Minneapolis, Minnesota: Association for Computational Linguistics; 2019, p. 4171–86. https://doi.org/10.18653/v1/N19-1423.

Gururangan S, Marasović A, Swayamdipta S, Lo K, Beltagy I, Downey D, et al. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks. In: Jurafsky D, Chai J, Schluter N, Tetreault J, editors. Proc. 58th Annu. Meet. Assoc. Comput. Linguist., Online: Association for Computational Linguistics; 2020, p. 8342–60. https://doi.org/10.18653/v1/2020.acl-main.740.

Lee J, Yoon W, Kim S, Kim D, Kim S, So CH, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinforma Oxf Engl 2020;36:1234–40. https://doi.org/10.1093/bioinformatics/btz682.

Alsentzer E, Murphy J, Boag W, Weng W-H, Jindi D, Naumann T, et al. Publicly Available Clinical BERT Embeddings. In: Rumshisky A, Roberts K, Bethard S, Naumann T, editors. Proc. 2nd Clin. Nat. Lang. Process. Workshop, Minneapolis, Minnesota, USA: Association for Computational Linguistics; 2019, p. 72–8. https://doi.org/10.18653/v1/W19-1909.

Kaplan J, McCandlish S, Henighan T, Brown TB, Chess B, Child R, et al. Scaling Laws for Neural Language Models 2020. https://doi.org/10.48550/arXiv.2001.08361.

Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9:1735–80. https://doi.org/10.1162/neco.1997.9.8.1735.

Contreras Kallens P, Kristensen-McLachlan RD, Christiansen MH. Large Language Models Demonstrate the Potential of Statistical Learning in Language. Cogn Sci 2023;47:e13256. https://doi.org/10.1111/cogs.13256.

Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language Models are Few-Shot Learners. Adv. Neural Inf. Process. Syst., vol. 33, Curran Associates, Inc.; 2020, p. 1877–901.

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is All you Need. Adv. Neural Inf. Process. Syst., vol. 30, Curran Associates, Inc.; 2017.

Dagdelen J, Dunn A, Lee S, Walker N, Rosen AS, Ceder G, et al. Structured information extraction from scientific text with large language models. Nat Commun 2024;15:1418. https://doi.org/10.1038/s41467-024-45563-x.

Lewis P, Perez E, Piktus A, Petroni F, Karpukhin V, Goyal N, et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks 2021. https://doi.org/10.48550/arXiv.2005.11401.

Kim S, Lee C-K, Kim S-S. Large Language Models: A Guide for Radiologists. Korean J Radiol 2024;25:126–33. https://doi.org/10.3348/kjr.2023.0997.

Kernan Freire S, Wang C, Foosherian M, Wellsandt S, Ruiz-Arenas S, Niforatos E. Knowledge sharing in manufacturing using LLM-powered tools: user study and model benchmarking. Front Artif Intell 2024;7:1293084. https://doi.org/10.3389/frai.2024.1293084.

Schick T, Dwivedi-Yu J, Dessì R, Raileanu R, Lomeli M, Zettlemoyer L, et al. Toolformer: Language Models Can Teach Themselves to Use Tools 2023. https://doi.org/10.48550/arXiv.2302.04761.

Wei J, Wang X, Schuurmans D, Bosma M, Ichter B, Xia F, et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models 2023. https://doi.org/10.48550/arXiv.2201.11903.

Hu EJ, Shen Y, Wallis P, Allen-Zhu Z, Li Y, Wang S, et al. LoRA: Low-Rank Adaptation of Large Language Models 2021. https://doi.org/10.48550/arXiv.2106.09685.

Zhou Y, Muresanu AI, Han Z, Paster K, Pitis S, Chan H, et al. Large Language Models Are Human-Level Prompt Engineers 2023. https://doi.org/10.48550/arXiv.2211.01910.

Khattab O, Singhvi A, Maheshwari P, Zhang Z, Santhanam K, Vardhamanan S, et al. DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines 2023. https://doi.org/10.48550/arXiv.2310.03714.

Holtzman A, Buys J, Du L, Forbes M, Choi Y. The Curious Case of Neural Text Degeneration 2020.

Lu X, West P, Zellers R, Le Bras R, Bhagavatula C, Choi Y. NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints. In: Toutanova K, Rumshisky A, Zettlemoyer L, Hakkani-Tur D, Beltagy I, Bethard S, et al., editors. Proc. 2021 Conf. North Am. Chapter Assoc. Comput. Linguist. Hum. Lang. Technol., Online: Association for Computational Linguistics; 2021, p. 4288–99. https://doi.org/10.18653/v1/2021.naacl-main.339.

Liang P, Bommasani R, Lee T, Tsipras D, Soylu D, Yasunaga M, et al. Holistic Evaluation of Language Models 2023. https://doi.org/10.48550/arXiv.2211.09110.

Rajkomar A, Hardt M, Howell MD, Corrado G, Chin MH. Ensuring Fairness in Machine Learning to Advance Health Equity. Ann Intern Med 2018;169:866–72. https://doi.org/10.7326/M18-1990.

Chinchor N. Appendix B: MUC-7 Test Scores Introduction. Seventh Message Underst. Conf. MUC-7 Proc. Conf. Held Fairfax Va. April 29 - May 1 1998, 1998.

Makhoul J, Kubala F, Schwartz R, Weischedel R. Performance measures for information extraction. Proc. DARPA Broadcast News Workshop, vol. 249, Herndon, VA; 1999, p. 252.

Sung M, Jeong M, Choi Y, Kim D, Lee J, Kang J. BERN2: an advanced neural biomedical named entity recognition and normalization tool. Bioinformatics 2022;38:4837–9. https://doi.org/10.1093/bioinformatics/btac598.

Workman TE, Ahmed A, Sheriff HM, Raman VK, Zhang S, Shao Y, et al. ChatGPT-4 extraction of heart failure symptoms and signs from electronic health records. Prog Cardiovasc Dis 2024;87:44–9. https://doi.org/10.1016/j.pcad.2024.10.010.

Gérardin C, Wajsbürt P, Vaillant P, Bellamine A, Carrat F, Tannier X. Multilabel classification of medical concepts for patient clinical profile identification. Artif Intell Med 2022;128:102311. https://doi.org/10.1016/j.artmed.2022.102311.

Kelly L, Goeuriot L, Suominen H, Schreck T, Leroy G, Mowery DL, et al. Overview of the ShARe/CLEF eHealth Evaluation Lab 2014. In: Kanoulas E, Lupu M, Clough P, Sanderson M, Hall M, Hanbury A, et al., editors. Inf. Access Eval. Multilinguality Multimodality Interact., Cham: Springer International Publishing; 2014, p. 172–91. https://doi.org/10.1007/978-3-319-11382-1_17.

Hendrickx I, Kim SN, Kozareva Z, Nakov P, Ó Séaghdha D, Padó S, et al. SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations between Pairs of Nominals. In: Erk K, Strapparava C, editors. Proc. 5th Int. Workshop Semantic Eval., Uppsala, Sweden: Association for Computational Linguistics; 2010, p. 33–8.

Bai Y, Cui W, Finkelstein J. Performance of Open-Source Large Language Models to Extract Symptoms from Clinical Notes. Stud Health Technol Inform 2025;329:663–7. https://doi.org/10.3233/SHTI250923.

Asai A, Wu Z, Wang Y, Sil A, Hajishirzi H. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection 2023. https://doi.org/10.48550/arXiv.2310.11511.

Gao L, Dai Z, Pasupat P, Chen A, Chaganty AT, Fan Y, et al. RARR: Researching and Revising What Language Models Say, Using Language Models 2023. https://doi.org/10.48550/arXiv.2210.08726.

Rastogi A, Zang X, Sunkara S, Gupta R, Khaitan P. Towards Scalable Multi-Domain Conversational Agents: The Schema-Guided Dialogue Dataset. Proc AAAI Conf Artif Intell 2020;34:8689–96. https://doi.org/10.1609/aaai.v34i05.6394.

Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap)--a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform 2009;42:377–81. https://doi.org/10.1016/j.jbi.2008.08.010.

Lee S, Lee H-H, Lee H, Yum KS, Baek J-H, Khil J, et al. Confidence-linked and uncertainty-based staged framework for phenotype validation using large language models. J Am Med Inform Assoc JAMIA 2025;32:1320–7. https://doi.org/10.1093/jamia/ocaf099.

Geifman Y, El-Yaniv R. Selective Classification for Deep Neural Networks 2017. https://doi.org/10.48550/arXiv.1705.08500.

Settles B. Active Learning Literature Survey. University of Wisconsin-Madison Department of Computer Sciences; 2009.

Guo C, Pleiss G, Sun Y, Weinberger KQ. On calibration of modern neural networks. Int. Conf. Mach. Learn., PMLR 2017;70:1321–30.

Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement. BMC Med 2015;13:1. https://doi.org/10.1186/s12916-014-0241-z.

JOURNAL OF MEDICAL SYSTEMS
