Data Transformation to Advance AI/ML Research and Implementation in Primary Care [Special Report]

Abstract

The use of artificial intelligence and machine learning (AI/ML) in health care is accelerating at a breathtaking pace. As the largest health care delivery platform, primary care is where the power, opportunity, and future of AI/ML are most likely to be realized on the broadest and most ambitious scale. However, there is a relative lack of organized, open, large-scale primary care datasets to attract industry and academia to primary care–focused research and development. This article proposes a set of high-level considerations around the data transformation that is needed to enable the growth of AI/ML applications in primary care. These considerations call for automation of data collection, organization of fragmented data, identification of primary care–specific use cases, integration of AI/ML into human workflows, and surveillance for unintended consequences. By unlocking the power of its data, primary care can play a leading role in advancing health care AI/ML to support patients, clinicians, and the health of the nation.

INTRODUCTION

The health care industry generates a massive amount of data daily, from unstructured clinical notes to structured test reports. Nearly 30% of the world’s data are being generated by the health care industry.1 Data have fueled rapid advancements in artificial intelligence (AI) and machine learning (ML) applications in health care, especially in disciplines such as radiology, cardiology, and neurology. Compared with these specialties, primary care has a relative lack of organized, open, large-scale datasets to attract the attention of industry and academia, which hinders primary care–focused AI/ML research and development. The complexity of primary care, driven by fragmented systems, diverse patient needs, and significant administrative burdens, creates additional challenges. These difficulties in accessing standardized, high-quality data and the complexity of primary care establish the need for AI in primary care.2 Building on principles outlined in the National Academies of Sciences, Engineering, and Medicine’s (NAM) “Implementing High-Quality Primary Care” consensus study report,3 this paper proposes 5 key considerations for data transformation in primary care: automation of data collection, organization of fragmented data, identification of primary care–specific use cases, integration of AI/ML into human workflows, and surveillance for unintended consequences. These act as gears in a machine that must operate together to transform the immense amount of untapped primary care data into a key resource for AI/ML tools in primary care (Figure 1).

Figure 1.

Five Considerations and 3 Catalysts for Data Transformation to Advance AI/ML Research and Implementation in Primary Care

AI = artificial intelligence; ML = machine learning.

In 2021, health care AI/ML companies raised $12.5 billion, a sum 25 times higher than in 2015.4,5 In 2025, companies announced plans to invest more than $100 billion in creating computing infrastructure in the United States to support the development and deployment of AI.6 Despite this massive investment, relatively little of the 2021 investment was dedicated to primary care. For example, even though primary care accounts for 52% of all outpatient care delivered in the United States, only 3% of all AI/ML medical devices and algorithms approved by the FDA are intended for primary care, whereas radiology accounts for 49%, cardiology for 20%, and neurology for 8%.7 More than three-quarters of people in the United States have existing health care relationships with their primary care clinician. With more than 500 million primary care visits per year in the United States, outnumbering visits to all other medical specialties combined, primary care stands as the largest health care delivery platform and the largest source of health care data.8,9

Looking internationally, the United Kingdom has several datasets, such as the Clinical Practice Research Datalink and the Optimum Patient Care Research Database, that house substantial primary care information; by contrast, the United States lacks similarly organized resources that include primary care data.10,11 As abundant as these data may be in the United States, significant limitations in the methods of data collection, organization, usability, and accessibility prevent their widespread application in primary care.

Considerations for Data Transformation in Primary Care

The aforementioned 2021 NAM report provided the blueprint for implementing a high-performing primary care foundation for the US health care system and described the critical functions of digital health needed to optimally support primary care: collect, aggregate, analyze, and apply information to decision making and clinical care.3 As we consider the transformation of primary care data, 5 considerations emerge for fully realizing AI/ML in primary care.

The first consideration is to automate data collection systems. Achieving this requires truly interoperable electronic health records (EHRs) and health devices such as smart devices and wearables, as well as AI/ML tools that extract key elements from digital communications sent through patient portals. An early example of this is the use of barcodes and the shift to electronic medication administration records, which significantly reduced error rates in medication transcription and potential adverse drug events.12 Primary care clinic patients currently complete surveys and screening questionnaires before appointments through patient portals that integrate with provider systems, so that the responses can be viewed during visits. Additionally, medical device companies are developing ways to send data directly from remote patient monitoring devices, such as continuous glucose monitors, to EHRs to facilitate clinical decision making. Automating the data collection process not only reduces the data entry and management burden for health care providers, but also lays the foundation for more robust and cleaner datasets for researchers, creating capacity to build AI/ML clinical decision support tools.

The second consideration is to organize disparate sources of data in ways that ensure interoperability and access. The quality and accuracy of AI/ML models are only as good as the data on which they were trained. This necessitates a focus on the quality of the data and on agreed-upon standards such as Health Level 7 Fast Healthcare Interoperability Resources (HL7 FHIR).13 To fully leverage the abundance of data that is collected, clear methods and standards must be in place for researchers and industry to use the data safely and sensitively. Federated learning and application programming interfaces (APIs) are 2 strategies for achieving this. In federated learning, a model is trained on separate servers within the confines of each institution, and the locally trained models are later aggregated to create a more comprehensive model reflecting all the data sources.14 This enables organizations to collaborate on AI/ML projects without sharing sensitive data such as patient records. One such example is a federated model that was developed to predict hospitalizations in patients with cardiac diseases.15 The benefit of federated learning is that data do not need to be stored at a central location, which enables a model to be trained on disparate datasets while maintaining data privacy standards. A significant feature of the model is its distributed learning framework, in which data stay with each institution and no raw data are exchanged.
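
To make the federated learning idea concrete, the following is a minimal sketch in Python, not the method used in the cited study. It assumes federated averaging (one common aggregation scheme), a simple linear model, and 2 hypothetical clinics holding synthetic data; the names `local_update`, `federated_average`, and `train_federated` are illustrative only. Each site trains on its own records, and only model weights, never raw data, are sent for aggregation.

```python
from typing import List, Tuple

Vector = List[float]
Dataset = List[Tuple[Vector, float]]  # (features, target) pairs held by one site


def local_update(weights: Vector, data: Dataset, lr: float = 0.1, epochs: int = 20) -> Vector:
    """Gradient descent for linear regression using only one site's data."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2 * err * xj / len(data)
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def federated_average(site_weights: List[Vector], site_sizes: List[int]) -> Vector:
    """Combine site models, weighting each by its number of local records."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]


def train_federated(sites: List[Dataset], dim: int, rounds: int = 30) -> Vector:
    """Alternate local training and aggregation; raw records never leave a site."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in sites]
        global_w = federated_average(updates, [len(d) for d in sites])
    return global_w


# Synthetic, noiseless data following y = 2*x + 1 (the constant 1.0 feature is the intercept).
clinic_a = [([x, 1.0], 2 * x + 1) for x in (0.0, 0.5, 1.0)]
clinic_b = [([x, 1.0], 2 * x + 1) for x in (1.5, 2.0)]
model = train_federated([clinic_a, clinic_b], dim=2)
print(model)  # weights approach [2.0, 1.0]
```

In this sketch the aggregating function sees only weight vectors and record counts. Production systems typically add safeguards such as secure aggregation, which are outside the scope of this illustration.
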
APIs, on the other hand, are sets of instructions that allow different applications to interact while ensuring that only the specific data requested are accessed.16 For example, the Apple Health application can securely connect to an API that provides access to a user’s medical data directly on their phone.17 This solution is available to all of the nearly 150 million iPhone users in the United States.18 Both federated learning and APIs are examples of how key components of a data-sharing infrastructure can significantly impact AI/ML model development and access to data for patients, clinicians, researchers, and industry. The potential for AI in primary care extends even further into application areas such as population health management, medical advice and triage, diagnostics, and chart review and documentation.19 These areas will directly affect how primary care physicians interact with their patients and how care is delivered.
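
The principle that an API exposes only the specific data requested can be illustrated with a short Python sketch. Everything here is an assumption for illustration: the in-memory `RECORDS` store, the `fetch_resources` function, and the `ScopeError` class are hypothetical, the records are only loosely shaped like FHIR resources, and the scope strings merely imitate the style of permission scopes used by health data APIs. It is not the interface of any specific vendor.

```python
from typing import Dict, List, Set

# Hypothetical in-memory store keyed by patient ID; a real system would sit behind an EHR.
RECORDS: Dict[str, List[dict]] = {
    "pt-001": [
        {
            "resourceType": "Observation",
            "status": "final",
            "code": {"text": "Blood glucose"},
            "subject": {"reference": "Patient/pt-001"},
            "valueQuantity": {"value": 112, "unit": "mg/dL"},
        },
        {
            "resourceType": "MedicationRequest",
            "status": "active",
            "intent": "order",
            "subject": {"reference": "Patient/pt-001"},
            "medicationCodeableConcept": {"text": "Metformin 500 mg"},
        },
    ]
}


class ScopeError(PermissionError):
    """Raised when the caller lacks permission for the requested resource type."""


def fetch_resources(patient_id: str, resource_type: str, granted_scopes: Set[str]) -> List[dict]:
    """Return only resources of the requested type, and only if the scope allows it."""
    required = f"patient/{resource_type}.read"
    if required not in granted_scopes:
        raise ScopeError(f"missing scope: {required}")
    return [r for r in RECORDS.get(patient_id, []) if r["resourceType"] == resource_type]


# An application granted access to observations only.
app_scopes = {"patient/Observation.read"}
observations = fetch_resources("pt-001", "Observation", app_scopes)
print(len(observations))  # 1
```

An application holding only the observation scope receives the glucose reading but is refused the medication list, which is the behavior the paragraph above describes.
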

The third consideration is to identify primary care–specific use cases to which the data can be applied. While AI/ML in non–primary care specialties tends to have an outsized emphasis on diagnostics, primary care tends to be more focused on the delivery and quality aspects of care, applied not only at the individual patient-clinician level but also at the population level. For example, one important primary care–specific use case is the prediction and prevention of avoidable emergency department visits and hospitalizations, which cost the United States more than $100 billion each year.20 AI/ML-powered platforms designed to identify high- and rising-risk patients and predict their likelihood of requiring an emergency department visit or hospital admission can empower primary care teams to intervene preemptively, enabling movement toward value-based care. Clinical documentation, a major administrative burden, is already being augmented by AI-based scribes that draft a note from a recorded conversation between the clinician and the patient.21 There are opportunities to leverage these technologies to make an impact in value-based care and care management, as well as to augment the patient-physician relationship (Table 1).22

Table 1.

Summary of Potential Use Cases in Primary Care

The fourth consideration is integration of tools into human workflows. Health care leaders have frequently identified integration of AI/ML tools as the greatest barrier to their successful use in a complex delivery system.23,24 Without thoughtful and sustainable implementation, these tools will not be used to their fullest potential. Methodologies based on tried-and-true quality improvement frameworks have been proposed to enable the integration of AI/ML25 into complex clinical environments. Answering these translational questions is best done through implementation science. Researchers at the United States Veterans Affairs health systems, through the Quality Enhancement Research Initiative (QUERI), have published an implementation road map that can serve as guiding principles to scale and implement AI/ML.26 Health systems have the opportunity to develop or improve workflows with AI/ML tools that can impact not only the quality of care delivered, but also the experience of delivering that care for the care team.

Lastly, the fifth consideration is critical analysis and monitoring of AI/ML models to identify and address potential unintended consequences. An example is algorithmic bias due to a lack of representativeness in the datasets used to build the models, which leads to model inaccuracy for certain populations and may cause harm in subgroups through inaccurate predictions.27 One study that independently examined an algorithm used to predict health risks found that it underestimated the health needs of Black patients by using health care costs as a proxy for illness severity, leading to fewer Black patients being identified for high-risk care management programs.28 Health inequities permeate the design and use of AI/ML models, from the data that the models are trained on to the unequal access to these tools and resources.21 Organizations such as NAM have begun to develop their own codes of conduct regarding AI/ML that consider these types of unintended consequences and more.29 The Coalition for Health AI (CHAI) has begun to establish its own guide for the responsible use of AI in health care and provides a toolkit for organizations to consider when deploying these models.30 NAM and CHAI are laying the foundation for the responsible use of data, guiding the ongoing surveillance of AI/ML models to mitigate unintended consequences and potential health inequities in primary care.
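
One simple form such surveillance could take is a routine comparison of error rates across patient subgroups. The Python sketch below is an illustration under stated assumptions, not a method drawn from the cited studies or toolkits: the function names, the choice of false-negative rate as the metric, and the synthetic rows are all hypothetical. It computes, for each group, the share of truly high-need patients that a model failed to flag.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[str, bool, bool]  # (group, truly_high_need, flagged_by_model)


def false_negative_rate_by_group(rows: Iterable[Row]) -> Dict[str, float]:
    """Share of truly high-need patients in each group that the model missed."""
    missed: Dict[str, int] = defaultdict(int)
    positives: Dict[str, int] = defaultdict(int)
    for group, high_need, flagged in rows:
        if high_need:
            positives[group] += 1
            if not flagged:
                missed[group] += 1
    return {g: missed[g] / positives[g] for g in positives}


def largest_gap(rates: Dict[str, float]) -> float:
    """Difference between the worst and best group rates; larger means less equitable."""
    return max(rates.values()) - min(rates.values())


# Synthetic audit data: 4 high-need patients per group, plus one low-need patient.
audit_rows = [
    ("A", True, True), ("A", True, True), ("A", True, True), ("A", True, False),
    ("B", True, True), ("B", True, False), ("B", True, False), ("B", True, False),
    ("A", False, False),
]
rates = false_negative_rate_by_group(audit_rows)
print(rates)               # {'A': 0.25, 'B': 0.75}
print(largest_gap(rates))  # 0.5
```

A gap of this size between groups would be a signal to investigate the training data and the outcome proxy before the model continues to guide care decisions.
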

How Do We Bring This Together?

There are 3 factors that will enable each of these considerations to be effective and work cohesively, acting as lubrication for each of the individual gears: first, increased collaboration of the AI/ML research community (both industry and academia) with primary care; second, increased funding from the private and public sectors for research, development, and implementation for and by primary care; and finally, significant infrastructure upgrades, both human and data, to support these efforts.

Primary care data transformation will require cross-sectoral collaboration between government, industry, professional organizations, academia, and frontline primary care. Professional organizations can support frontline primary care providers by advocating at the federal level while partnering with academic institutions and industry. For example, the American Academy of Family Physicians has met with leaders from the Centers for Medicare and Medicaid Services (CMS) and members of Congress, and a Congressional Primary Care Caucus has been formed,31 while the Academy has simultaneously established an innovation laboratory to evaluate new technologies in service of primary care physicians.32 The American Board of Family Medicine committed more than $4 million to building AI/ML research capacity in family medicine and partnered with the Gordon and Betty Moore Foundation to develop AI/ML curricula, develop grants, and create a bootcamp for AI/ML research.33

Federal agencies will need to have governance policies, streamlined approval processes, and enhanced public sector funding. We have already started to see the foundation of this with the blueprint for an AI Bill of Rights from the Office of Science and Technology Policy1; however, it does not specifically address challenges related to primary care. The National Institutes of Health has developed the Artificial Intelligence/Machine Learning Consortium to Advance Health Equity and Researcher Diversity, an initiative focused on advancing health equity by addressing bias in AI models, fostering diversity in research, and promoting inclusive data practices.34

Industry will provide the economic engine for the effort, from data and computing resources to an expert AI/ML workforce. Academia will need to provide thought leadership, education, and real-world clinical laboratories to enable the basic and translational research required to move the field forward. These types of cross-sectoral collaborations are key to realizing the transformation of primary care data into a treasured resource that can unlock the true potential of AI/ML in primary care.
