aiDIVA - Diagnostics of Rare Genetic Diseases Using Large Language Models

Abstract

Genome sequencing (GS) enables the accurate identification of genetic variants in most genomic regions and is rapidly transforming routine diagnostics for rare diseases (RD). While streamlined data generation is scalable, efficient prioritization and correct clinical interpretation of detected alterations remain a challenge, often requiring manual classification by experts with years of training. Hence, there is a need for AI-driven clinical decision support systems that assist clinical experts in identifying causal variants or, in the case of large-scale re-analysis of unsolved cases, fully automate the process. To this end, many tools have been developed to estimate the impact of variants on protein function. However, only a few tools combine genomic data, variant annotations, and phenotypic data to diagnose cases.

Here we introduce aiDIVA, an ensemble AI featuring a hierarchically organized set of statistical and machine learning models trained on genomic and phenotypic data to identify the causal variant(s) among the tens of thousands of genetic variants of a patient. aiDIVA generates a pathogenicity classification for each variant using a random forest model, together with an evidence-based score for dominant and recessive diseases. It combines these predictions with additional clinical metadata to prioritize and rank the most likely causal variants. aiDIVA uses large language models (LLMs) to further improve and explain the results. Finally, the aiDIVA-meta model combines all scores to generate a ranked list of variants. In a benchmark analysis of more than 3,000 diagnostically solved RD patients, the causal variant was among the top three candidate variants reported by aiDIVA-meta in 97% of cases. Unlike comparable methods, aiDIVA provides interpretable explanations for the best candidates.
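The final ranking step and the top-3 benchmark criterion described above can be illustrated with a minimal sketch. It is not the aiDIVA implementation: the abstract does not specify how aiDIVA-meta combines its inputs, so a fixed weighted average stands in for the meta model here, and the names `VariantScores`, `WEIGHTS`, `meta_score`, `rank_variants`, and `top_k_hit` as well as the weight values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantScores:
    """Per-variant scores, each assumed to be scaled to [0, 1]."""
    variant_id: str
    pathogenicity: float  # stand-in for the random forest prediction
    evidence: float       # stand-in for the evidence-based score
    llm: float            # stand-in for an LLM-derived score


# Hypothetical weights; a weighted average is an assumption, not the
# combination rule used by aiDIVA-meta.
WEIGHTS = (0.4, 0.3, 0.3)


def meta_score(v: VariantScores) -> float:
    """Combine the three component scores into a single value."""
    w_path, w_evid, w_llm = WEIGHTS
    return w_path * v.pathogenicity + w_evid * v.evidence + w_llm * v.llm


def rank_variants(variants: list[VariantScores]) -> list[VariantScores]:
    """Return the variants sorted from most to least likely causal."""
    return sorted(variants, key=meta_score, reverse=True)


def top_k_hit(ranked: list[VariantScores], causal_id: str, k: int = 3) -> bool:
    """True if the known causal variant is among the first k candidates."""
    return any(v.variant_id == causal_id for v in ranked[:k])
```

Averaging `top_k_hit` with `k=3` over all solved cases would give the kind of top-3 rate reported in the benchmark, assuming each case has a single known causal variant.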

Competing Interest Statement

Daniela Bezdan, Stephan Ossowski, Marc Sturm and Tobias Haack are cofounders and shareholders of dxOmics GmbH. The other authors declare no competing interests.

Funding Statement

T.B.H. received funding from the European Commission (Recon4IMD - GAP-101080997) and the German Research Foundation (DFG; research grant numbers 418081722, 433158657, EJP-RD Artemis: 542553983). S.O. and D.Be. were supported by the European Union via the project European Rare Disease Research Alliance (ERDERA), GA n°101156595, funded under call HORIZON-HLTH-2023-DISEASE-07 and the European Union's Horizon 2020 research and innovation programme under grant agreement No 779257 (SolveRD). S.O. received funding from the German Research Foundation (DFG; research grant numbers 514060894, 514208594, 514177729, and OS 647/1-1).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Written informed consent was obtained from all individuals or their guardians and archived. All procedures were performed in accordance with the Helsinki Declaration. Individual-level data were de-identified. The study was approved by the local Institutional Review Board (ethics committee) of the Medical Faculty of the University of Tübingen, Germany (066/2021BO2).

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes
