In this work, we developed ML algorithms to estimate the inter-dose AUCss of GCV in adult SOT recipients receiving prophylaxis. The algorithms were trained on Monte Carlo simulations from two popPK models and use only two or three plasma concentrations and CrCL as predictors. As in our previous work, the XGBoost algorithm outperformed linear regression, linear SVM, MARS and random forest in predicting drug exposure (35,36,37). SVM also performed very well, and its differences from XGBoost were not clinically significant. Performance in the training and testing sets was excellent, with bias close to zero and imprecision below 8.50%.
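In this literature, bias and imprecision are usually reported as the relative mean prediction error (rMPE) and the relative root mean squared error (rRMSE), together with the number of individual errors outside ±20%, the criteria named in the evaluation on real patients. The sketch below assumes the usual definitions (errors relative to the reference AUC, in percent); the function names and the example AUC values are hypothetical, not study data. With these numbers, positive and negative errors cancel, so rMPE is near zero while rRMSE exceeds 20%, which is why rMPE alone can be misleading.

```python
import math

def relative_errors(predicted, reference):
    """Per-patient relative prediction error, (predicted - reference) / reference."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty sequences of equal length")
    return [(p - r) / r for p, r in zip(predicted, reference)]

def rmpe(predicted, reference):
    """Relative mean prediction error (bias), in percent."""
    errors = relative_errors(predicted, reference)
    return 100.0 * sum(errors) / len(errors)

def rrmse(predicted, reference):
    """Relative root mean squared error (imprecision), in percent."""
    errors = relative_errors(predicted, reference)
    return 100.0 * math.sqrt(sum(e * e for e in errors) / len(errors))

def n_outside(predicted, reference, limit=0.20):
    """Number of patients whose relative error lies outside +/- limit."""
    return sum(1 for e in relative_errors(predicted, reference) if abs(e) > limit)

# Hypothetical reference and predicted AUCs (mg*h/L), not study data.
reference = [40.0, 50.0, 60.0, 80.0]
predicted = [52.0, 35.0, 61.2, 78.4]
print(rmpe(predicted, reference))       # close to 0: errors cancel out
print(rrmse(predicted, reference))      # about 21.3: large individual errors
print(n_outside(predicted, reference))  # 2 patients outside +/-20%
```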
For external validation, we generated new patients using two other popPK models (Chen and Caldés). Bias and imprecision were higher for patients simulated from the Chen model than for those simulated from the Caldés model, and the ML algorithms systematically overestimated AUCss in the Chen simulations. This decreased performance was anticipated: the Vezina, Lalagkas, and Caldés models were developed in European patients with kidney, liver, lung, or heart transplants, whereas the Chen model was developed in a Chinese population of kidney transplant recipients only. However, MAP-BE with the Vezina model sometimes performed better in the Chen population than in the Caldés population, even though the Vezina and Caldés models were both developed in European patients. This suggests that factors beyond population origin, such as structural model parameters, also play a role. Although all models had two-compartment structures with Ka and lag time parameters, clearance in the Vezina and Chen models was similar and approximately twice that in the Caldés model. This difference in clearance may partially explain the observed differences in performance and indicates that structural model characteristics must be considered when interpreting cross-population predictions.
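The clearance argument can be made concrete. For linear pharmacokinetics, the steady-state AUC over one dosing interval equals F·Dose/CL regardless of the number of compartments, Ka or lag time, so a popPK model whose typical clearance is twice as high generates simulated AUCss roughly half as large. The Monte Carlo sketch below assumes log-normal between-subject variability on CL only and ignores covariates such as CrCL; all parameter values (TVCL, F, ω) and the function name are hypothetical placeholders, not the estimates of the Vezina, Chen or Caldés models.

```python
import math
import random
import statistics

def simulate_auc_tau(tv_cl_l_h, dose_mg, f_oral, omega_cl, n, seed):
    """Simulated steady-state inter-dose AUC (mg*h/L) for n virtual patients.

    For linear PK, AUC over one dosing interval at steady state equals
    F * dose / CL whatever the number of compartments, Ka or lag time.
    CL_i = TVCL * exp(eta_i) with eta_i ~ N(0, omega_cl^2); covariates such
    as CrCL are deliberately left out of this sketch.
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(n):
        cl_i = tv_cl_l_h * math.exp(rng.gauss(0.0, omega_cl))
        aucs.append(f_oral * dose_mg / cl_i)
    return aucs

# Hypothetical placeholder values; the same seed gives both "models" identical etas.
low_cl = simulate_auc_tau(tv_cl_l_h=10.0, dose_mg=900, f_oral=0.6, omega_cl=0.3, n=1000, seed=1)
high_cl = simulate_auc_tau(tv_cl_l_h=20.0, dose_mg=900, f_oral=0.6, omega_cl=0.3, n=1000, seed=1)

ratio = statistics.median(high_cl) / statistics.median(low_cl)
print(f"median AUC, low-CL model:  {statistics.median(low_cl):.1f} mg*h/L")
print(f"median AUC, high-CL model: {statistics.median(high_cl):.1f} mg*h/L")
print(f"ratio: {ratio:.2f}")  # 0.50: doubling typical CL halves the simulated AUCs
```

Using the same random seed isolates the effect of typical clearance; in a real external validation the two models would also differ in variability, covariate effects and absorption parameters.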
The value of developing an ML algorithm when popPK models are already available for MAP-BE may be questioned. To address this, we compared our ML algorithms with MAP-BE based on the two popPK models used to train them, using an external dataset simulated from two other popPK models. XGBoost performed better than MAP-BE based on the Lalagkas model. MAP-BE based on the Vezina model approached XGBoost in the Chen validation population but performed poorly in the Caldés population. This suggests that the ML estimates are more generalizable and robust than those from MAP-BE. This advantage could be attributed to XGBoost's ability to capture complex, nonlinear relationships in the data, even though it was trained on simulated data. It may also indicate that the residual error estimated with the nonlinear mixed-effects approach is overestimated.
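One plausible mechanism behind the last point is shrinkage. In MAP-BE, the individual estimate balances the fit to the observed concentrations (weighted by 1/σ²) against the population prior (weighted by 1/ω²), so an overestimated residual error σ pulls individual clearance, and hence AUC, toward the typical value. The sketch below illustrates this with a deliberately simplified one-compartment IV bolus model and a plain grid search rather than a proper optimizer; every parameter value and function name is hypothetical.

```python
import math

def log_conc(t_h, dose_mg, v_l, cl_l_h):
    """ln C(t) for a one-compartment IV bolus (a deliberate simplification)."""
    return math.log(dose_mg / v_l) - (cl_l_h / v_l) * t_h

def map_eta_cl(times, concs, dose_mg, v_l, tv_cl, omega, sigma):
    """MAP estimate of eta_CL by grid search over [-3, 3].

    Objective: sum((ln y - ln pred)^2) / sigma^2 + eta^2 / omega^2,
    i.e. an additive residual error on the log scale and a log-normal prior on CL.
    """
    best_eta, best_ofv = 0.0, float("inf")
    for i in range(-3000, 3001):
        eta = i / 1000.0
        cl = tv_cl * math.exp(eta)
        ssr = sum((math.log(c) - log_conc(t, dose_mg, v_l, cl)) ** 2
                  for t, c in zip(times, concs))
        ofv = ssr / sigma ** 2 + eta ** 2 / omega ** 2
        if ofv < best_ofv:
            best_eta, best_ofv = eta, ofv
    return best_eta

# Hypothetical patient whose true CL is exp(0.5) times the typical value.
DOSE, V, TV_CL, OMEGA = 500.0, 50.0, 10.0, 0.4
true_cl = TV_CL * math.exp(0.5)
times = [1.0, 3.0]
concs = [math.exp(log_conc(t, DOSE, V, true_cl)) for t in times]  # noise-free data

for sigma in (0.1, 0.3):
    eta_hat = map_eta_cl(times, concs, DOSE, V, TV_CL, OMEGA, sigma)
    auc_hat = DOSE / (TV_CL * math.exp(eta_hat))  # AUC = dose / CL for an IV bolus
    print(f"sigma={sigma}: eta_hat={eta_hat:.3f}, AUC={auc_hat:.1f} (true {DOSE / true_cl:.1f})")
```

With the larger σ, the estimated η moves further from the true value of 0.5 toward 0, and the estimated AUC moves toward the population-typical AUC, even though the observations are noise-free.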
We also evaluated our algorithms in real renal transplant patients. Algorithms using two or three samples between C0 and C12 performed excellently, whereas those restricted to three samples between C0 and C6 were less accurate and systematically underestimated the inter-dose AUCss. This may be because the reference inter-dose AUCss were computed with the trapezoidal method, which has no measured data points between C5 and C24 (or C48) and does not capture the bi-exponential decay described by the popPK models. Consequently, the concentration estimated 12 h after the VGCV dose, and therefore the reference inter-dose AUCss, may be overestimated, and algorithms using only three samples between C0 and C6 cannot reproduce this overestimation. When these predictions were compared with those from MAP-BE, XGBoost estimated AUCss more accurately. These findings emphasize the importance of considering all three evaluation criteria: rMPE, rRMSE, and the number of individual prediction errors outside ±20%. A low rMPE combined with a high rRMSE indicates that, although predictions do not deviate systematically in one direction, individual errors are highly variable.
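The direction of this bias follows from geometry. If the reference AUC uses the linear trapezoidal rule (an assumption; the text only says "trapezoidal"), the straight line joining C5 and C24 lies above the convex exponential decline, so the area is overestimated. The sketch compares the linear and log-trapezoidal rules on a hypothetical mono-exponential segment; the starting concentration and rate constant are illustrative, not patient data.

```python
import math

def linear_trapezoid_auc(times, concs):
    """AUC by the linear trapezoidal rule."""
    return sum((t2 - t1) * (c1 + c2) / 2.0
               for t1, t2, c1, c2 in zip(times, times[1:], concs, concs[1:]))

def log_trapezoid_auc(times, concs):
    """AUC by the log-trapezoidal rule, exact for a mono-exponential decline.

    Falls back to the linear rule when a concentration is zero or unchanged.
    """
    total = 0.0
    for t1, t2, c1, c2 in zip(times, times[1:], concs, concs[1:]):
        if c1 > 0 and c2 > 0 and c1 != c2:
            total += (t2 - t1) * (c1 - c2) / math.log(c1 / c2)
        else:
            total += (t2 - t1) * (c1 + c2) / 2.0
    return total

# Hypothetical decline from 5 h to 24 h with no sample in between.
k = 0.15                                     # elimination rate constant (1/h), assumed
times = [5.0, 24.0]
concs = [6.0 * math.exp(-k * (t - 5.0)) for t in times]
exact = (concs[0] - concs[-1]) / k           # analytical AUC of the exponential segment

print(f"exact:  {exact:.1f} mg*h/L")
print(f"log:    {log_trapezoid_auc(times, concs):.1f} mg*h/L")
print(f"linear: {linear_trapezoid_auc(times, concs):.1f} mg*h/L")  # overestimates
```

Over a long gap the linear rule overestimates this segment by more than half; a real GCV profile is bi-exponential, so neither rule is exact, but the linear chord still lies above the declining curve.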
Because an XGBoost model cannot be expressed as a simple formula, we developed a Shiny interface that calculates AUC0-24ss in real time to make the algorithm usable (38). The application, available at https://sayadi-h.shinyapps.io/AUC_GCV/, is currently intended for demonstration purposes. To use it, first select the VGCV dosage from the four options: 900 mg/24 h, 450 mg/24 h, 450 mg/48 h, or 450 mg/72 h. Next, choose the model according to the number of blood samples that can be drawn. Finally, enter the eGFR and the GCV plasma concentrations to obtain the estimated AUC.
Interpretability remains critical for the clinical acceptance of machine learning-based model-informed precision dosing (MIPD) tools. To improve transparency, we generated variable importance plots (VIP) (Fig. 2), which show the main drivers of AUC predictions at the population level. A future release of our algorithm will also include breakdown plots to explain predictions at the individual level.
Beyond interpretability, many MIPD tools fail to reach clinical practice because they lack user-centered design. Our approach addresses this in two ways: the model was validated with both external simulations and real patient data, and the Shiny app was developed specifically to make the tool accessible and easy for clinicians to use, addressing common barriers such as limited availability.
Therapeutic drug monitoring (TDM) of GCV in SOT recipients is not currently widespread. Nonetheless, as noted earlier, GCV shows interindividual PK variability, especially in patients with reduced GFR. Moreover, a multicenter randomized study showed that the higher exposure achieved with VGCV than with oral GCV was associated with a lower incidence of CMV viremia (VGCV: 2.9% vs GCV: 10.4%, P = 0.001) and fewer CMV viremia breakthroughs by day 100 post-SOT (incidence of 1.3% with an AUC0–24 h of 50 mg*h/L, whereas an AUC0–24 h of 25 mg*h/L increased the risk of viremia eightfold) (12). Building on these results, another prospective, randomized, controlled study compared two groups: group A followed the manufacturer's dosing recommendations, while group B had their doses adjusted to target exposures. Group B had a shorter time to viral clearance (12.5 vs. 17.6 days), although this difference was not statistically significant (P = 0.125), as well as significantly lower relapse rates (9.01% vs. 66.67%) and late infection rates (7.7% vs. 36.7%) than group A (13). Our ML algorithms could help promote MIPD of VGCV, especially the two-sample version, which would be easier to use in routine practice.
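Exposure-targeted dose adjustment, as applied in group B, can be sketched under the assumption of linear PK, where the steady-state AUC is proportional to dose at a fixed interval: the new dose is the current dose scaled by target AUC / estimated AUC. This is a generic MIPD calculation, not necessarily the procedure used in (13); the function name and example values are hypothetical, and in practice the computed dose would still have to be rounded to an available VGCV regimen.

```python
def proportional_dose(current_dose_mg, estimated_auc, target_auc):
    """Dose expected to reach target_auc at the same dosing interval.

    Assumes linear (dose-proportional) pharmacokinetics and unchanged clearance.
    """
    if estimated_auc <= 0 or target_auc <= 0:
        raise ValueError("AUC values must be positive")
    return current_dose_mg * target_auc / estimated_auc

# Hypothetical patient: 900 mg/24 h VGCV with an estimated AUC0-24ss of 30 mg*h/L,
# using the 50 mg*h/L exposure cited in the text as the target.
print(proportional_dose(900.0, 30.0, 50.0))  # 1500.0
```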
This study has some limitations. First, the evidence supporting GCV AUC targets is limited: Wiltshire et al. is often cited as the primary source defining these targets, and only a few other studies, such as Padulles et al., have supported their relevance. By simplifying the estimation of GCV exposure, our models could help fill this gap and facilitate future research on exposure-effect relationships.
While our ML models have shown promise in estimating AUC, they differ fundamentally from MAP-BE in that they estimate AUC directly, without modeling the time course of drug concentrations. This is a limitation: because the models do not explicitly describe how concentrations evolve over time, their generalizability to other dosing regimens or altered PK profiles is limited.
Additionally, our analysis of deviations from theoretical sampling times showed that the ML models are sensitive to sampling-time accuracy. Although these models predicted drug exposure well, their performance may be less robust when samples are not drawn at the theoretical times. This was particularly evident for patients simulated with the Chen et al. model, for whom deviations substantially increased the error metrics. MAP-BE is generally more tolerant of sampling-time deviations, but a recent study indicates that it, too, can be affected by them (39). Future research should consider adjusting for actual sampling times or exploring hybrid approaches that combine mechanistic modeling with ML techniques to improve robustness.
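One simple form of sampling-time adjustment, given here as an illustration and not as a method evaluated in this study, is to shift a late sample back to its theoretical time along a log-linear decline, using a rate constant estimated from two samples. The sketch assumes both samples lie in a single mono-exponential phase; because GCV follows a bi-exponential decay, this is at best an approximation. The function names and concentration values are hypothetical.

```python
import math

def elimination_rate(t1, c1, t2, c2):
    """Rate constant (1/h) from two samples in a mono-exponential decline phase."""
    return math.log(c1 / c2) / (t2 - t1)

def shift_to_nominal(c_actual, t_actual, t_nominal, k_el):
    """Move a concentration along C(t) = C0 * exp(-k t) from t_actual to t_nominal."""
    return c_actual * math.exp(k_el * (t_actual - t_nominal))

# Hypothetical decline C(t) = 8 * exp(-0.2 t); the 12 h sample was drawn at 13 h.
def conc(t):
    return 8.0 * math.exp(-0.2 * t)

k_hat = elimination_rate(6.0, conc(6.0), 13.0, conc(13.0))
c12_adjusted = shift_to_nominal(conc(13.0), 13.0, 12.0, k_hat)
print(f"drawn at 13 h: {conc(13.0):.3f}, adjusted: {c12_adjusted:.3f}, true C12: {conc(12.0):.3f}")
```

In this idealized case the adjustment recovers C12 exactly; with real data, measurement error and the distribution phase would make the correction approximate.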
Moreover, the validation of our algorithms was partly conducted on simulated patients. Our simulated AUCss values depend heavily on the two popPK models used and on the characteristics of the populations from which they were developed. These models may not fully capture the variability and heterogeneity of broader and more diverse patient populations, a fundamental limitation of in silico approaches. Additionally, the renal transplant recipients had only five blood samples rather than a full PK profile, and no real patients receiving 450 mg every three days were studied. Caution is therefore warranted when applying the algorithm, particularly to patients whose characteristics differ from those of the populations used to develop the popPK models. Future external validation studies should include diverse patient populations with comprehensive PK profiles to improve the generalizability of our findings.
It is also essential to note that the ML algorithms were trained on two popPK models that estimate GFR differently: the Cockcroft-Gault formula for the Vezina model and the CKD-EPI formula for the Lalagkas model. The CKD-EPI formula used in that model includes race in its calculation of eGFR, which contradicts the current recommendations of European and American nephrology organizations advocating race-neutral equations. This could perpetuate health inequalities, particularly when AI-based models inadvertently incorporate such inductive bias. It also raises the question of whether the ML algorithm generalizes across methods of estimating GFR. Two observations are reassuring. First, eGFR is the predictor that contributes least to prediction accuracy. Second, in the testing data set, eGFR was estimated solely with the Cockcroft-Gault formula, yet the ML algorithms still yielded satisfactory results. As a hypothetical illustration, consider a 50-year-old patient weighing 60 kg, with a plasma creatinine of 70 µmol/L and ganciclovir concentrations of 5.20 mg/L at C4h and 1.14 mg/L at C12h on a 900 mg/24 h VGCV regimen. Renal function is estimated at 94.9 mL/min with the Cockcroft-Gault formula and 104.8 mL/min/1.73 m² with the CKD-EPI formula. The estimated AUCss is 54.0 mg*h/L with Cockcroft-Gault and 53.0 mg*h/L with CKD-EPI. This example suggests that AUCss predictions remain consistent between the two eGFR methods, supporting the robustness of the ML algorithm to the choice of GFR equation.
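The renal function values in this example can be reproduced approximately with the standard equations. The sketch assumes the patient is male (sex is not stated, but 94.9 mL/min matches the male Cockcroft-Gault constant of 1.23 for creatinine in µmol/L), a non-Black race category for the 2009 CKD-EPI equation, and a conversion factor of 88.4 µmol/L per mg/dL. Under these assumptions it gives about 94.9 mL/min and 104.6 mL/min/1.73 m²; the small difference from the 104.8 quoted above may come from the conversion factor or rounding. Note that Cockcroft-Gault returns an absolute clearance, whereas CKD-EPI is indexed to 1.73 m². The race-free CKD-EPI 2021 refit, in line with the race-neutral recommendations mentioned above, is included for comparison only; it was not evaluated with the ML algorithms.

```python
UMOL_PER_MG_DL = 88.4  # creatinine: 1 mg/dL = 88.4 µmol/L (assumed factor)

def cockcroft_gault(age_y, weight_kg, scr_umol_l, female=False):
    """Creatinine clearance (mL/min), SI form: (140 - age) * weight * 1.23 / SCr."""
    constant = 1.04 if female else 1.23
    return (140 - age_y) * weight_kg * constant / scr_umol_l

def ckd_epi_2009(age_y, scr_umol_l, female=False, black=False):
    """CKD-EPI 2009 creatinine eGFR (mL/min/1.73 m2), including the race coefficient."""
    scr = scr_umol_l / UMOL_PER_MG_DL
    kappa, alpha = (0.7, -0.329) if female else (0.9, -0.411)
    egfr = (141 * min(scr / kappa, 1.0) ** alpha * max(scr / kappa, 1.0) ** -1.209
            * 0.993 ** age_y)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr

def ckd_epi_2021(age_y, scr_umol_l, female=False):
    """Race-free CKD-EPI 2021 creatinine eGFR (mL/min/1.73 m2)."""
    scr = scr_umol_l / UMOL_PER_MG_DL
    kappa, alpha = (0.7, -0.241) if female else (0.9, -0.302)
    egfr = (142 * min(scr / kappa, 1.0) ** alpha * max(scr / kappa, 1.0) ** -1.200
            * 0.9938 ** age_y)
    return egfr * 1.012 if female else egfr

# Hypothetical patient from the text: 50 years, 60 kg, creatinine 70 µmol/L (male assumed).
print(f"Cockcroft-Gault: {cockcroft_gault(50, 60, 70):.1f} mL/min")
print(f"CKD-EPI 2009:    {ckd_epi_2009(50, 70):.1f} mL/min/1.73 m2")
print(f"CKD-EPI 2021:    {ckd_epi_2021(50, 70):.1f} mL/min/1.73 m2")
```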
In conclusion, we have developed and validated XGBoost ML algorithms for accurate estimation of GCV inter-dose AUCss. These algorithms were evaluated on both simulated patient data and real patient data. Although this tool requires further validation, it simplifies MIPD by requiring fewer samples than traditional methods. This advancement could facilitate the assessment of the relevance of GCV TDM in clinical studies and contribute to the development of future recommendations for its implementation in routine practice.