Decision tree models combining Bi-parametric vesical imaging reporting and data system and apparent diffusion coefficient metrics for predicting muscle-invasive bladder cancer

ADC histogram analysis showed that lower ADC percentile values were associated with lower p-values in differentiating MIBC from NMIBC. In addition, ROC curve analysis demonstrated that the 25th percentile ADC achieved the highest AUC among the histogram-derived parameters, although pairwise AUC comparisons showed no significant difference between the 25th and 50th percentile ADC values. Moreover, lower-percentile histogram-derived ADC values have been reported to correlate with tumor proliferative activity in other malignancies, such as breast cancer [23]. Considering this biological relevance, we used the 25th percentile ADC for subsequent analyses. Previous studies have reported that all ADC percentile values differ significantly between the two groups [18], and all percentile values in our study likewise showed statistically significant differences. Regarding skewness and kurtosis, our results agree with studies reporting significant differences in these parameters between MIBC and NMIBC [18, 20]. Another study has also shown that both skewness and kurtosis are significantly higher in tumors with higher histological grade [24]. In contrast, the interquartile range showed no significant difference between the two groups. This finding suggests that heterogeneity alone may not be sufficient for predicting muscle invasion and may reflect a relatively homogeneous ADC distribution in bladder cancer. Histogram analysis reflects the full ADC value distribution across the entire lesion, whereas the conventional ROI method derives a representative ADC value from visually selected ROIs and is therefore more practical for routine clinical practice. We demonstrated that the minimum mean ADC, obtained by averaging three visually identified low-ADC ROIs, also discriminated between MIBC and NMIBC. This is consistent with previous reports showing that minimum ADC is a useful parameter for differentiating the two groups [21]. Notably, in our study the minimum mean ADC was close in value to the 25th percentile ADC derived from histogram analysis. Because histogram analysis is time-consuming in clinical settings, the minimum mean ADC, which approximates the 25th percentile ADC, may represent a feasible alternative.
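
The two ways of summarising ADC discussed above can be illustrated with a minimal Python sketch. It is not the software used in this study. The function names, the percentile interpolation rule (the "inclusive" method of Python's statistics module), and the moment-based definitions of skewness and kurtosis are all assumptions made for illustration, and the input values in the usage note are toy numbers, not patient data.

```python
import statistics


def adc_histogram_metrics(voxels):
    """Summarise whole-lesion ADC voxel values (hypothetical helper).

    Returns quartiles, interquartile range, skewness and non-excess
    kurtosis. Units are whatever the caller supplies.
    """
    n = len(voxels)
    mean = statistics.fmean(voxels)
    # Population central moments of the voxel distribution.
    m2 = sum((v - mean) ** 2 for v in voxels) / n
    m3 = sum((v - mean) ** 3 for v in voxels) / n
    m4 = sum((v - mean) ** 4 for v in voxels) / n
    if m2 == 0:
        raise ValueError("all voxel values are identical")
    # Assumption: quartiles by linear interpolation ("inclusive" method).
    q1, q2, q3 = statistics.quantiles(voxels, n=4, method="inclusive")
    return {
        "p25": q1,
        "p50": q2,
        "p75": q3,
        "iqr": q3 - q1,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # non-excess: a normal distribution gives 3
    }


def minimum_mean_adc(roi_means):
    """Average the mean ADC of three visually selected low-ADC ROIs."""
    if len(roi_means) != 3:
        raise ValueError("expected exactly three ROI values")
    return statistics.fmean(roi_means)
```

For example, `adc_histogram_metrics([5, 1, 4, 2, 3])["p25"]` evaluates to 2.0 under the inclusive rule. Other software may use a different interpolation rule and return slightly different percentiles, which is one reason cutoff values may not transfer directly between tools.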

The distributions of histological type and grade differed significantly between MIBC and NMIBC in our cohort, consistent with previous reports [25, 26, 27]. In bladder cancer, tumors with higher Ki-67 labeling indices, a marker of tumor cell proliferative activity, more frequently exhibit WHO high-grade or G3 histology, and a significant positive correlation between Ki-67 and histological grade has been reported [28]. In other tumor types, such as breast cancer and brain tumors, an inverse correlation between Ki-67 and ADC values has been reported [23, 29]. Similarly, bladder cancers with higher grade or invasive histological characteristics might show higher Ki-67 expression and lower ADC values, which may underlie the ability of ADC measurements to distinguish between MIBC and NMIBC [30]. Further investigation into the relationship between ADC and pathological factors is therefore needed.

Nevertheless, several factors influence ADC measurement. ADC values can be affected by noise, geometric distortion, and differences in scanners and b-values [31]. Several studies have used b-values of 0 and 800 s/mm² [18, 32], whereas others have used 0 and 1000 s/mm² [33, 34], and our study also employed b-values of 0 and 1000 s/mm². Notably, the original VI-RADS paper by Panebianco et al. lists recommended high b-values ranging from 800–1000 s/mm² up to 2000 s/mm², indicating that no standardized b-value has been established for ADC measurement [1]. To reduce these influences, some studies have proposed using normalized ADC, standardized to the muscle, for predicting muscle invasion [19, 35]. Techniques such as turbo spin echo-DWI and reverse encoding distortion correction DWI have also been introduced to decrease the geometric distortion associated with conventional EPI-DWI [36, 37]. Although the present study did not use normalization or specialized acquisition methods, improving the consistency of ADC measurements through standardization and optimized imaging protocols might be an important area for future work. In addition, our DWI/ADC acquisition used a slice thickness of 5 mm with a 1 mm interslice gap, which may have introduced partial-volume effects and influenced ADC measurements.

Studies combining VI-RADS with ADC values using logistic regression models have been reported for both multiparametric and bp VI-RADS [22]. Such approaches require the application of regression coefficients to VI-RADS scores and ADC parameters, which may limit their practicality in daily clinical use. In the present study, in addition to logistic regression analysis, we developed a decision tree model integrating bp VI-RADS with ADC values. The 25th percentile ADC obtained from histogram analysis and the minimum mean ADC derived from the conventional ROI method were each incorporated into separate models. In the logistic regression model, combining bp VI-RADS with either the 25th percentile or the minimum mean ADC resulted in favorable diagnostic performance. In contrast, the decision tree model demonstrated lower sensitivity and accuracy than logistic regression, yet achieved higher specificity, reaching 0.89. When implemented within the decision tree framework, the 25th percentile and the minimum mean ADC showed comparable diagnostic performance. Considering interpretability and feasibility in clinical settings, the decision tree model based on the minimum mean ADC may be particularly suitable for practical use. In our study, interobserver agreement for bp VI-RADS was in the moderate range (weighted κ = 0.64) (Supplemental Material 2), which was somewhat lower than that previously reported (κ ≥ 0.90) [12]. This difference may be related to differences in reader experience.
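
For readers unfamiliar with the weighted κ statistic quoted above, the following is a minimal sketch of how Cohen's weighted kappa can be computed for two readers assigning ordinal categories. The weighting scheme used in this study is not stated here, so the choice between linear and quadratic weights is exposed as a parameter, and the function name and arguments are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, categories=(1, 2, 3, 4, 5), quadratic=False):
    """Cohen's weighted kappa for two readers' ordinal scores.

    Assumption: disagreement weights are |i - j| / (k - 1) (linear) or
    their square (quadratic); which scheme a given study used must be
    checked against its methods section.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both readers must score the same non-empty set")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions of (reader A, reader B) category pairs.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        d = abs(i - j) / (k - 1)
        return d * d if quadratic else d

    obs_dis = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    # Disagreement expected by chance from the two readers' marginals.
    exp_dis = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp_dis == 0:
        raise ValueError("kappa is undefined when only one category is used")
    return 1.0 - obs_dis / exp_dis
```

With identical scores from both readers the function returns 1.0. Because weighting penalises a 4-versus-5 disagreement less than a 1-versus-5 disagreement, the weighted value is generally not comparable with an unweighted κ reported elsewhere.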

In conventional bp VI-RADS–based workflows, category 4 lesions are generally regarded as suspicious for muscularis invasion, which may prompt additional invasive evaluation or consideration of treatment escalation. Our proposed decision tree incorporates ADC measurement only for bp VI-RADS 4 lesions and may help identify patients who warrant prompt re-TUR or treatment escalation, while reducing unnecessary treatment escalation in lesions that are ultimately NMIBC. Notably, both the 25th percentile and the minimum mean ADC yielded decision trees with a similar overall structure, in which additional branching based on ADC values was required specifically for lesions categorized as bp VI-RADS 4. bp VI-RADS category 4 is defined as an “interruption of the low signal intensity line suggesting extension into the muscularis layer.” In clinical practice, however, distinguishing between true muscularis invasion and apparent interruption caused by stretching or deformation of the bladder wall can be challenging. Incorporating a classification step based on the minimum mean ADC, which reflects intrinsic tumor characteristics rather than bladder wall morphology, appears to reinforce the bp VI-RADS assessment within the decision tree. Indeed, compared with bp VI-RADS alone, the decision tree model combining bp VI-RADS and the minimum mean ADC improved specificity from 0.77 to 0.89 and PPV from 0.69 to 0.83. An improved PPV is clinically meaningful, as it may support timely and appropriate therapeutic decisions for patients with true muscle invasion, including re-TUR, radical cystectomy, or intravesical chemotherapy. In addition, measuring the minimum mean ADC only for lesions classified as bp VI-RADS 4 requires less effort than obtaining ADC measurements for all cases, making this approach more feasible for daily clinical practice.
This study compared logistic regression and decision tree models within a dataset of 82 cases, but external validation was not performed. To facilitate the clinical adoption of the decision tree model, evaluation in an independent validation cohort will be necessary.
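
The decision rule described above, and the specificity and PPV used to evaluate it, can be sketched as follows. This is an illustrative reading of the text, not the fitted model: the handling of categories 1 to 3 and category 5, the direction of the ADC comparison (lower ADC taken as MIBC), the strict inequality, and the `adc_cutoff` parameter are all assumptions, and no cutoff value is implied.

```python
def classify_lesion(bp_vi_rads, min_mean_adc=None, adc_cutoff=None):
    """Return True if the lesion is predicted to be MIBC, False for NMIBC."""
    if bp_vi_rads not in (1, 2, 3, 4, 5):
        raise ValueError("bp VI-RADS category must be 1-5")
    if bp_vi_rads <= 3:  # assumption: categories 1-3 are called NMIBC
        return False
    if bp_vi_rads == 5:  # assumption: category 5 is called MIBC
        return True
    # Category 4: the only branch that needs an ADC measurement.
    if min_mean_adc is None or adc_cutoff is None:
        raise ValueError("category 4 requires an ADC value and a cutoff")
    return min_mean_adc < adc_cutoff  # assumption: lower ADC suggests MIBC


def diagnostic_metrics(predicted, truth):
    """Sensitivity, specificity, PPV, NPV and accuracy from paired booleans."""
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fn = sum(1 for p, t in pairs if not p and t)

    def ratio(num, den):
        # Undefined ratios are reported as NaN rather than raising.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, len(pairs)),
    }
```

Because the ADC arguments are consulted only when the category is 4, a call such as `classify_lesion(2)` needs no ADC measurement at all, which mirrors the reduced measurement effort noted above. Any cutoff passed to the function would need to be validated locally before use.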

Tissue sampling by TUR-BT is known to underestimate muscle invasion because the muscular layer is sometimes not included in the specimen [38]. Indeed, in this study, 16 cases in which the presence or absence of muscle invasion could not be determined on pathological examination were excluded from the study population. The presence or absence of muscle invasion was determined from TUR-BT specimens, and cystectomy specimens were not used. Because cystectomy specimens are regarded as more reliable for evaluating the depth of invasion, misclassification of MIBC and NMIBC cannot be excluded, and this should be regarded as a major limitation of the present study. However, cystectomy is usually performed after chemotherapy or re-TUR-BT when muscle invasion remains suspected. For this reason, it is difficult to design a study that uses cystectomy specimens as the reference standard when comparing VI-RADS and ADC with muscle invasion status.

This study has several other limitations. First, the sample size was small, and the accuracy of the decision tree model could not be assessed using an independent validation dataset. Given the susceptibility of decision tree models to overfitting, the lack of internal or external validation is an important limitation and may affect the generalizability of the proposed model. Second, in patients with multiple lesions, only the largest lesion was evaluated on imaging, and a strict one-to-one correspondence with the lesion sampled by TUR-BT could not be ensured. Third, patients in whom the lesion could not be identified on MRI were excluded. Among these seven patients, one was later confirmed to have muscle invasion on TUR-BT. This suggests that the exclusion may be related to limitations in the spatial resolution of MRI, indicating that further improvement in the spatial resolution of both bp VI-RADS assessment and DWI would be desirable. Fourth, whole-lesion segmentation and ROI placement were performed manually, and inter-reader reproducibility of these procedures was not quantitatively assessed. Although excellent agreement was observed for the derived ADC metrics (ICCs > 0.9), future studies using automated or semi-automated segmentation and overlap-based metrics, such as Dice similarity coefficients, may help further improve reproducibility and generalizability. Fifth, in this study, DWI/ADC was acquired with a slice thickness of 5 mm and an interslice gap of 1 mm, which were not fully consistent with the VI-RADS technical recommendations for 3 T MRI. In addition, ADC measurements may vary depending on acquisition and post-processing factors, such as scanner differences, b-values, and the use of normalization or distortion correction. These differences may have influenced bp VI-RADS assessment and ADC measurements and may limit the generalizability of the proposed cutoff values across different institutions.

In conclusion, whole-lesion histogram analysis identified the 25th percentile ADC as a useful parameter for assessing muscle invasion, and the minimum mean ADC from a representative slice offered comparable value as a simpler method. In our decision tree framework, ADC measurement provided additional value primarily for lesions classified as bp VI-RADS 4, where visual assessment of muscularis invasion is often equivocal. Incorporating ADC in this subset improved specificity and PPV compared with bp VI-RADS alone.
