We obtained 171 lower-grade gliomas (LGG) patients receiving temozolomide (TMZ) chemotherapy from the CGGA database as a training set to screen for genes associated with TMZ treatment prognosis (Fig. 1A). Meanwhile, we obtained 65 lower-grade gliomas (LGG) patients receiving temozolomide (TMZ) chemotherapy from the CGGA database as an independent validation set (Supplementary Fig. 1). There is some relevant basic clinical information such as histology, grade, gender, age, overall survival, IDH_mutation_status, 1p19q_codeletion_status, and MGMTp_methylation_status of patients in the training set (Supplementary Table 1) and validation set (Supplementary Table 2).
Fig. 1
Analysis of genes related to the prognosis of LGG patients after TMZ treatment. A A total of 171 patients were used to analyze factors associated with TMZ prognosis in lower-grade gliomas. B Association of patient clinical information with TMZ outcome. Univariate Cox regression is used for analysis and significant associations are represented in red. C Genes associated with TMZ prognosis after correction for clinically relevant variables. Analyses were performed using multivariate COX regression, and significant protective and risk factors are shown in blue and orange, respectively. D Functions associated with protective and risk genes, and blue and orange represent protective and risky functions, respectively. E Distributions of associations between protective, risk, and other genes. Blue, orange, and gray represent protective, risk, and other genes, respectively. F Degree distribution in PPI networks formed by protective and risk genes. The blue and orange lines represent protective and risky PPI networks, respectively. G Differences in betweenness centrality of nodes between PPI networks formed by protective and risk genes. H Differences in closeness centrality of nodes between PPI networks formed by protective and risk genes
Univariate Cox regression showed that tumor stage, radiation therapy status, IDH1 mutation status, and chr1p19q co-deletion status had an impact on overall survival (Fig. 1B, Supplementary Fig. 2). Therefore, we used Cox regression to correct these clinical variables to obtain genes associated with the prognosis of TMZ treatment. A total of 488 risk genes and 128 protective genes passed the screening threshold (p < 0.01; Fig. 1C, Supplementary Table 3). Functional enrichment of these genes revealed that protective genes were associated with nervous system development, while risk genes were associated with multiple pathways associated with tumor progression including cell cycle, immune response, EMT, and extracellular matrix organization (Fig. 1D, Supplementary Table 4).
We further analyzed the internal correlations of risk genes, protective genes, and non-significant genes, and found that the overall correlations of protective genes and risk genes were higher than those of non-significant genes, suggesting that these genes formed specific functional modules (Fig. 1E). At the same time, the overall correlation of protective genes was significantly higher than that of risk genes, suggesting that risk genes were involved in multiple functions. In addition, we analyzed the protein network formed by risk and protective genes respectively. The degree distribution of the two networks showed that the network of risk genes reflected multi-modularity (Fig. 1F). Moreover, the centrality measures of the two networks are different in distribution. The mean betweenness centrality of the network of risk genes was low (Fig. 1G), and the mean closeness centrality was high (Fig. 1H). Network module analysis revealed that risk genes formed multiple functional modules, including lipid metabolism, cell cycle, interferon response, and Wnt signaling pathway (Supplementary Fig. 3, Supplementary Table 5). These results suggest that TMZ prognostic risk genes are involved in various carcinogenic processes.
3.2 Development of risk score associated with the benefit of TMZ treatment in LGGWe used consensus LASSO Cox regression to screen key genes for prognostic genes and repeated 1000 iterations to obtain the occurrence frequency of each gene (Fig. 2A), of which 14 genes appeared more than 950 times. We then included high-frequency genes in the model in turn and used a tenfold cross-validation method to evaluate the prognostic model performance by the area under time-dependent ROC at varying follow-up times. The results showed that the prognostic model had the highest predictive power when predicting 3 year survival, while its predictive power was relatively poor for 1 year survival (Fig. 2B). Based on the overall prediction of different survival conditions, we selected a model consisting of 14 genes. For these 14 genes, we used Cox regression to determine the contribution of these genes based on a tenfold cross-validation method. These features, ranked by median weight, included PAX3 (1.06), ADM2 (0.76), GRB14 (0.44), CYP4F12 (0.37), HELZ2 (0.24), RGS16 (0.19), IL13RA2 (0.15), CHRNA1 (0.06), IGF2BP3 (−0.08), HOXA4 (−0.15), IL34 (−0.28), KCNIP3 (−0.32), CACNA1H (−0.33), and GNAL (−0.52) (Fig. 2C). Finally, the cumulative sum of the product of gene weight and corresponding expression was used as the TMZ prognosis score.
Fig. 2
Development of risk score associated with the benefit of TMZ treatment in LGG. A The occurrence frequency of each gene in consensus LASSO Cox regression. B The area under time-dependent ROC for predicting survival of different gene models (GM) in the training set. C The weight of each gene across the tenfold cross-validation. Boxes show median, first, and third quartile, with whiskers extending at most 1.5 interquartile range, and the red Cross indicates the represent the estimated weights for each gene. D Association of risk scores with patient survival status in the training set. E Differences in survival time between groups with high and low-risk scores in the training set. F AUC values for risk scores predicting survival in the training set. G Association of risk scores with patient survival status in the validation set. H Differences in survival time between groups with high and low-risk scores in the validation set. I AUC values for risk scores predicting survival in the validation set
After multivariable adjustment by clinicopathological variables including grade, radio status, IDH mutation status, and chr1p19q co-deletion status, the risk score remained a powerful and independent factor in the training cohort (HR 2.73, 95% CI 2.09–3.58, p < 0.001; Fig. 3A). We also noted similar results in the independent validation cohort (HR 1.28, 1.05–1.55, p = 0.013; Fig. 3B). The IDH1 mutation status and chr1p19q co-deletion status are key genomic markers related to the prognosis of glioma treatment. When stratified by IDH1 mutation status, the risk score was still a clinically and statistically significant prognostic model in the train and independent validation cohort (Fig. 3CD). We got a similar conclusion, in which patients with chr1p19q wild-type and the high-risk score had the worst survival when patients were stratified according to the co-deletion status of chr1p19q (Fig. 3EF). We also focused on the prognostic effect of tumor stage on risk score, and the risk score can distinguish patients with different stages from two groups with significant differences in survival (Fig. 3GH). Further, we explored the prognostic performance of risk scores in glioblastoma (GBM). Although there are certain differences between LGG and GBM in the genome and other aspects, we found that the risk score still showed certain prognostic ability, and patients with lower risk scores tended to have better survival benefits (p = 0.062 in train data sets and p = 0.015 in independent validation sets; Fig. 3IJ).
Fig. 3
The prognostic power of risk scores was independent of clinicopathological factors. A Multivariate Cox analysis of risk scores with multiple clinical factors in the training set. B Multivariate Cox analysis of risk scores with multiple clinical factors in the validation set. C Effect of IDH status combined with risk score on survival in the training set. D Effect of IDH status combined with risk score on survival in the validation set. E Effect of chr1p19q co-deletion status combined with risk score on survival in the training set. F Effect of chr1p19q co-deletion status combined with risk score on survival in the validation set. G Effect of WHO grade combined with risk score on survival in the training set. H Effect of WHO grade combined with risk score on survival in the validation set. I Differences in survival curves between high and low-risk scores in GBM in CGGA1 datasets. J Differences in survival curves between high and low-risk scores in GBM in CGGA2 datasets
3.3 The risk score was related to the immune cell infiltration of LGGWe then analyzed the relationship between risk score and the immune microenvironment. We use a variety of algorithms including ESTIMATE, XCELL, and CIBERSORT to measure the composition of the tumor's immune microenvironment. We first analyzed the association of key genes in the model with tumor purity, immune score, stromal score, and different immune cell infiltration proportion. The HOXA4, CYP4F12, and ADM2 were positively correlated with the immune and stromal score but negatively correlated with tumor purity (Fig. 4A). The immune cells that are positively associated with these genes include macrophages, naïve CD8 + T cells, and activated dendritic cells (Supplementary Fig. 4, Fig. 4B).
Fig. 4
The risk score was related to the immune cell infiltration of LGG. A The relation between genes in the model and immune score from the ESTIMATE method. B The relation between genes in the model and immune score and percentage of immune cell infiltration from the Xcell method. C Differences in tumor purity, immune score, and stromal score between high and low-risk groups. D Differences in immune cell infiltration between high and low-risk groups. Red represents the high-risk score group and blue represents the low-risk score group
The GNAL, on the other hand, was negatively correlated with the immune score, stromal score, and some myeloid lymphocyte infiltration (Fig. 4AB). We then assessed differences in immune microenvironments between high and low-risk groups. Patients with high-risk scores had higher immune and stromal scores and lower tumor purity (Fig. 4C), indicating that unfavorable prognosis in the high-risk group may be associated with the variation in the tumor immune microenvironment. Further immune cell infiltration analysis showed that the activated dendritic cells, M1 type macrophages, B cells, and Th2 subset of CD4 + T cells were higher in patients with high-risk scores, while the B cell plasma and naïve CD8 + T cells were lower in patients with high-risk score (Supplementary Fig. 5, Fig. 4D). These results indicated that high-risk patients showed activation of innate and humoral-type immunity.
3.4 Development of a nomogram for predicting the benefit of TMZ treatment in LGGMed well in predicting patient survival according to an ideal model in training and independent validation sets (Fig. 5BC). To evaluate the clinical benefit and better predict the prognosis of LGG patients after receiving TMZ therapy in the clinic, a prognostic nomogram was developed by integrating risk score and three independent predictors of mortality including grade, IDH1 mutation status, and chr1p19q co-deletion status from the above analyses into a multivariate Cox regression model (Fig. 5A). The calibration plot showed that the nomogram perforators of the nomogram model, we performed decision curve analysis in both the training and independent validation sets. With 3 year survival as the endpoint, the curve showed that the nomogram presented more clinical net benefits than several competing intervention strategies, namely, intervention for all, intervention for none, and intervention based on different clinical indicators (Fig. 5DE).
Fig. 5
Nomogram of risk score. A A nomogram based on risk score, grade, IDH status, and chr1p19q co-deletion status. B Calibration curves of nomogram in the training set. C Calibration curve analysis of nomogram in the training set. D Decision curves of nomogram in the validation set. E Decision curve analysis of nomogram in the validation set
Comments (0)