Association between testosterone and cancers risk in women: a two-sample Mendelian randomization study

2.1 Data collection

In the assessment of testosterone levels in women, both Total Testosterone (TT) and Bioavailable Testosterone (BT) are used as indicators of the physiological effects of testosterone. We defined the ten cancers using ICD-10 codes (Supplementary Table 1). For the purpose of this study, only data from individuals of European ancestry were included. Details of each dataset, such as sample size, demographic characteristics, and the number of relevant single nucleotide polymorphisms (SNPs), can be found in Supplementary Table 2. Summary statistics for multiple datasets were sourced from the Medical Research Council Integrative Epidemiology Unit (MRC IEU) Open GWAS project (https://gwas.mrcieu.ac.uk/). Because we used only de-identified data, institutional review board approval was not required for our analysis.

Data on TT, BT, cervical cancer, endometrial cancer, bladder cancer, and thyroid cancer were sourced exclusively from the MRC IEU Open GWAS database. Liver and bile duct cancer data were obtained from the Genome-wide Association Study of Cancer Risk in the UK Biobank. Publicly available summary statistics from the FinnGen consortium (www.finbb.fi) provided data on kidney cancer and stomach cancer. The Neale Lab provided data on skin cancer. Finally, data on breast cancer and ovarian cancer were obtained from the Breast Cancer Association Consortium (BCAC) and the Ovarian Cancer Association Consortium (OCAC), respectively. The GWAS datasets for the 10 types of cancer are presented in Table 1.

Table 1 The GWAS datasets for 10 different types of cancer

2.2 Instrumental variable selection

In this study, we used genetic variants associated with the exposures (TT and BT) as instrumental variables (IVs). Valid IVs must satisfy three assumptions: (1) the genetic variants are associated with the exposures, (2) the genetic variants influence the outcomes solely through the exposures, and (3) the genetic variants are unrelated to any confounding factors affecting the exposure-outcome association [13]. We identified SNPs that were significantly associated with the exposures (P < 5 × 10⁻⁸) and ensured their independence by assessing linkage disequilibrium (LD). To eliminate LD among the SNPs, a clumping procedure was applied with r² < 0.001 and a window of 10,000 kb. The F statistic was calculated for each SNP separately using the formula F = β²/SE² [14, 15]. We then confirmed that each F statistic exceeded 10 in order to mitigate weak instrument bias. Figure 1 depicts the study design as a framework diagram. The PhenoScanner database (http://www.phenoscanner.medschl.cam.ac.uk) was used to check all SNPs for associations with potential confounders. Supplementary Tables 3 and 4 list the characteristics of the IVs associated with BT and TT, respectively, after excluding the influence of confounding factors such as body mass index, smoking, diabetes, and the use of blood pressure medication. Supplementary Tables 5 and 6 present the IVs selected for BT and TT, respectively, for the MR analyses of the risk of multiple cancers in the cancer-specific GWAS.
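The significance and instrument-strength filters described above can be sketched in a few lines. The Python fragment below is a minimal illustration only, not the authors' R pipeline: the `Snp` record, its field names, and the example values are hypothetical, and LD clumping is assumed to have been carried out beforehand.

```python
from dataclasses import dataclass

P_THRESHOLD = 5e-8   # genome-wide significance threshold stated in the text
F_THRESHOLD = 10.0   # weak-instrument cutoff stated in the text


@dataclass
class Snp:
    rsid: str     # hypothetical identifier
    beta: float   # SNP-exposure effect estimate
    se: float     # standard error of beta
    pval: float   # SNP-exposure association P value


def f_statistic(snp: Snp) -> float:
    """Per-SNP F statistic, F = beta^2 / SE^2."""
    return (snp.beta / snp.se) ** 2


def select_instruments(snps):
    """Keep SNPs that pass both the P value and the F statistic filters."""
    return [
        s for s in snps
        if s.pval < P_THRESHOLD and f_statistic(s) > F_THRESHOLD
    ]


# Made-up example values, for illustration only.
example = [
    Snp("rs_a", beta=0.10, se=0.01, pval=1e-20),  # strong instrument
    Snp("rs_b", beta=0.02, se=0.01, pval=4.6e-2),  # fails both filters
]
kept = select_instruments(example)
```

In this sketch `kept` contains only the first SNP. In practice, any SNP that passes P < 5 × 10⁻⁸ will usually have an F statistic well above 10, so the second filter acts mainly as a safeguard.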

Fig. 1

The study design and three fundamental assumptions in the Mendelian randomization analysis

2.3 Mendelian randomization statistical analysis

The Wald ratio approach was used to estimate the effect of each single SNP closely linked with the exposure on the outcome, with the inverse-variance weighted method (IVW, random effects) serving as the primary analytical method. Supplementary analyses were conducted using MR-Egger, weighted median, simple mode, and weighted mode. The IVW method provides precise estimates when all included SNPs are valid instrumental variables [16]. The MR-Egger method adjusts for pleiotropy, but the resulting estimates tend to be less precise [17]. The weighted median model is a valid estimator when at least 50% of the weight is derived from valid instrumental variables [18]. Heterogeneity and horizontal pleiotropy were assessed using Cochran's Q and the MR-Egger intercept, respectively [17]. When there is no excess heterogeneity, the random-effects and fixed-effect IVW models yield equivalent results without a loss of precision. Conversely, random-effects IVW models are appropriate when heterogeneity is present [19]. If there was evidence of horizontal pleiotropy, the MR-PRESSO outlier test was applied [20]. Leave-one-out sensitivity analysis was additionally used to identify SNPs with influential effects and to assess the reliability of the results [21]. To further demonstrate the reliability of the MR results, maximum likelihood, penalized weighted median, and IVW (fixed effects) were used in subsequent analyses. The maximum likelihood approach is a conventional method similar to IVW; if its assumptions are met, it yields unbiased estimates with smaller standard errors than IVW [22].
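To make the relationship between the Wald ratio, the IVW estimate, and Cochran's Q concrete, the following Python sketch works through the arithmetic on made-up summary statistics. It is an illustration under stated assumptions rather than the analysis code used in this study: it uses first-order standard errors for the Wald ratios and a multiplicative random-effects model in which the fixed-effect standard error is inflated only when Q exceeds its degrees of freedom. The function names and input values are hypothetical.

```python
import math


def wald_ratios(beta_exp, beta_out, se_out):
    """Per-SNP Wald ratio (beta_out / beta_exp) and first-order standard error."""
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]
    ses = [sy / abs(bx) for bx, sy in zip(beta_exp, se_out)]
    return ratios, ses


def ivw(beta_exp, beta_out, se_out):
    """Inverse-variance weighted estimate with fixed and random-effects SEs."""
    ratios, ses = wald_ratios(beta_exp, beta_out, se_out)
    weights = [1.0 / s ** 2 for s in ses]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    se_fixed = math.sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviations of the ratios from the estimate.
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    df = len(ratios) - 1
    # Assumed multiplicative random-effects model: never shrink the fixed SE.
    scale = math.sqrt(max(1.0, q / df)) if df > 0 else 1.0
    return {
        "estimate": estimate,
        "se_fixed": se_fixed,
        "se_random": se_fixed * scale,
        "q": q,
        "df": df,
    }


# Made-up inputs: three SNPs whose Wald ratios all agree (no heterogeneity).
homogeneous = ivw([0.1, 0.2, 0.4], [0.05, 0.10, 0.20], [0.01, 0.01, 0.01])

# Same SNPs, but the third now implies a different causal effect.
heterogeneous = ivw([0.1, 0.2, 0.4], [0.05, 0.10, 0.40], [0.01, 0.01, 0.01])
```

In the first example the ratios agree, Q is essentially zero, and the fixed-effect and random-effects standard errors coincide. In the second, Q far exceeds its degrees of freedom, so the random-effects standard error is larger than the fixed-effect one.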

All statistical tests were two-sided, and P < 0.05 was considered statistically significant. All data analysis was performed using R Studio 4.2.1 with the “TwoSampleMR”, “MRPRESSO” and “MendelianRandomization” packages for MR analysis.
