Tensor decomposition reveals trans-regulated gene modules in maize drought response

Plant drought tolerance is a complex quantitative trait affected by dynamic physiological and molecular processes that can be reflected by changes in gene expression. Drought resistance in maize is influenced by the expression of specific genes such as ZmNAC111 (Mao et al., 2015), ZmMPKL1 (Zhu et al., 2020), and ZmVPP1 (Wang et al., 2016). To date, transcriptomic comparison and expression quantitative trait locus (eQTL) analyses have been utilized to study the drought response of maize. Differential expression analyses of a few inbred lines under different drought conditions, e.g., different water regimes or at different time points after drought exposure, have revealed thousands of drought-induced genes and potential biological processes in response to drought (Zheng et al., 2010; Danilevskaya et al., 2019; He et al., 2020). eQTL studies have revealed the dynamic genetic basis of expression variation in drought-responsive genes at the population scale (Lowry et al., 2013; Liu et al., 2020). Drought-responsive regulatory networks based on transcription factor (TF) target pairs and drought-specific regulatory hotspots have been identified (Sun et al., 2022). Despite advancements in molecular and quantitative genetics, a systematic understanding of gene regulatory networks in response to drought stress in maize remains limited, necessitating a more thorough investigation.

A trans-acting eQTL (trans-eQTL) is a genetic variant that potentially regulates gene expression through a distal regulator, and trans-eQTLs play an important role in drought-responsive gene expression (Lowry et al., 2013; Clauw et al., 2016). Although the regulatory effect of an individual trans-eQTL is weak (Fu et al., 2013; Võsa et al., 2021), trans-eQTLs cumulatively account for the majority of trait heritability (Liu et al., 2019). However, given their intrinsically small effect sizes and the extra statistical burden imposed by millions of hypothesis tests, trans-eQTLs are difficult to detect via single-nucleotide polymorphism (SNP)-by-gene tests (Mackay et al., 2009; Liu et al., 2019). In addition, because the expression levels of genes are not completely independent, especially for genes belonging to the same regulatory network (Sieberts and Schadt, 2007), standard eQTL mapping in an SNP-by-gene manner inevitably neglects the complex correlation structure of gene expression. A framework combining module detection methods with eQTL mapping has been proposed to address these problems (Biswas et al., 2008). Instead of analyzing individual genes separately, this framework focuses on gene modules, which are groups of co-regulated genes typically involved in functionally interconnected processes (Kong et al., 2008; Saelens et al., 2018; Stein-O’Brien et al., 2018). By decreasing the number of tested molecular phenotypes and reducing expression noise via model fitting, module detection can enhance trans-eQTL identification, both by shortening the testing time and by uncovering novel loci with broad effects. This strategy has been successfully applied to investigate the genetic variants underlying responses to external stimuli in human cells (Kolberg et al., 2020; Ramdhani et al., 2020; Ahern et al., 2022).
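The two benefits of module-level testing described above can be illustrated with a minimal sketch. All numbers here are hypothetical (1,000,000 SNPs, 20,000 genes, 100 modules, 200 lines, a 30-gene module with a shared weak genotype effect), the data are simulated with independent per-gene noise, and the module score is a simple average of standardized expression rather than the model-based score used in any particular study.

```python
import numpy as np

def bonferroni_threshold(n_snps, n_phenotypes, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / (n_snps * n_phenotypes)

# Hypothetical study sizes: testing modules instead of genes shrinks
# the number of hypotheses and relaxes the per-test threshold.
gene_level = bonferroni_threshold(1_000_000, 20_000)
module_level = bonferroni_threshold(1_000_000, 100)

# Simulated module: 30 genes share one weak trans effect of a single SNP.
rng = np.random.default_rng(1)
n_lines, n_genes = 200, 30
genotype = rng.integers(0, 3, size=n_lines).astype(float)  # coded 0/1/2
expr = 0.3 * genotype[:, None] + rng.normal(size=(n_lines, n_genes))

def pearson_r(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

# SNP-by-gene: each gene carries only a weak signal.
per_gene_r = np.array([pearson_r(genotype, expr[:, j]) for j in range(n_genes)])

# SNP-by-module: averaging standardized genes cancels independent noise.
z = (expr - expr.mean(axis=0)) / expr.std(axis=0)
module_score = z.mean(axis=1)
module_r = pearson_r(genotype, module_score)

print(f"gene-level threshold:   {gene_level:.1e}")
print(f"module-level threshold: {module_level:.1e}")
print(f"mean |r| per gene: {np.abs(per_gene_r).mean():.2f}, module r: {module_r:.2f}")
```

Under these assumptions, the module score correlates with the genotype much more strongly than any single gene does, while the significance threshold is less stringent; real expression noise is partly shared among genes, so the gain in practice would be smaller.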

Developing effective methods for identifying biologically relevant genes is an active area of research in high-throughput omics. Weighted gene co-expression network analysis (WGCNA) is a prominent clustering-based method for transcriptome analysis (Langfelder and Horvath, 2008). However, this method cannot assign a gene to multiple modules and potentially overlooks local co-expression patterns that exist in only a subset of samples. These limitations can be alleviated by decomposition-based methods, such as independent component analysis (ICA) (Hori et al., 2001) and singular value decomposition (Alter et al., 2000), which aim to extract latent components underlying biological processes. The genes belonging to each module are then inferred from their relative contributions to the corresponding component. Nevertheless, most of these methods were designed for two-way datasets, in which each data point is indexed by one gene and one individual. In the big data era, studies have added a third way of variation by collecting data under different experimental conditions or from different types of tissues (Gibson, 2015; Liu et al., 2020, 2022; Teng et al., 2024). Several methods that detect latent structures in data with three-way variation have emerged (Hore et al., 2016; Wang and Song, 2017; Wang et al., 2019). As a third-order tensor decomposition method utilizing a sparse prior, sparse decomposition of arrays (SDA) can recover gene loadings with a certain degree of sparsity and consequently allows for the clear inference of genes with shared expression patterns (Hore et al., 2016).
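The idea of inferring module membership from gene loadings can be sketched for the two-way case with a plain singular value decomposition. This is an illustration only, not SDA: it uses no sparse prior and no third (condition) axis, and the function name `svd_modules`, the z-score cutoff of 3, and the simulated data with a planted 20-gene module are all assumptions made for the example.

```python
import numpy as np

def svd_modules(expr, n_components=1, z_cutoff=3.0):
    """Infer gene modules from a samples x genes expression matrix.

    Each right singular vector holds one loading per gene; genes whose
    standardized loading exceeds z_cutoff in absolute value are assigned
    to that component's module. A gene may fall in several modules.
    """
    centered = expr - expr.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    modules = []
    for loading in vt[:n_components]:
        z = (loading - loading.mean()) / loading.std()
        modules.append(np.flatnonzero(np.abs(z) > z_cutoff))
    return modules

# Simulated data: 60 samples, 500 genes, the first 20 genes driven by
# one latent activity (the planted module); everything else is noise.
rng = np.random.default_rng(0)
n_samples, n_genes = 60, 500
expr = rng.normal(size=(n_samples, n_genes))
activity = rng.normal(size=n_samples)
planted = np.arange(20)
expr[:, planted] += 3.0 * activity[:, None]

modules = svd_modules(expr, n_components=1)
print("genes assigned to module 1:", modules[0])
```

Because ordinary SVD loadings are dense, a cutoff has to be imposed after the fact; a sparse prior on the gene loadings, as in SDA, is intended to make this assignment clearer.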

Regulatory variation accounts for a large proportion of the causative variants in maize (Chen et al., 2021), making maize a useful system for investigating dynamic regulatory changes during drought responses. In this study, we applied SDA to published transcriptome datasets of maize leaves under three water regimes, which contain three sources of variation: inbred lines, genes, and experimental conditions (Liu et al., 2020). We identified transcriptional modules consisting of genes with shared expression patterns, as well as the gene modules and corresponding trans-eQTLs associated with drought responses. We also explored the population differentiation of trans-eQTLs during maize domestication and improvement. Importantly, motif analyses, chromatin immunoprecipitation sequencing (ChIP-seq) data, and transient expression assays provided additional support for the regulatory relationships linked to trans-eQTLs underlying TF regulators.
