In multicellular systems, spatial organization is not only the external form that maintains cellular order but also a fundamental determinant that shapes cell differentiation trajectories and functional development processes (Tirosh et al., 2016). During development, precise spatial compartmentalization ensures correct differentiation and the proper formation of tissues and organs, which is critical for growth and functional maintenance. Rational spatial arrangements enable efficient intercellular signaling and communication, guarantee the orderly distribution of nutrients and energy, and allow rapid responses to environmental changes, thereby maintaining the organism's dynamic equilibrium and homeostasis. In pathological states, however, the disruption of spatial organization is often accompanied by functional impairment; for example, in cancers, uncontrolled cell proliferation and structural disorganization can compromise tissue function and trigger systemic health problems (Egeblad et al., 2010). Thus, a deep understanding of spatial organization not only helps uncover the fundamental principles of multicellular biology but also provides theoretical insights into human health and disease mechanisms (Vandereyken et al., 2023; Duan et al., 2024).
With the continuous advancement of high-throughput sequencing technologies, single-cell sequencing has greatly accelerated research in cell biology and disease mechanisms (Stark et al., 2019). However, a single omics layer is often insufficient to fully capture cellular states and functions. The emergence of single-cell multi-omics seeks to integrate multi-modal information, thereby providing a more comprehensive cellular characterization (Li et al., 2023; Zhang et al., 2022; Zhang et al., 2025a). Different omics modalities offer complementary strengths for cell type identification, facilitating deeper exploration of molecular mechanisms underlying health and disease. Nevertheless, single-cell multi-omics techniques typically require cell dissociation, resulting in the loss of spatial context and the decoupling of molecular expression changes from spatial neighborhoods and intercellular interactions (Zhang et al., 2021). Spatial transcriptomics overcomes this limitation. Existing spatial transcriptomic technologies can be broadly categorized into two major classes: sequencing-based and imaging-based approaches. Sequencing-based platforms and technologies, including ST (Ståhl et al., 2016), 10x Visium (Lewis et al., 2021), Slide-seq (Rodriques et al., 2019), and Stereo-seq (Chen et al., 2022), typically capture mRNA molecules from tissue sections using spatially barcoded arrays or fluorescent encoding strategies. The captured transcripts are subsequently reverse-transcribed and quantified through high-throughput sequencing, enabling genome-wide gene expression profiling. As a result, sequencing-based spatial transcriptomic data currently represent the most widely used input for computational method development. 
In contrast, imaging-based platforms such as MERFISH (Chen et al., 2015), seqFISH+ (Eng et al., 2019), CosMx SMI (Wu et al., 2024), and Xenium (Huynh et al., 2025) rely on iterative rounds of in situ hybridization, fluorescence imaging, or in situ amplification to directly decode the spatial locations of individual mRNA molecules under a microscope, achieving single-cell or even subcellular resolution. Owing to these fundamental differences, the two technological paradigms exhibit substantial divergence in capture mechanisms, spatial resolution, gene throughput, noise characteristics, and data representations, together constituting the core technological landscape of contemporary spatial transcriptomics. More recently, spatial multi-omics technologies have further advanced by enabling simultaneous acquisition of multiple molecular modalities within the same tissue section. This provides multi-dimensional and complementary views at each spot, uncovers discrepancies between transcriptomic and proteomic expression, highlights the crucial roles of metabolism and epigenetic modifications in regulating cell states, and facilitates in-depth investigations of the relationship between tissue structure and function (Vandereyken et al., 2023).
In recent years, the rapid development of spatial multi-omics platforms and technologies (e.g., SPOTS (Ben-Chetrit et al., 2023), MISAR-seq (Jiang et al., 2023), Stereo-CITE-seq (Liao et al., 2023), Spatial CUT&Tag-RNA-seq (Li et al., 2025a)) and the explosive growth of data have given rise to major analytical challenges. First, different omics datasets differ markedly in distribution, scale, and noise characteristics, and naïve concatenation often leads to information redundancy or even the loss of critical signals. Second, spatial multi-omics data are typically characterized by pronounced sparsity and technical noise, although their manifestations vary across modalities and detection technologies. For instance, spatial transcriptomic data commonly exhibit zero inflation (Shang and Zhou, 2022; Covert et al., 2023; Miao et al., 2021; Cui et al., 2025a; Li et al., 2024a). In spatial proteomics, mass spectrometry–based approaches are often constrained by limited sensitivity for low-abundance proteins (Breckels et al., 2024; Mund et al., 2022; Jiang et al., 2024; Qian et al., 2006), whereas antibody-based multiplexed immunoassays (such as CITE-seq or multiplex immunofluorescence) more frequently encounter challenges related to antibody specificity, background signals, and restricted panel sizes (Mou et al., 2022; Black et al., 2021). Furthermore, although different modalities provide complementary information, their relationships are often nonlinear and dynamic, making them difficult to model accurately with conventional statistical methods. More critically, under current experimental conditions, data quality and signal-to-noise ratios across different modalities within the same sample are often highly imbalanced. 
If one modality is over-weighted or excessively aligned during data fusion, the joint latent representation may become dominated by that modality, diminishing the contribution of information from the other modalities and ultimately compromising the reliability and robustness of the integration results. Consequently, deep learning–based spatial multi-omics integration has emerged as a key methodological paradigm for the comprehensive interpretation of cellular heterogeneity and functional states (Coleman et al., 2024; Walsh and Quail, 2023; Gao et al., 2025; Luo et al., 2025; Fu et al., 2025; Cui et al., 2025b). Since the introduction of SpatialGlue (Long et al., 2024), numerous such methods have been proposed to address these challenges, with a central focus on multi-omics fusion (Zhou et al., 2025; Li et al., 2025b; Chen et al., 2025a; Cai et al., 2025) and representation learning (Tian et al., 2024; Coleman et al., 2025; Yao et al., 2025; Yan et al., 2024).
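The scale imbalance and modality dominance described above can be illustrated with a small synthetic example. The sketch below is not drawn from any of the cited methods: it uses plain PCA as a linear stand-in for a joint latent representation, and the data, the 100-fold scale factor, and the helper names `zscore` and `modality_share` are all hypothetical choices made for illustration. Two toy modalities are driven by the same latent factor; the code reports how much of the first principal axis falls on the "RNA" features under naive concatenation and after per-feature standardization.

```python
import numpy as np


def zscore(x):
    """Standardize each feature (column) to zero mean and unit variance."""
    std = x.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (x - x.mean(axis=0)) / std


def modality_share(joint, n_first):
    """Fraction of the first principal axis's squared loadings that lies
    on the first `n_first` features (here, the toy RNA block)."""
    centered = joint - joint.mean(axis=0)
    # Rows of vt are the principal axes of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = vt[0]  # unit-norm vector, so squared loadings sum to 1
    return float(np.sum(pc1[:n_first] ** 2))


# Hypothetical toy data: 200 spots, 30 features per modality,
# both modalities driven by one shared latent factor plus noise.
rng = np.random.default_rng(0)
n_spots, n_feat = 200, 30
latent = rng.normal(size=(n_spots, 1))
loadings = np.ones((1, n_feat))

# The "RNA" block is deliberately placed on a 100x larger numeric scale.
rna = 100.0 * (latent @ loadings + rng.normal(size=(n_spots, n_feat)))
protein = latent @ loadings + rng.normal(size=(n_spots, n_feat))

naive = np.hstack([rna, protein])                     # naive concatenation
balanced = np.hstack([zscore(rna), zscore(protein)])  # per-feature scaling first

share_naive = modality_share(naive, n_feat)
share_balanced = modality_share(balanced, n_feat)
print(f"RNA share of PC1, naive concatenation: {share_naive:.3f}")
print(f"RNA share of PC1, after z-scoring:     {share_balanced:.3f}")
```

In this toy setting, the leading axis of the naively concatenated matrix lies almost entirely on the larger-scale block, whereas standardization yields a roughly even split. Standardization addresses only scale differences; the imbalance in signal-to-noise ratio and the nonlinear cross-modality relationships discussed above are not captured by this linear sketch.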
In spatial multi-omics studies, experimental designs can generally be categorized into same-section and serial-section strategies, depending on how different omics modalities are acquired. Same-section designs measure multiple molecular modalities simultaneously or sequentially on the same tissue section, which provides intrinsic spatial alignment across modalities at the coordinate level and makes them well suited to vertical spatial multi-omics integration, in which the modalities share the same spatial units. In contrast, in serial-section designs, different omics data are obtained from adjacent or distinct tissue sections and typically require additional cross-section registration and mapping procedures, rendering the integration process closer to a “diagonal” integration across spatial units. As illustrated in Fig. 1, the focus of this review lies in spatial multi-omics integration as a core research direction within spatial biology. The overarching goal is to jointly model multi-layer molecular information derived from the same tissue section while preserving spatial tissue architecture, thereby learning unified and interpretable latent representations that facilitate the characterization of cellular heterogeneity, cross-modality regulatory relationships, and potential dynamic trajectories within tissue microenvironments (Long et al., 2024; Liu et al., 2025; Miao et al., 2025; Argelaguet et al., 2020; Ashuach et al., 2023). It is worth noting that most existing and widely adopted vertical spatial multi-omics integration methods are designed for spot-level spatial data and typically concentrate on the joint measurement of two omics modalities, such as transcriptomics–proteomics or transcriptomics–epigenomics. Against this backdrop, this review specifically focuses on such vertical spatial multi-omics integration scenarios and provides a systematic overview and analysis of the associated computational models and methodologies.
In this review, we provide the first systematic and comprehensive survey of spatial multi-omics integration algorithms, with particular attention to methods developed to integrate multi-modal molecular measurements collected from the same tissue section. First, we systematically review the datasets currently used by spatial multi-omics integration algorithms. We then categorize and compare existing algorithms from two perspectives: (i) algorithmic frameworks, encompassing Generative and Probabilistic Frameworks, Graph-based Representation Learning Frameworks, and Matrix Factorization and Hybrid Frameworks; and (ii) spatial multi-omics fusion strategies, including Concatenation-based Fusion, Attention-guided Fusion, and Subspace Decomposition Fusion. Subsequently, we summarize the downstream tasks of spatial multi-omics integration, which not only demonstrate the utility of these methods but also provide concrete biological objectives for advancing multi-omics research. Finally, we analyze the open problems and challenges in current spatial multi-omics studies.