With the increasing maturity of high-throughput profiling technologies, survival time and various disease phenotypes have been associated with numerous genetic variables. Many statistical and machine learning methods have been developed to identify important genetic variables and construct predictive models [1], [2], [3], [4].
Although many interesting methods have been developed for biomedical research, the resulting models often prove unsatisfactory in practice because of insufficient information, typically manifested in the high dimensionality of genetic data, weak signals, and limited sample sizes [5]. Many strategies have been proposed to “borrow” additional information. Examples include integration analysis [6], prior information [7], the network structure of gene expression [8], and pathological imaging data [9]. However, these approaches rely on accessible supplementary information and often require parametric assumptions, which limits their applicability.
An alternative is to borrow information from mixed outcomes measured on the same individuals through joint modeling, which leverages knowledge among outcomes. Many biomedical studies, such as genome-wide association studies, commonly involve relevant measurements of mixed outcomes for each individual. For instance, in the study of skin cutaneous melanoma (SKCM), the outcomes can involve the sample type, the overall survival time, and the Buffa Hypoxia Score. The sample type indicates whether the tumor is primary or metastatic, the overall survival time is a right-censored response variable, and the Buffa Hypoxia Score is a critical feature of the tumor microenvironment (TME). Many studies discuss these phenotypes separately. However, melanoma is known to be characterized by a high rate of metastasis [10]. The Buffa Hypoxia Score, a continuous variable, influences metastatic spread as well as other phenotypes. Related studies also suggest that TME significantly affects survival and tumor cell invasion [11], [12]. Modeling them separately leads to a lack of information, as it fails to utilize the correlations among outcomes. In practice, researchers often need to model these outcomes jointly to gain comprehensive insights into the underlying mechanisms among outcomes [13]. However, this poses difficulties since these correlated outcomes include not only discrete and continuous response variables but also survival time variables.
In statistical analysis, joint modeling of mixed outcomes primarily focuses on discrete and continuous variables but often struggles to effectively address problems involving survival time. Most approaches are motivated by the difficulty of describing the joint multivariate distribution of mixed variables. We refer to some strategies built on latent variable models [14], [15], [16], copula methods [17], [18], [19], conditional distribution assumptions [20], [21], quasi-likelihood methods [22], [23], and others. All these strategies have their pros and cons. For example, latent variables provide shared information among outcomes to aid analysis. However, they impose additional assumptions and fail to efficiently leverage information when survival time response variables are involved.
Neural networks, particularly multi-task learning (MTL), offer a flexible non-parametric alternative by jointly modeling related tasks using shared layers [24], [25]. Their loss functions are often formulated as a linear weighted summation of individual task losses [26]. Yet, existing neural MTL methods face two major difficulties in mixed-outcome settings. First, gradient imbalance in computation leads to training difficulties, particularly in the case of mixed outcomes. Gradient imbalance occurs when the dominant task skews the backpropagation process, making the gradients disproportionately beneficial to itself. As such, the information provided by other outcomes can be obscured and overshadowed. Chen et al. [27] point out that multi-task networks are difficult to train when tasks are imbalanced, as the dominant task tends to generate imbalanced gradients during backpropagation. Even assigning weights to individual losses does not fully resolve the computational imbalance and can still lead to suboptimal optimization. Thus, tasks need to be properly balanced so that the shared parameters of the network can be updated stably. Second, formulating the loss function as a linear weighted sum overlooks the fact that losses for each task are measured using different metrics. For example, the least squares criterion in regression measures the empirical distance between a continuous response variable and its prediction. In contrast, cross-entropy in classification evaluates the divergence between two probability distributions. For survival data, loss functions vary depending on the model used, such as weighted least squares loss and negative partial likelihood, both of which incorporate censoring information. Therefore, directly summing these losses is inappropriate since the information from different outcomes is measured on different scales.
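The second difficulty can be illustrated with a minimal sketch. The numbers below are made up, and the helper names `mse`, `binary_cross_entropy`, and `weighted_sum_loss` are hypothetical, not taken from any of the cited methods. The sketch only shows that a linear weighted sum of a squared-error loss and a cross-entropy loss depends on the units of the continuous outcome: rescaling that outcome by a factor of 10 inflates its loss roughly 100-fold, while the classification loss is unchanged.

```python
import numpy as np

def mse(y, pred):
    """Mean squared error for a continuous outcome."""
    return float(np.mean((y - pred) ** 2))

def binary_cross_entropy(y, prob, eps=1e-12):
    """Mean cross-entropy for a binary outcome with predicted probabilities."""
    prob = np.clip(prob, eps, 1 - eps)
    return float(-np.mean(y * np.log(prob) + (1 - y) * np.log(1 - prob)))

# Toy data (illustrative values only).
y_cont = np.array([0.2, 1.5, -0.7, 0.9])
pred_cont = np.array([0.0, 1.0, -0.5, 1.2])
y_bin = np.array([1.0, 0.0, 1.0, 1.0])
prob_bin = np.array([0.8, 0.3, 0.6, 0.9])

def weighted_sum_loss(scale, w_cont=0.5, w_bin=0.5):
    """Linear weighted sum of the two task losses after rescaling
    the continuous outcome (and its prediction) by `scale`."""
    l_cont = mse(scale * y_cont, scale * pred_cont)
    l_bin = binary_cross_entropy(y_bin, prob_bin)
    return w_cont * l_cont + w_bin * l_bin, l_cont, l_bin

total_1, cont_1, bin_1 = weighted_sum_loss(scale=1.0)
total_10, cont_10, bin_10 = weighted_sum_loss(scale=10.0)

print("ratio of regression losses:", cont_10 / cont_1)
print("classification loss unchanged:", bin_10 == bin_1)
```

With fixed weights, a mere change of units therefore shifts which task dominates the total loss, which is one way to see why the summed quantities are not directly comparable.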
In summary, while these modeling frameworks, including neural networks, provide valuable solutions, their applicability becomes limited when handling mixed outcomes and survival data due to the challenges in loss formulation and optimization.
To overcome these challenges, we propose a rank-based approach that unifies outcome measurements and addresses gradient imbalance. Rank-based losses inherently focus on ordering rather than magnitude, making them particularly well-suited for integrating mixed outcomes. In studies on rank-based methods, Han [28] offers a notable example of maximum rank correlation regression (MRC), which utilizes variable rankings as its foundation. The identifiability of coefficients is examined, and a monotonicity assumption is imposed on the models. Building on this, Fan et al. [29] establish rigorous theoretical results for these estimators. He et al. [30] propose a rank-based model averaging approach for high-dimensional covariates, preserving the aforementioned assumptions.
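For orientation, the MRC criterion can be sketched as the fraction of ordered pairs whose responses and linear scores agree in order. The snippet below is a minimal illustration on simulated data; the function name `mrc_objective`, the normalization by n(n-1), and the toy data are our assumptions for the example, not code from [28]. It also shows two properties mentioned above: the criterion is unchanged by a strictly increasing transformation of the response, and it is unchanged by positive rescaling of the coefficients, which is why the coefficients are identifiable only up to scale.

```python
import numpy as np

def mrc_objective(beta, X, y):
    """Rank correlation criterion: share of ordered pairs (i, j), i != j,
    with y_i > y_j and x_i' beta > x_j' beta."""
    score = X @ beta
    y_gt = y[:, None] > y[None, :]          # y_i > y_j
    s_gt = score[:, None] > score[None, :]  # score_i > score_j
    n = len(y)
    return float(np.sum(y_gt & s_gt)) / (n * (n - 1))

# Simulated toy data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta = np.array([1.0, -0.5, 0.25])
y = X @ beta + 0.1 * rng.normal(size=50)

base = mrc_objective(beta, X, y)
monotone = mrc_objective(beta, X, np.exp(y))  # increasing transform of y
rescaled = mrc_objective(2.0 * beta, X, y)    # positive rescaling of beta

print(base, monotone, rescaled)
```

Because the criterion is a step function of the coefficients, estimators built on it are typically computed with search-based or smoothed optimization rather than plain gradient descent.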
In this paper, we extend the rank-based idea to MTL neural networks, incorporating a variable selection mechanism. This paper makes three main contributions. First, we propose a unified framework for sparse neural networks to jointly leverage information among mixed outcomes, including survival time. Our approach effectively captures unknown non-linear relationships between response variables and high-dimensional covariates without additional assumptions. The proposed sparse neural network can predict the order of response variables based on covariates. As such, our rank-based approach requires no assumptions of identifiability or monotonicity. Second, we derive a novel loss function, comprising a uniform measurement of losses for different outcomes and a penalization for variable selection. Our loss function deviates from the traditional linear weighted summation form. Specifically, we measure losses by the rank correlations between each response variable and its prediction. Based on the proposed loss function, survival data can be naturally incorporated, and tasks are automatically balanced. Third, we develop a sparse layer in neural networks for variable selection. We penalize the coefficients in the sparse layer so that only important variables are involved in computation. A core message of this paper is that the use of rank plays a crucial role in the joint modeling of mixed outcomes with survival data, accommodating unknown non-linear relationships and high-dimensional covariates. Numerical simulations and real data analysis on SKCM demonstrate the superior performance of our method compared to alternatives.
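The idea that ranks put binary, continuous, and right-censored outcomes on a common scale can be illustrated with a simple pairwise concordance measure. This is only a sketch under our own assumptions, not the loss function proposed in this paper, which is formulated in Section 2. The function name `pairwise_concordance`, the toy data, and the convention that a higher score predicts a larger response (a longer survival time in the censored case) are all hypothetical. For right-censored data, a pair is treated as comparable only when the smaller time is an observed event, as in the usual concordance index.

```python
import numpy as np

def pairwise_concordance(y, score, event=None):
    """Share of comparable pairs (y_i < y_j) that the scores order correctly.

    event: optional 0/1 array for right-censored outcomes; when given, a pair
    is comparable only if the smaller time y_i is an observed event.
    """
    y = np.asarray(y, dtype=float)
    score = np.asarray(score, dtype=float)
    comparable = y[:, None] < y[None, :]
    if event is not None:
        comparable &= np.asarray(event, dtype=bool)[:, None]
    concordant = comparable & (score[:, None] < score[None, :])
    return float(concordant.sum() / comparable.sum())

# Binary outcome: the measure compares each 0-labelled case with each 1-labelled case.
c_binary = pairwise_concordance([0, 1, 0, 1], [0.2, 0.9, 0.4, 0.3])

# Continuous outcome: only the ordering of the scores matters, not their scale.
y_cont = np.array([1.0, 2.5, 0.3, 4.0])
c_continuous = pairwise_concordance(y_cont, 10.0 * y_cont)

# Right-censored survival time: the second subject is censored (event = 0).
time = [2.0, 5.0, 3.0, 8.0]
event = [1, 0, 1, 1]
c_survival = pairwise_concordance(time, [1.0, 4.0, 2.5, 6.0], event=event)

print(c_binary, c_continuous, c_survival)
```

All three values lie in [0, 1] regardless of the outcome type or its units, which is the property that makes rank-based measures attractive for mixed outcomes. The indicator-based count is not differentiable, so it is shown here only as an evaluation-style illustration.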
The remainder of the paper is structured as follows. In Section 2, we formulate a model for binary, continuous, and survival response variables and propose the loss function. The estimation and computation procedures are also described here. Section 3 presents the simulation results, including comparisons with various alternatives. In Section 4, we analyze the SKCM dataset, compare the proposed method with existing alternatives, and discuss the related findings. In Section 5, we conclude the paper with a discussion.