Federated learning (FL) enables participants to collaboratively train a global model on their collective data without breaching privacy (Li et al., 2020, Foley et al., 2022, Pati et al., 2022a). This decentralized mechanism makes FL particularly suitable for privacy-sensitive applications such as medical image analysis (Sheller et al., 2020, Kaissis et al., 2020, Adnan et al., 2022, Yan et al., 2020, Pati et al., 2021, Pati et al., 2022a, Pati et al., 2022b). However, most existing FL methods for medical image analysis have considered only intramodal heterogeneity, limiting their applicability to multimodal imaging in practice.
One such application is brain tumor segmentation in multi-parametric magnetic resonance imaging (MRI) (Iv et al., 2018). Specifically, four MRI modalities (in this work, we refer to MRI sequences as modalities, following the literature (Dorent et al., 2019, Menze et al., 2014, Ding et al., 2021)) are commonly used to provide complementary information and support sub-region analysis: T1-weighted (T1), contrast-enhanced T1-weighted (T1c), T2-weighted (T2), and T2 fluid-attenuated inversion recovery (FLAIR). The first two modalities highlight the tumor core, and the last two highlight peritumoral edema (Fig. 1(a)). When applying FL to such multimodal applications in practice, it is not uncommon that some participating institutions possess only a subset of the full modalities due to differing imaging protocols, presenting a new challenge of intermodal heterogeneity across the FL participants. In such a scenario, FL can pursue two objectives: (1) collectively training an optimal global model for full-modal input, and (2) obtaining a personalized model for each participant (Chen and Zhang, 2022, Wang et al., 2019) that is adapted to its data characteristics and, more importantly, better than a model trained locally without FL. To our knowledge, these two objectives have rarely been considered together in FL for medical image analysis.
This paper proposes a new FL framework for brain tumor segmentation with federated modality-specific encoders and partially personalized multimodal fusion decoders (FedMEPD; Fig. 1(b)). First, to handle the distinctively heterogeneous MRI modalities, FedMEPD employs an exclusive encoder for each modality, allowing a high degree of parameter specialization. Meanwhile, whereas the encoders are fully shared between the server and clients, the decoders are dynamically partially shared and partially personalized, catering simultaneously to individual participants and to common knowledge sharing in FL. Specifically, a multimodal fusion decoder on the server (i.e., a participant with full-modal data) fuses representations from the encoders to bridge the distribution gaps between modalities and, in turn, optimizes the encoders via backpropagation. Each client’s fusion decoder is partially federated (and partially personalized) at the filter level based on the consistency between the parameter updates of the global and local models: intuitively, only those parameters whose updates consistently agree between the server and the client are federated. Meanwhile, multiple anchors are extracted from the fused multimodal representations at the server and distributed to the clients along with the shared parameters. On the client side, each client (typically with incomplete modalities, with full-modal data as a special case) calibrates its local missing-modal representations toward the global full-modal anchors via the scaled dot-product attention mechanism (Vaswani et al., 2017), compensating for the information loss caused by absent modalities and adapting the representations of the present ones. In this way, we simultaneously obtain an optimal server model (for full-modal input) and personalized client models (for specific missing-modal input) from FL without sharing privacy-sensitive information.
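As a concrete illustration of the filter-level partial federation, the following is a minimal sketch of one possible consistency criterion in a PyTorch setting; all function and variable names are hypothetical, and the exact agreement measure and thresholds used by FedMEPD may differ.

```python
# Minimal sketch (not the authors' implementation): filter-level partial
# federation of a client's fusion decoder based on the consistency between
# the global (server) and local (client) parameter updates.
import torch


def partially_federate_decoder(local_state, global_state,
                               prev_local_state, prev_global_state):
    """Return a client decoder state dict in which only the filters whose
    global and local updates agree are replaced by the global values."""
    new_state = {}
    for name, w_local in local_state.items():
        w_global = global_state[name]
        # Parameter updates since the previous communication round.
        d_local = w_local - prev_local_state[name]
        d_global = w_global - prev_global_state[name]
        if w_local.dim() >= 2:  # convolutional / linear weights
            # One consistency score per output filter (assumption: cosine
            # similarity of the flattened per-filter updates).
            cos = torch.nn.functional.cosine_similarity(
                d_local.flatten(1), d_global.flatten(1), dim=1)
            mask = (cos > 0).view(-1, *([1] * (w_local.dim() - 1))).float()
        else:  # biases, norm parameters: federate when update signs agree
            mask = (d_local * d_global > 0).float()
        # Federated filters take the global weights; the rest stay personalized.
        new_state[name] = mask * w_global + (1.0 - mask) * w_local
    return new_state
```

Under this reading, filters whose updates point in conflicting directions remain personalized, so client-specific adaptations to missing-modal data are not overwritten by the global aggregation.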
In summary, our contributions are as follows:
• We bring forward the intermodal heterogeneity problem caused by missing modalities in FL for medical image analysis, and aim to obtain an optimal full-modal server model and personalized missing-modal client models simultaneously with a novel framework coined FedMEPD.
• To tackle the intermodal heterogeneity, we propose to employ a federated encoder exclusive to each modality, followed by a multimodal fusion decoder.
• To simultaneously promote common knowledge sharing in FL and facilitate effective personalization, we propose to partially federate the fusion decoders based on the consistency between global and local parameter updates.
• In addition, we propose to extract and distribute multimodal representations from the server to the clients for local calibration of modality-specific features.
• Last but not least, we further enhance the calibration with multi-anchor representations (see the sketch following this list).
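The anchor-based calibration can be pictured as a cross-attention module in which local features from the present modalities attend to the global full-modal anchors. The sketch below is a hypothetical instantiation: the projection layers, residual fusion, and tensor shapes are assumptions for illustration, not the authors' exact design.

```python
# Minimal sketch: calibrating a client's missing-modal features toward K
# global full-modal anchors with scaled dot-product attention
# (Vaswani et al., 2017). Names and shapes are illustrative assumptions.
import torch
import torch.nn as nn


class AnchorCalibration(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)

    def forward(self, local_feat: torch.Tensor, anchors: torch.Tensor):
        """
        local_feat: (B, N, C) flattened spatial features from present modalities.
        anchors:    (K, C)    multi-anchor full-modal representations from the server.
        """
        q = self.q_proj(local_feat)            # (B, N, C)
        k = self.k_proj(anchors).unsqueeze(0)  # (1, K, C)
        v = self.v_proj(anchors).unsqueeze(0)  # (1, K, C)
        scale = q.size(-1) ** 0.5
        attn = torch.softmax(q @ k.transpose(-1, -2) / scale, dim=-1)  # (B, N, K)
        calibrated = attn @ v                  # (B, N, C)
        # Residual fusion preserves the original local information.
        return local_feat + calibrated
```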
Experimental results on the public BraTS 2018 and 2020 benchmarks (Menze et al., 2014, Bakas et al., 2018) show that our method outperforms existing FL methods for both the server and client models, and that its novel designs are effective.
This work substantially expands our preliminary exploration (Dai et al., 2024) in three main aspects. First, we change the multimodal fusion decoder from completely personalized to partially personalized and partially federated. This change aims to promote the sharing of common knowledge in FL while still facilitating personalization, leading to notable improvements in the clients’ performance (cf. Table 6). Second, in Dai et al. (2024), each client had data of only one modality, whereas clients in this work may have one to four (i.e., full) modalities, a more common and practical problem setting. Third, we employ an additional public benchmark dataset to demonstrate the effectiveness and generalization of the improved framework.