
Authors

Zhifang Deng, Dandan Li, Shi Tan, Ying Fu, Xueguang Yuan, Xiaohong Huang, Yong Zhang, Guangwei Zhou

Abstract

With increasingly strict data privacy regulations and the difficulty of centralizing data, Federated Learning (FL) has become an effective way to collaboratively train a model while preserving each client's privacy. FedAvg is a standard aggregation algorithm that uses each client's share of the total dataset size as its aggregation weight. However, it cannot handle non-independent and identically distributed (non-IID) data well because of its fixed aggregation weights and its neglect of data distribution. This paper presents a new aggregation strategy called FedGrav, which is designed to handle non-IID datasets and is inspired by the law of universal gravitation in physics. FedGrav can dynamically adjust the aggregation weights based on the training condition of the local models throughout the entire training process, making it an effective solution for non-IID data. Model affinity is creatively proposed by considering both the differences in sample size across clients and the discrepancies among local models. It treats a client's sample size as the mass of its local model and defines a model graph distance based on neural network topology. By calculating the affinity among local models, FedGrav can explore their internal correlations and improve the aggregation weights. The proposed FedGrav has been applied to the CIFAR-10 and MICCAI Federated Tumor Segmentation (FeTS) Challenge 2021 datasets, and the validation results show that our method outperforms the previous state of the art by 1.54 mean DSC and 2.89 mean HD95. The source code will be available on GitHub.
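
As a rough intuition for the mechanism the abstract describes, the following is a minimal Python sketch of gravitation-style aggregation weights contrasted with FedAvg's fixed size-proportional weights. All function names and the exact affinity formula here are illustrative assumptions; the paper's actual model affinity and model graph distance are defined by its own equations.

```python
import numpy as np

def fedavg_weights(sample_sizes):
    """Standard FedAvg: fixed weights proportional to client dataset size."""
    n = np.asarray(sample_sizes, dtype=float)
    return n / n.sum()

def fedgrav_style_weights(sample_sizes, pairwise_dist, eps=1e-8):
    """Gravitation-style aggregation weights (illustrative sketch only,
    not the paper's exact rule). Pairwise affinity mimics
    F = G * m_i * m_j / r_ij**2, with client sample size as the 'mass'
    of a local model and the inter-model graph distance as 'r'. A client's
    weight is its total affinity to all other clients, normalized."""
    m = np.asarray(sample_sizes, dtype=float)
    d = np.asarray(pairwise_dist, dtype=float)
    affinity = np.outer(m, m) / (d ** 2 + eps)  # the constant G cancels after normalization
    np.fill_diagonal(affinity, 0.0)             # ignore self-affinity
    w = affinity.sum(axis=1)
    return w / w.sum()

# Example: the third client has drifted far from its peers, so its weight
# shrinks relative to its plain FedAvg (size-only) weight.
sizes = [1000, 500, 800]
dist = np.array([[0.0, 0.2, 0.9],
                 [0.2, 0.0, 0.8],
                 [0.9, 0.8, 0.0]])
print(fedavg_weights(sizes))               # approx. [0.435, 0.217, 0.348]
print(fedgrav_style_weights(sizes, dist))  # the drifted client's weight drops
```

Because the distances change as local training proceeds, such weights are recomputed every round, which is what makes the scheme dynamic rather than fixed like FedAvg.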

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43895-0_16

SharedIt: https://rdcu.be/dnwxY

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes FedGrav, an adaptive federated aggregation algorithm for multi-institutional medical image segmentation. The algorithm is designed to handle non-IID datasets and dynamically adjusts the aggregation weights based on the training condition of local models throughout the entire training process. The model affinity is proposed by considering both the differences of sample size on the client and the discrepancies among local models. The proposed FedGrav is evaluated on the CIFAR-10 and the MICCAI Federated Tumor Segmentation (FeTS) Challenge 2021 datasets, and the validation results show that it outperforms the previous state-of-the-art.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. This paper is well motivated, and the idea of using the model graph distance to quantify model differences is novel.
    2. The presentation of the proposed method is mostly clear and easy to follow.
    3. Extensive experiments show the good performance of the proposed method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. The idea of using model distance to guide the federated learning process has been proposed in previous work [1].
    2. In the evaluation, the proposed method is not fully compared with SOTA methods.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The provided experimental details are sufficient to reproduce the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    1. What are the advantages of the proposed method compared with existing work [1]?
    2. In the evaluation, when comparing with SOTA methods in Table 1, the performance of FedGrav is only slightly higher than that of the SOTA methods. To better evaluate the proposed method, the standard deviation of the results could be reported. Besides, how are the hyper-parameters of the baseline methods tuned? Are they tuned to the same extent as those of the proposed method? The reported performance of FedProx is lower than that of FedAvg, which is unexpected.
    3. When evaluating on the MICCAI FeTS2021 training dataset, why are some of the SOTA methods present in Table 1 missing from Table 2?

    [1] Li, Qinbin, Bingsheng He, and Dawn Song. “Model-contrastive federated learning.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The overall quality of this paper is good. Some concerns about the novelty and the performance need to be addressed.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes an adaptive aggregation algorithm for federated learning under the non-IID scenario. Experimental results show some improvements.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed affinity-based method is new to the FL scenario.
    2. Some improvements on the FeTS2021 dataset compared with FedAvg and other SOTA methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The experimental part is not sufficient; CIFAR-10 is not a medical dataset.
    2. Many SOTA methods that address the federated non-IID scenario, such as FedBN [1] and FedProx [2], are not compared against.
    3. The performance improvement on FeTS2021 is marginal.

    [1] Li, Xiaoxiao, Meirui Jiang, Xiaofei Zhang, Michael Kamp, and Qi Dou. "FedBN: Federated learning on non-IID features via local batch normalization." arXiv preprint arXiv:2102.07623 (2021).
    [2] Li, Tian, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, and Virginia Smith. "Federated optimization in heterogeneous networks." Proceedings of Machine Learning and Systems 2 (2020): 429-450.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The major experimental details are given.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Please refer to the strength and weakness.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The experimental results are not sufficient, and the writing is somewhat poor.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    I am still concerned about the writing and organization of this paper with respect to the text description and the experimental settings. But I would like to raise my score to 4: weak reject.



Review #3

  • Please describe the contribution of the paper

    This paper presents a new aggregation strategy, called FedGrav, to solve the server aggregation problem of federated learning for non-IID data. The proposed method mainly considers both the differences in sample size on the client and the discrepancies among local models. The latter is obtained by calculating the affinity among local models. The experiments show that the proposed method outperforms the state-of-the-art approaches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. An interesting way to introduce the proposed idea via the law of universal gravitation in physics.
    2. Well written and easy to follow.
    3. The proposed way to measure the distance between client models based on neural network topology seems reasonable.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Many related works are ignored.
    2. The experiment part needs to be enhanced, and the selection of comparison methods is not reasonable.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Positive:

    - Public datasets are used.
    - The authors claim that the source code will be available on GitHub.
    - A clear and detailed description of the algorithm.

    Negative:

    - Not all hyperparameters are given.
    - Some important details are missing, such as data preprocessing and augmentation, and the learning-rate update schedule.
    - How were the baselines tuned?

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    In the introduction part: (1) What is "formula 1"? (2) Many related works are ignored, and a clear comparison between these methods and the proposed method should be added to highlight the contributions of this work:

    - FedSim: Similarity Guided Model Aggregation for Federated Learning. Neurocomputing, 2022.
    - FedBE: Making Bayesian Model Ensemble Applicable to Federated Learning. ICLR 2021.
    - Personalized Retrogress-Resilient Framework for Real-World Medical Federated Learning. MICCAI 2021.
    - Federated Contrastive Learning for Decentralized Unlabeled Medical Images. MICCAI 2021.

    In the method part: (3) For Eq. (2), is it also applicable to BN layers and FC layers?

    In the experiment part: (4) For the CIFAR-10 dataset, the selection of comparison methods is not reasonable; the authors should focus on FL methods that solve the server aggregation problem. (5) For the FeTS2021 dataset, more methods are also expected to be added.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    (1) The idea is novel. (2) Experiments need to be enhanced.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors have partially addressed my concerns. I still hope they will add some comparative methods to strengthen the experiments.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper received mixed comments. The reviewers acknowledge that the paper is well written, well motivated, and novel. However, the reviewers also raise concerns, including fair comparison, marginal improvement, missing related work, and others. The authors are invited to address these weaknesses and questions. Additionally, repeating the experiments and reporting standard deviations would help investigate the robustness of FedGrav with respect to training.




Author Feedback

We thank all the reviewers and the meta-reviewer for their kind reviews and constructive comments. We address the main concerns as follows:

@MR@R1@R2@R3 Fair comparison: There are two reasons for reporting the performance of these SOTA methods on different datasets. First, one of the purposes of the FeTS2021 Challenge is to compare the performance of aggregation algorithms within the official framework, and FedCostWAvg is the champion method on it according to [21] in the manuscript. So we compared against these SOTA methods directly, which demonstrates the effectiveness of FedGrav. In the experiments, we keep all the hyper-parameters the same for all methods. In the comparison, we found the performance of FedProx to be lower; we repeated the experiment and obtained a similar result, which is consistent with [1]. [1] Li Q, Diao Y, et al. Federated learning on non-IID data silos: An experimental study. 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 2022: 965-978.

@MR@R2 Marginal improvement: Table 2 reports the performance of FedGrav on the FeTS2021 dataset; the first six items report the performance on individual indicators, and the last two items report the mean DSC and mean HD95. FedGrav outperforms the previous SOTA FedCostWAvg by 1.54 mean DSC and 2.89 mean HD95, and surpasses FedAvg by 2.11 mean DSC and 8.23 mean HD95. We consider the performance improvement significant rather than marginal. Furthermore, our primary emphasis lies in innovating methods for non-IID scenarios, aiming not only to enhance performance but also to stimulate researchers to adopt a fresh perspective.

@MR@R3 Missing related work: The related work cited in the paper is closely related to the proposed method and supports the completeness of this article. The related works provided by R3 can enhance the article and make it more complete; they propose aggregation methods from the perspectives of clustering, frequency domain, Bayesian ensembling, and representation similarity analysis. Different from the above methods, FedGrav integrates sample size disparities and local model variations among clients through the innovative concept of model affinity, which serves as a powerful guide for effective model aggregation. We will revise the manuscript in the camera-ready version.

@MR@R1 The robustness of FedGrav: We repeated the experiments and report standard deviations to evaluate the robustness. For the CIFAR-10 dataset: FedAvg (88.37±0.04), FedProx (87.93±0.19), FedNova (88.68±0.26), FedGrav (89.35±0.23). For the FeTS2021 dataset: FedAvg (77.63±0.573), FedCostWAvg (78.20±0.749), FedGrav (79.74±0.595). The results illustrate the effectiveness and robustness of FedGrav.

@R2 Experimental questions: FedGrav was validated on two public datasets, CIFAR-10 and the FeTS2021 training data, which are a manually partitioned non-IID dataset and a large real-world dataset, respectively. We conducted detailed experiments and visualization on these datasets in addition to the ablation study. Although CIFAR-10 is not a medical dataset, it serves as a valuable benchmark for verifying the effect of methods; numerous SOTA methods, such as FedAvg, FedNova, and Auto-FedAvg, have demonstrated their performance on it. The CIFAR-10 dataset allows quick verification of methods and lays a foundation for subsequent evaluations. We compared FedGrav with other SOTA methods addressing federated non-IID scenarios; Table 1 reports the results, including FedProx, FedNova, and Auto-FedAvg.
@R3 Questions about the manuscript: We are sorry about that. "Formula 1" should be called Eq. (1), and it is in Section 2.2. We have checked the manuscript and will correct this in a later version. Eq. (2) is applicable to convolution layers. The weight matrices of FC layers are 2-d matrices that can be directly converted into a topology graph. For BN layers, whose parameters are 1-d, we applied the difference of the weight matrix between each local model and the last-round global model.
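
As one hedged reading of this layer-wise treatment, the sketch below shows how the three layer types could be handled. The function name and the norm-based comparison are illustrative assumptions, not the paper's exact Eq. (2) metric.

```python
import numpy as np

def layer_graph_distance(local_w: np.ndarray, global_w: np.ndarray) -> float:
    """Illustrative per-layer distance following the rebuttal's description;
    the actual graph construction and metric are defined by Eq. (2)."""
    if local_w.ndim == 4:
        # Conv layer (out_ch, in_ch, kH, kW): collapse each kernel to one
        # scalar edge weight so the layer becomes a 2-d channel-to-channel
        # adjacency matrix (topology graph), then compare the two graphs.
        a_local = np.abs(local_w).sum(axis=(2, 3))
        a_global = np.abs(global_w).sum(axis=(2, 3))
        return float(np.linalg.norm(a_local - a_global))
    if local_w.ndim == 2:
        # FC layer: the 2-d weight matrix is already a bipartite adjacency
        # matrix and can be converted into a topology graph directly.
        return float(np.linalg.norm(local_w - global_w))
    # BN and other 1-d parameters: per the rebuttal, use the difference
    # between the local model and the last-round global model directly.
    return float(np.linalg.norm(local_w - global_w))
```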




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have addressed the majority of concerns. I recommend acceptance of this work. One reviewer remains uncertain regarding the descriptions of the experiments. Please try to improve the clarity and include standard deviations in the results in the camera-ready version.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper possesses several notable strengths. First, it introduces a well-motivated and novel approach of utilizing model graph distance to quantify model differences. Additionally, the presentation of the proposed methods is generally clear and comprehensible, facilitating easy understanding. Furthermore, extensive experiments showcase the strong performance of the proposed methods, thereby confirming their efficacy. Reviewers also highlight the contribution of an affinity-based method specifically designed for the federated learning scenario, as well as the improvements observed on the FeTS2021 dataset compared to existing methods. The rebuttal addressed a couple of concerns, and I personally feel positive about the paper and would recommend acceptance.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The experimental results show only marginal improvement. The paper could use more medical datasets for validation.


