
Authors

Wei Zhu, Jiebo Luo

Abstract

Hospitals and research institutions may not be willing to share their collected medical data due to privacy concerns, transmission cost, and the intrinsic value of the data. Federated medical image analysis is thus explored to obtain a global model without access to the images distributed on isolated clients. However, in real-world applications, the local data from each client are likely non-i.i.d. distributed because of variations in geographic factors, patient demographics, the data collection process, and so on. Such heterogeneity in data poses severe challenges to the performance of federated learning. In this paper, we introduce federated medical image analysis with virtual sample synthesis (FedVSS). Our method can improve the generalization ability by adversarially synthesizing virtual training samples with the local models and also learn to align the local models by synthesizing high-confidence samples with regard to the global model. All synthesized data will be further utilized in local model updating. We conduct comprehensive experiments on five medical image datasets retrieved from MedMNIST and Camelyon17, and the experimental results validate the effectiveness of our method.
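For readers unfamiliar with the VAT-style perturbation discussed throughout the reviews below: the adversarial direction r is typically found by one step of power iteration on the gradient of a divergence between the model's prediction at x and at a slightly perturbed x. The sketch below is an illustrative assumption, not the authors' implementation: it uses a toy linear softmax model and a finite-difference gradient (real implementations use autograd with a much smaller xi), and all parameter values are made up for demonstration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    # KL divergence between two discrete distributions
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

def vat_direction(x, W, xi=0.1, eps=1e-3, n_power=1, h=1e-4, seed=0):
    """One VAT-style power-iteration step: find the unit direction d that
    most increases KL(p(x) || p(x + xi*d)), then scale it to length eps.
    Finite differences stand in for autograd purely for this sketch."""
    rng = np.random.default_rng(seed)
    p = softmax(W @ x)                      # predictions at the clean input
    d = rng.standard_normal(x.shape)
    d /= np.linalg.norm(d)                  # random unit starting direction
    for _ in range(n_power):
        base = kl(p, softmax(W @ (x + xi * d)))
        grad = np.zeros_like(d)
        for i in range(d.size):
            d2 = d.copy()
            d2[i] += h
            grad[i] = (kl(p, softmax(W @ (x + xi * d2))) - base) / h
        norm = np.linalg.norm(grad)
        if norm > 0:
            d = grad / norm                 # power-iteration update
    return eps * d

# Toy usage: a 3-class linear softmax "model" on 5-dimensional inputs.
rng = np.random.default_rng(1)
W = rng.standard_normal((3, 5))
x = rng.standard_normal(5)
r = vat_direction(x, W)
x_virtual = x + r                           # virtual adversarial sample
```

In FedVSS terms, running this procedure against the local model would give a local direction r_l, and running it against the global model would give a global direction r_g; both perturbations are computed on-device, which is why no extra communication is needed.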

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16437-8_70

SharedIt: https://rdcu.be/cVRuV

Link to the code repository

N/A

Link to the dataset(s)

https://medmnist.com/


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper aims to solve the realistic problem that the local data from different clients are likely non-i.i.d. distributed. To deal with this problem, the authors propose a method named FedVSS, which uses Virtual Adversarial Training (VAT) for data generation/synthesis to align the local models with the global model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strengths are summarized as follows:

    1. Easy to follow.
    2. VAT introduced to federated learning for data synthesis is interesting.
    3. Comprehensive comparison experiments.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weaknesses are summarized as follows:

    1. Using such MNIST-like datasets for evaluation is not quite convincing.
    2. Relatively limited performance improvements.
    3. Experiment details are missing, especially on how the domain shift (i.e., non-i.i.d. data) is simulated.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code and dataset are accessible. The VAT approach may slightly affect the training process.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Major comments are as follows:

    1. More details on the experiment settings should be provided, especially on the non-i.i.d. distributions.
    2. More realistic datasets should be used for evaluation. MNIST-like datasets are quite different from clinical data, which makes the experimental results less convincing.
    3. It would be interesting to show the training curves when using VAT, because data synthesis via VAT between the local and global models can affect convergence.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    VAT is introduced to federated learning for the first time, though the evaluation section may not be sufficient.

  • Number of papers in your stack

    7

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    In this paper, the authors propose to utilize virtual adversarial sample synthesis to improve the performance of federated medical image analysis on heterogeneous data. On the one hand, the locally synthesized samples can smooth the local model; on the other hand, the globally synthesized samples can help align the local models. The combination of the two types of samples reduces the negative effect of heterogeneous data in local sites.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The paper is well written and easy to follow. (2) The idea is novel; it is simple but effective. The synthesis of local and global samples can be done locally and thus does not introduce extra communication cost. (3) The evaluation is good and proves the effectiveness of the method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There is a concern about the initial training stage. The global virtual sample synthesis assumes that the global model has roughly learned the global distribution, which is not true at the beginning of training. Could it be better to add the global loss in Eqn. (2) only after a certain number of training iterations?

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provide a lot of details, such as training details, code, etc. I think it is easy to reproduce the results reported in the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Besides the concern raised in the weaknesses, it would be interesting to see how this method works on imbalanced data.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It is a good paper with clear presentation, novel idea and sound experimental results. The concern mentioned in weaknesses is minor.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes virtual sample synthesis to help tackle the data heterogeneity issue. The proposed method is validated on five public datasets with significant performance improvements, and an additional study is performed.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper aims to tackle the important problem of data heterogeneity in federated learning.
    2. The idea of using the local model and the global model to synthesize virtual training samples is novel and interesting.
    3. The motivation for improving the client-side training by using the virtual samples is well demonstrated.
    4. The paper is well organized and the writing logic is clear.
    5. The method is validated on multiple datasets and achieves large performance improvement on the Camelyon dataset.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The clarity of the sample synthesis process can be improved. It is not clear how the direction r is obtained.
    2. The choice of comparing 10 and 20 communication rounds is confusing; it seems the model has not converged.
    3. There is no visualization of the virtual samples to demonstrate the results of using VSS.
    4. Besides ACC and F1, it would be better to use more evaluation metrics (e.g., sensitivity, AUC) to strengthen the comparison.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    All questions in the reproducibility checklist are positive and implementation details are given in the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. Fig. 1 can be improved; the middle distribution part does not deliver intuitive and direct information.
    2. It is not clear how the directions r_l and r_g are calculated; it would be better to give a clear formulation.
    3. It would be better to perform repeated experiments and report the mean and standard deviation.
    4. It would be better to add a visualization of the virtual samples, or to draw the distributions with and without virtual samples, to demonstrate the effectiveness of using VSS.
    5. It would be better to add the training curves; given the current results, the model seems not to have converged.
    6. Besides the performance comparison, the current additional studies are weak.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes an interesting and novel idea of synthesizing virtual samples to help tackle the non-i.i.d. issue in FL. The proposed method is validated on five public datasets with significant performance improvements. However, I still have some concerns regarding the calculation of the term r, the model convergence, and the insufficient additional analytical studies.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This work proposes FedVSS, a federated learning strategy for learning from non-i.i.d. data. The key technical contribution is using Virtual Adversarial Training to generate synthetic samples with the local and global models to handle the data heterogeneity issue. All the reviewers agreed that the proposed method was novel and the paper was well written. However, there are several concerns regarding 1) the use of MNIST-like datasets, 2) the assumption that the global model has roughly learned the global distribution at the initial stage, and 3) the clarity of the experiment description. Therefore, I suggest the authors incorporate the reviewers’ feedback and comments in their final version.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1




Author Feedback

We thank all reviewers and the meta-reviewer for their positive and constructive comments. Our responses to major concerns are provided as follows, and other comments will be directly addressed in the camera-ready version.

  1. More details of experiments and methods. We will provide more details on our methods, experimental settings, and results in the supplementary material, including detailed steps on obtaining r, more details for generating heterogeneous federated datasets, the number of training iterations for each dataset, and the training curves for our methods, etc.

  2. Warm-up communication rounds before applying FedVSS. We thank R2 for the suggestion. We do not set warm-up communication rounds in our experiments because we believe the proposed FedVSS makes the local models more consistent and thus benefits the performance from the beginning.

  3. The choice of communication rounds. We set a sufficiently large number of training steps for each dataset so that the models fully converge within either 10 or 20 communication rounds. We will provide the training curves and more details in the supplementary material.


