
Authors

Myeongkyun Kang, Philip Chikontwe, Soopil Kim, Kyong Hwan Jin, Ehsan Adeli, Kilian M. Pohl, Sang Hyun Park

Abstract

One-shot federated learning (FL) has emerged as a promising solution in scenarios where multiple communication rounds are not practical. Notably, as feature distributions in medical data are less discriminative than those of natural images, robust global model training with FL is non-trivial and can lead to overfitting. To address this issue, we propose a novel one-shot FL framework leveraging Image Synthesis and Client model Adaptation (FedISCA) with knowledge distillation (KD). To prevent overfitting, we generate diverse synthetic images ranging from random noise to realistic images. This approach (i) alleviates data privacy concerns and (ii) facilitates robust global model training using KD with decentralized client models. To mitigate domain disparity in the early stages of synthesis, we design noise-adapted client models where batch normalization statistics on random noise (synthetic images) are updated to enhance KD. Lastly, the global model is trained with both the original and noise-adapted client models via KD and synthetic images. This process is repeated until the global model converges. Extensive evaluation of this design on five small- and three large-scale medical image classification datasets reveals superior accuracy over prior methods. Code is available at https://github.com/myeongkyunkang/FedISCA.
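
For readers unfamiliar with the noise-adaptation step described above, here is a minimal AdaBN-style PyTorch sketch of re-estimating batch normalization statistics on synthetic images while leaving the learned weights untouched. The function and loader names are illustrative assumptions; the authors' released code may differ:

    import copy
    import torch

    def adapt_bn(client_model, synthetic_loader):
        """Return a noise-adapted copy of client_model: BN running statistics
        are re-estimated on synthetic images; no weights are updated."""
        model = copy.deepcopy(client_model)
        for m in model.modules():
            if isinstance(m, torch.nn.BatchNorm2d):
                m.reset_running_stats()  # discard source-domain mean/var
                m.momentum = None        # None => cumulative moving average
        model.train()                    # BN collects stats only in train mode
        with torch.no_grad():            # forward passes only
            for x in synthetic_loader:
                model(x)
        return model.eval()

Running the forward passes in train mode updates only the BN running mean/variance, which is the essence of AdaBN-style domain adaptation.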

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43895-0_49

SharedIt: https://rdcu.be/dnwy2

Link to the code repository

https://github.com/myeongkyunkang/FedISCA

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    The paper proposes FedISCA, a novel one-shot federated learning framework that leverages image synthesis and client model adaptation with knowledge distillation to facilitate robust global model training with decentralized client models. The framework generates diverse synthetic images to prevent overfitting and alleviate data privacy concerns. The proposed design outperforms prior methods in extensive evaluations on small- and large-scale medical image classification datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Innovative method for One-shot Federated Learning.
    2. Extensive evaluation on multiple datasets with different settings.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The writing can be improved, especially in the method section.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Most implementation details are presented.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

1. Could you please provide additional details on how the local models are ensembled? The current description lacks clarity and requires further explanation.

2. Can you specify the total number of synthetic images generated and explain the meaning of “100 epochs” in the context of image synthesis?
3. Following the original implementation and matching all training/parameter settings may not lead to a fair comparison. To ensure fairness, it would be more appropriate to tune the parameters of the baseline methods on the dataset you are using to achieve optimal performance.
4. It would be beneficial to conduct tests on more heterogeneous cases. Using a Dirichlet distribution with alpha values of 0.3 and 0.6 may not be distinct enough to fully assess the strengths and limitations of the model.
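
For reference, non-IID client splits via a Dirichlet distribution are commonly generated as in the minimal numpy sketch below; the function name and defaults are illustrative, and smaller alpha values (e.g., 0.05) yield markedly more skewed per-client label distributions than 0.3:

    import numpy as np

    def dirichlet_partition(labels, num_clients=5, alpha=0.3, seed=0):
        """Split sample indices across clients; alpha controls heterogeneity
        (smaller alpha => each class concentrates on fewer clients)."""
        rng = np.random.default_rng(seed)
        client_indices = [[] for _ in range(num_clients)]
        for c in np.unique(labels):
            idx = np.where(labels == c)[0]
            rng.shuffle(idx)
            props = rng.dirichlet(alpha * np.ones(num_clients))  # class-c shares
            cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
            for client_id, part in enumerate(np.split(idx, cuts)):
                client_indices[client_id].extend(part.tolist())
        return client_indices
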
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper proposes a novel method for one-shot federated learning with KD. Extensive experiments are conducted to show its effectiveness. However, the writing of the paper can be improved.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

The paper adapts DeepInversion for data-free knowledge distillation in one-shot federated learning on medical data by incorporating an adaptive batch normalization scheme. The authors find that noise-adapted client models outperform competing baselines on several medical imaging datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Performance seems good compared to other data-free KD approaches, and the experimentation is solid. Figure 1 does a good job of illustrating the differences between natural and medical images.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The methodological novelty seems to be incorporating adaptive batch normalization into an existing KD framework based on DeepInversion. But the ablation results show that the adaptive BN improvements are minor on most datasets, especially on apparently more difficult classification tasks such as TissueMNIST and Diabetic Retinopathy. I wonder if the added complexity is justified.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Public datasets were used and the data should be released. Confidence intervals should be reported to see whether the results are statistically significant.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Please clarify how your approach differs from existing data-free KD techniques.
    • Why were non-IID experiments not conducted on the large-scale datasets?
    • Why does ablating IS and ADA not affect performance on DermaMNIST in non-IID settings?
    • Why is FedISCA without Ada still outperforming the other baselines?
    • How does your method perform on natural images (e.g., CIFAR-10, ImageNet) compared to other methods?
    • The visual comparison of the synthesized images is interesting, and more examples should be included in the appendix.
    • How does the number of steps affect the resulting images? Is a linear noise schedule the best choice?
    • The other baseline methods seem to improve in performance under model heterogeneity. Do you have any intuition why?
    • Why does higher variance in BN statistics correspond to higher accuracy? Can you provide a deeper analysis?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The technical novelty of the paper is limited (mainly replacing batch norm with adaptive batch norm), but the experimentation is decent (both MedMNIST and larger-scale medical datasets).

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

The paper proposes a one-shot federated learning scheme for medical image classification, which consists of three main steps: (i) DeepInversion for image synthesis, (ii) noise-based client model adaptation using AdaBN, and (iii) ensemble knowledge distillation using both the original and noise-adapted client models.
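
As an illustration of step (iii), the following is a minimal PyTorch sketch of ensemble distillation from the client models; the uniform averaging of teacher predictions and the temperature T are illustrative assumptions, not necessarily the paper's exact recipe:

    import torch
    import torch.nn.functional as F

    def kd_step(student, teachers, x, optimizer, T=4.0):
        """One KD update: distill the averaged soft predictions of all
        (original and noise-adapted) teachers into the student."""
        with torch.no_grad():
            t_probs = torch.stack(
                [F.softmax(t(x) / T, dim=1) for t in teachers]
            ).mean(dim=0)
        s_log_probs = F.log_softmax(student(x) / T, dim=1)
        loss = F.kl_div(s_log_probs, t_probs, reduction="batchmean") * (T * T)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()
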

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) Though none of the three main steps in the proposed algorithm are new, the novelty lies in the clever combination of the known methods to achieve better performance. In particular, storing all the intermediate images generated during DeepInversion and using them in the knowledge distillation step makes the overall algorithm more efficient.

    2) The proposed approach has been evaluated on a variety of medical image datasets and high accuracy results have been reported.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) The results clearly show that image synthesis is the most critical step, without which the whole algorithm collapses. Since the key novelty is the use of intermediate images synthesized during DeepInversion, the following questions need to be answered:

(a) DeepInversion is included as a comparison method, but it is not clear how many images were synthesized per class for this method. Is it true that if DeepInversion produces N “best” images for use in knowledge distillation, 500 (or 1000) x N images are used by the proposed method for KD (since all the intermediate versions are stored)? If yes, what happens when more “best” images are generated using DeepInversion and used for KD (albeit at a higher computational cost)? Will this circumvent the overfitting issue (monotonous samples) identified in the paper?

(b) Are all the intermediate versions generated during DeepInversion useful for KD? What will happen if only a subset of them (say, the best x% or only the image generated after every y iterations) is used for KD?

(c) Finally, what will happen if the synthesized images with different levels of noise are assigned different weights in the KD step? (One possible weighting is sketched after this list.)

    2) The other key limitation is that when the client models are well-trained on larger datasets and as the images become more complex, the difference between one-shot FedAvg and the proposed method is not very substantial (see Table 2). This raises questions about the efficacy of the proposed method in real-world settings.
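
One concrete realization of suggestion (c) above would be to weight each synthetic image's KD loss by its position along the synthesis trajectory. The linear weighting and all names in this PyTorch sketch are hypothetical, not taken from the paper:

    import torch
    import torch.nn.functional as F

    def weighted_kd_loss(student_logits, teacher_logits, steps, max_step, T=4.0):
        """Per-image KD loss weighted by synthesis progress: images saved
        later (more realistic) receive higher weight. Illustrative only."""
        w = steps.float() / max_step                              # in [0, 1]
        s = F.log_softmax(student_logits / T, dim=1)
        t = F.softmax(teacher_logits / T, dim=1)
        per_sample = F.kl_div(s, t, reduction="none").sum(dim=1)  # KL per image
        return (w * per_sample).mean() * (T * T)
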

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Appears to be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Please see the comments under weaknesses.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    There is some technical novelty in the work and the experimental results are quite convincing.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

The authors propose a method for one-shot federated learning (i.e., one communication round) using knowledge distillation and image synthesis. More specifically, DeepInversion is used to create synthetic images on the server using the received client models. The core idea is that KD is performed using not only the final inverted images but also all the intermediate generation steps (even the initial noise). To this end, they use adaptive BN (AdaBN) to adapt the BN statistics to noise.
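
For context, a rough PyTorch sketch of DeepInversion-style synthesis is shown below: an input batch is optimized so that its intermediate BN feature statistics match the frozen client model's running statistics, and intermediate images along the trajectory are kept for KD. Image size, loss weights, step counts, and the subsampling stride are illustrative assumptions:

    import torch
    import torch.nn.functional as F

    def synthesize(model, targets, steps=1000, lr=0.05, bn_weight=1.0):
        """Optimize a batch of images (initialized as noise) against a frozen
        model; returns intermediate batches from noise to realistic images."""
        model.eval()
        for p in model.parameters():
            p.requires_grad_(False)               # only the images are optimized
        x = torch.randn(len(targets), 3, 224, 224, requires_grad=True)
        opt = torch.optim.Adam([x], lr=lr)

        bn_losses = []
        def bn_hook(module, inputs, output):
            feat = inputs[0]                      # feature map entering this BN
            mean = feat.mean(dim=(0, 2, 3))
            var = feat.var(dim=(0, 2, 3), unbiased=False)
            bn_losses.append(F.mse_loss(mean, module.running_mean)
                             + F.mse_loss(var, module.running_var))

        hooks = [m.register_forward_hook(bn_hook)
                 for m in model.modules()
                 if isinstance(m, torch.nn.BatchNorm2d)]

        intermediates = []
        for step in range(steps):
            bn_losses.clear()
            loss = F.cross_entropy(model(x), targets) + bn_weight * sum(bn_losses)
            opt.zero_grad()
            loss.backward()
            opt.step()
            if step % 10 == 0:                    # keep a subset of the trajectory
                intermediates.append(x.detach().clone())

        for h in hooks:
            h.remove()
        return intermediates
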

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The method is novel: there are similarities with other methods, but the authors clearly define the main differences when comparing to them. Furthermore, even though DeepInversion and AdaBN already exist, the core idea of training on intermediate steps is interesting.
    • Well written and easy to follow.
    • Experiments are well executed and follow standard FL approaches.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • No results with different numbers of clients (5 clients is not enough to validate an FL strategy, and evaluating at larger scales would also be fairer to other methods that may perform better when scaling up).
    • Inconsistency in results: Table 1 shows a significant drop when removing image synthesis, but this does not happen on Derma. I suggest digging in to check whether something went wrong, or otherwise providing a plausible explanation.
    • Results exceed the SOTA competing methods by far, but this performance gap is not fully motivated. In particular, DI, the method used for image synthesis in the paper, performs poorly w.r.t. the proposed method. The authors’ main justification is that they use not just the final images but also all the intermediate steps. I agree that this intermediate noise can act as a regularizer, but it does not fully justify the performance gap.

    • Competing methods are described as not being able to deal with medical data, and the results justify this claim. Given that the proposed method performs better on the more difficult (medical) task, a reader would assume that it can also perform better in the other setting. It would be great to add this comparison as well, taking, for example, the same setting as [32].

    minor:

    • Please provide the number of classes in the dataset description in order to give the reader better insight into the task without the need to manually check for this information.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Code will be released, as will their splits and evaluations.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

Dear authors, I really appreciated your work and the core idea of the paper. Despite this, there are some flaws that I want to investigate before confirming my evaluation; I report them here for easier reference:

    • No results with different numbers of clients (5 clients is not enough to validate an FL strategy, and evaluating at larger scales would also be fairer to other methods that may perform better when scaling up).
    • Inconsistency in results: Table 1 shows a significant drop when removing image synthesis, but this does not happen on Derma. I suggest digging in to check whether something went wrong, or otherwise providing a plausible explanation.
    • Results exceed the SOTA competing methods by far, but this performance gap is not fully motivated. In particular, DI, the method used for image synthesis in the paper, performs poorly w.r.t. the proposed method. The authors’ main justification is that they use not just the final images but also all the intermediate steps. I agree that this intermediate noise can act as a regularizer, but it does not fully justify the performance gap.

    • Competing methods are described as not being able to deal with medical data, and the results justify this claim. Given that the proposed method performs better on the more difficult (medical) task, a reader would assume that it can also perform better in the other setting. It would be great to add this comparison as well, taking, for example, the same setting as [32].

    minor:

    • Please provide the number of classes in the dataset description in order to give the reader better insight into the task without the need to manually check for this information.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Despite the flaws and some doubts about the results, I found the method novel and the experiments correctly performed. Furthermore, the paper is well written and most of the details are clearly explained.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper uses well-known methods to improve performance, and there are several good points to note. The authors have tested their proposed method thoroughly on a variety of medical image datasets, with the results showing high accuracy. The paper is well written and easy to understand, which helps in understanding the proposed method and the experiments conducted.

The authors are encouraged to clarify the differences of their proposed method over other data-free KD methods and one-shot FL methods. I hope the authors can carefully address the questions in the final version. Also, it will be helpful to justify the computational overhead of the design.




Author Feedback

N/A


