
Authors

Pramit Saha, Divyanshu Mishra, J. Alison Noble

Abstract

The most challenging, yet practical, setting of semi-supervised federated learning (SSFL) is where a few clients have fully labeled data whereas the other clients have fully unlabeled data. This is particularly common in healthcare settings where collaborating partners (typically hospitals) may have images but not annotations. The bottleneck in this setting is the joint training of labeled and unlabeled clients as the objective function for each client varies based on the availability of labels. This paper investigates an alternative way for effective training with labeled and unlabeled clients in a federated setting. We propose a novel learning scheme specifically designed for SSFL which we call Isolated Federated Learning (IsoFed) that circumvents the problem by avoiding simple averaging of supervised and semi-supervised models together. In particular, our training approach consists of two parts - (a) isolated aggregation of labeled and unlabeled client models, and (b) local self-supervised pretraining of isolated global models in all clients. We evaluate our model performance on medical image datasets of four different modalities publicly available within the biomedical image classification benchmark MedMNIST. We further vary the proportion of labeled clients and the degree of heterogeneity to demonstrate the effectiveness of the proposed method under varied experimental settings.
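The two-part scheme described above can be sketched in a few lines. This is a hedged illustration only, not the authors' implementation: client models are represented as plain dicts of NumPy arrays, and `fedavg` and `isolated_aggregation` are hypothetical helper names.

```python
import numpy as np

def fedavg(models, weights=None):
    """Weighted parameter average over a group of client models.
    Each model is a dict mapping parameter name -> np.ndarray."""
    if weights is None:
        weights = np.ones(len(models)) / len(models)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return {
        name: sum(w * m[name] for w, m in zip(weights, models))
        for name in models[0]
    }

def isolated_aggregation(labeled_models, unlabeled_models):
    """Aggregate the labeled and unlabeled client groups separately,
    instead of averaging all clients into a single global model."""
    global_labeled = fedavg(labeled_models)
    global_unlabeled = fedavg(unlabeled_models)
    return global_labeled, global_unlabeled
```

In the paper's pipeline, each of the two isolated global models would then be sent back to all clients for local self-supervised pretraining before the next round.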

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43895-0_39

SharedIt: https://rdcu.be/dnwyS

Link to the code repository

https://github.com/PramitSaha/IsoFed-MICCAI-2023

Link to the dataset(s)

https://github.com/MedMNIST/MedMNIST


Reviews

Review #2

  • Please describe the contribution of the paper

    The paper proposes a new learning scheme called IsoFed for semi-supervised federated learning, designed explicitly for settings where some clients have labeled data while others have unlabeled data. IsoFed avoids the problem of jointly learning from both labeled and unlabeled clients by isolating the aggregation of labeled and unlabeled client models. Furthermore, the authors introduce local self-supervised pretraining, which shows promising performance gains. The proposed method is evaluated on medical image datasets and shown to be effective under varied experimental settings.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The setting of semi-supervised federated learning with a subset of clients without labeled data is realistic.
    2. The proposed solution is practical for real-world deployment.
    3. The method aims to avoid the gradient divergence problem common in other semi-supervised FL methods, which arises when updates from labeled and unlabeled clients are aggregated simultaneously despite their different objective functions.
    4. The proposed alternating aggregation method is simple yet effective.
    5. The continuous pretraining solution is interesting and results in promising performance gains.
    6. The method has been evaluated on four medical imaging tasks using simulated non-IID label distributions and compared to baseline semi-supervised FL methods.
    7. The paper is well-written, and the results are comprehensive and promising.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. My main criticism is that the method was only evaluated on 2D toy imaging datasets (28x28 pixel resolution). The simulated non-IID distributions only reflect label heterogeneity but do not include realistic domain shifts common in real-world FL scenarios where each client’s data likely originates from different sources, including diverse patient populations and various imaging devices and protocols.
    2. Furthermore, the tested model is the same classification model for each dataset. Therefore, it is unclear how the method would generalize to more complex tasks such as segmentation with client data from different sources.
    3. There is limited novelty due to the simplicity of the transfer learning method (see the relation to “cyclic weight transfer” below). For example, the semi-supervised learning method is standard and well-explored in the literature. Also, the continuous pretraining idea is taken from the NLP literature.
    4. Some training details such as the number of FL rounds in the proposed method and baselines are missing.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Very good. The authors provided code and promised to make it publicly available upon acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    There seems to be a link to continuous transfer learning approaches that have been previously proposed for distributed learning settings, such as “cyclic weight transfer”. The relation could be discussed and the work cited.

    Chang, Ken, et al. “Distributed deep learning networks among institutions for medical imaging.” Journal of the American Medical Informatics Association 25.8 (2018): 945-954.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the method is reasonable, and results on toy datasets seem promising. The experiments are comprehensive, and baseline comparisons seem valid. While I would have hoped for an evaluation on more real-world FL datasets, the simplicity of the approach might make it easy to deploy the method in real-world FL applications and could therefore be impactful for the community.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper investigates a certain setting of Semi-Supervised Federated Learning (SSFL), where clients have either fully-labeled data or fully-unlabeled data. It proposes Isolated Federated Learning (IsoFed) that separately aggregates two types of clients.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper is well-organized.
    2. The proposed techniques are reasonable and well-motivated.
    3. The scenario considered is realistic.
    4. Comprehensive experimental results are provided.
    5. Code is also provided.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Additional clarifications are needed for empirical results.

    1. Could the authors provide the results of vanilla FedAvg and MT+FedAvg as well?
    2. Clarifications are needed to explain why IsoFed fails to exceed the compared baselines in some situations, for example, PneumoniaMNIST with gamma=0.5.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Code is provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    See weaknesses.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Good presentation, reasonable techniques, and sufficient empirical results.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    This paper studies the problem of semi-supervised learning where a few clients have fully labeled data and other clients have fully unlabeled data. The authors propose isolated aggregation of labeled and unlabeled client models and local self-supervised pretraining. The proposed method is evaluated on four datasets from the MedMNIST and outperforms compared methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • This paper studies an important and practical problem for SSFL, where some clients are fully labeled and some clients are not.
    • The discussion on problems of the classic federated averaging scheme is clear.
    • The illustration of problem setting is clear and easy to understand.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The motivation for studying the realistic SSFL scenario is weak, as there have been some works that assume some clients are fully labeled and some clients are fully unlabeled.
    • The insight and motivation of the proposed method are unclear. How can the proposed method help address the proposed question?
    • The isolated aggregation part is confusing. The authors want to aggregate the model parameters of all unlabeled clients, but in Eq. 2 and Eq. 3, the client index runs from 1 to K, which covers all clients.
    • The local training and client pretraining are typical techniques, so the technical contribution of this part is limited.
    • The paper’s organization can be improved. Isolated aggregation is the major contribution, so the authors should elaborate more on it.
    • The experiment comparison is not convincing enough. Some SSFL methods [1,2] were also designed and evaluated on medical data but are not included in the comparison.
    • The ablation study is weak. The method part mentions isolated aggregation, local training, and pretraining. The current study only studies the effects of removing pretraining.
    • The experiments are performed on MedMNIST, which lacks validation on real-world data.

    [1] Yang, D., Xu, Z., Li, W., Myronenko, A., Roth, H.R., Harmon, S., Xu, S., Turkbey, B., Turkbey, E., Wang, X., et al.: Federated semi-supervised learning for COVID region segmentation in chest CT using multi-national data from China, Italy, Japan. Medical Image Analysis 70, 101992 (2021)
    [2] Liu, Q., Yang, H., Dou, Q., Heng, P.A.: Federated semi-supervised medical image classification via inter-client relation matching. In: Medical Image Computing and Computer Assisted Intervention. pp. 325–335. Springer (2021)

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This paper should be reproducible as the authors will release the code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Please elaborate more on the motivation for studying this problem by discussing current works and their drawbacks. Why can previous works not fully solve the mentioned SSFL problem?
    • Please elaborate more on the insight of the IsoFed.
    • What is the motivation for adopting mean-teacher-based semi-supervised learning to train each unlabeled client? It does not relate much to the main idea of isolated aggregation.
    • It is more convincing to compare with some SSFL methods designed and validated on medical data.
    • The ablation study needs to be improved by studying the effects of each component.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper studies an important problem of semi-supervised federated learning, specifically for the scenario where some clients are fully labeled and some are fully unlabeled. The topic is relevant and important, and the proposed method outperforms the compared method on the MedMNIST datasets. However, the organization of this paper can be further improved, the motivation of this paper needs to be enhanced, and the proposed method lacks some important details (e.g., the isolated aggregation part). The experiments are not comprehensive or thorough, some SSFL methods study the same problem but are not included for comparison, and the ablation study can be improved. Overall, I think there is still some space for this paper to be improved before publication.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    The authors propose a novel learning scheme specifically designed for semi-supervised federated learning (SSFL), called Isolated Federated Learning (IsoFed). It consists of two parts: (a) isolated aggregation of labeled and unlabeled client models, and (b) local self-supervised pretraining of isolated global models in all clients. Their results show that the proposed isolated aggregation followed by federated pretraining outperforms the SOTA method. Overall, the work is interesting.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well-written and easy to follow.
    2. A novel learning scheme specifically designed for semi-supervised federated learning (SSFL) is proposed. The idea is interesting and experimental results show its effectiveness.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The framework needs to be further validated on a larger number of clients.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper is clear enough for others to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    I have some questions about the model and experimental results.

    1. In the real world, a more likely scenario would be: some clients have only labeled data, some clients have only unlabeled data, and some clients have both labeled and unlabeled data, which does not seem to be reflected in Figure 1. Furthermore, how should the model work in this situation?
    2. In Table 1, when the number of labeled clients L decreases, the performance (Acc.) of IsoFed does not show a decreasing trend on either the BloodMNIST or PathMNIST datasets. Why does L=2 have better Acc. than L=3? Intuitively, shouldn’t more labeled clients lead to a better effect?
    3. Can the data of labeled clients be used for Step 1 (unlabeled client aggregation) in Fig. 1? It seems that using these data in Step 1 would be even better.
    4. The total number of clients designed for the experiment seems too small. I think experiments need to be conducted on a larger number of clients to further demonstrate the performance of the model.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a novel learning scheme specifically designed for semi-supervised federated learning (SSFL). However, the framework needs to be further validated on a larger number of clients.

  • Reviewer confidence

    Not confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Summary: The paper presents IsoFed, a novel learning scheme for semi-supervised federated learning (SSFL) that addresses the challenge of handling labeled and unlabeled client data. IsoFed incorporates isolated aggregation of labeled and unlabeled client models along with local self-supervised pretraining. Experimental results demonstrate the superiority of IsoFed over the state-of-the-art method. The paper offers an interesting exploration of SSFL in a specific setting and showcases promising performance gains through the proposed isolated aggregation and pretraining approaches.

    Strengths:

    • The paper is exceptionally well-written and maintains excellent readability.
    • The introduced learning scheme for semi-supervised federated learning (SSFL) is innovative and exhibits effectiveness through experimental validation.
    • The method adequately addresses a realistic scenario of SSFL where clients lack labeled data, and the proposed solution demonstrates practical applicability.

    Weaknesses:

    • To ensure the generalizability of the framework, further validation on a larger number of clients is necessary.
    • The evaluation is restricted to 2D toy imaging datasets, which fail to capture realistic domain shifts and the diversity of data sources encountered in real-world federated learning scenarios.
    • The method’s capability to handle more complex tasks and client data originating from diverse sources remains unclear.
    • The novelties introduced by the method are limited due to the simplicity of the transfer learning approach, and certain training details are missing.
    • Additional clarifications are required regarding the results of vanilla FedAvg and MT+FedAvg, as well as explanations for instances where IsoFed falls short of surpassing the compared baselines.

    Constructive Feedback:

    • Further clarification is necessary to understand the model’s performance when clients possess varying combinations of labeled and unlabeled data, as this scenario is more representative of real-world situations.
    • The performance trend of IsoFed with different numbers of labeled clients (L) in Table 1 raises questions, particularly regarding the superiority of L=2 over L=3. An explanation is needed to comprehend the intuitive relationship between the number of labeled clients and performance.
    • Addressing the potential utilization of data from labeled clients in Step 1 of the aggregation process in Figure 1 is crucial, as it could potentially enhance performance.
    • Conducting experiments on a larger number of clients would enhance the validation and overall demonstration of the model’s performance. The current number of clients utilized in the experiments appears insufficient.

    • Exploring the potential relationship between the proposed approach and prior continuous transfer learning methods, such as “cyclic weight transfer,” should be discussed, and relevant literature should be cited to provide a comprehensive understanding.




Author Feedback

Thanks to all the reviewers (R1, R2, R3, R4, and Meta-R) for their constructive comments.

  1. Application to complex tasks and datasets, larger number of clients, realistic domain shifts, varying combinations of labeled and unlabeled data (M1, R1): Realistically, most healthcare collaborations involve a small number of participating medical institutions (typically hospitals). Hence, while our model works successfully on a larger number of clients, we have confined our experiments to a four-client scenario in order to imitate such a practical federated learning setting. Our model is specifically designed to solve federated classification tasks for fully labeled and fully unlabeled clients, which involve heterogeneity in terms of data and class imbalance. In future work, we plan to extend the algorithm to address other complex tasks, including segmentation, handle domain shifts, and accommodate partially labeled clients.

  2. Clarification regarding experimental results (M1, R1, R3): Intuitively, the more labeled clients, the better the expected performance. However, this is based on the assumption that the model gives equal importance to each of the clients. This assumption is not satisfied here as we use dynamic weighting of the clients via distance-based similarity of model parameters and let the model decide the relative importance of the clients. However, we observe that the model’s reliance on unlabeled clients can sometimes outweigh its reliance on labeled ones, leading to a lack of a discernible performance trend.
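The dynamic weighting described above can be illustrated with a toy example. This is a minimal sketch under our own assumptions, not the paper's exact weighting rule: each client's parameters are flattened into a vector, and clients closer to the group mean receive higher aggregation weight via a softmax over negative distances. The function name `similarity_weights` and the `temperature` parameter are hypothetical.

```python
import numpy as np

def similarity_weights(client_params, temperature=1.0):
    """Toy distance-based client weighting: clients whose flattened
    parameters lie closer to the group mean receive a higher weight.
    (Illustrative only; the paper's exact scheme may differ.)"""
    stacked = np.stack(client_params)      # shape: (num_clients, dim)
    center = stacked.mean(axis=0)
    dists = np.linalg.norm(stacked - center, axis=1)
    logits = -dists / temperature
    exp = np.exp(logits - logits.max())    # numerically stable softmax
    return exp / exp.sum()
```

Under such a scheme the aggregation weights need not track the number of labeled clients, which is consistent with the non-monotonic trend the reviewers observed.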

  3. Connection with cyclic weight transfer (M1, R2): Cyclic weight transfer involves training a model sequentially at one client after another without the involvement of a central server. Our approach, on the other hand, involves isolated aggregation of the model weights of labeled and unlabeled client groups via central server followed by local pre-training in individual clients at the beginning of the next round.
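The serverless, sequential nature of cyclic weight transfer contrasted above can be sketched as follows. This is a hedged illustration, not code from either work: each client is modeled as a callable that performs one local training pass and returns the updated model, and `cyclic_weight_transfer` is a hypothetical name.

```python
def cyclic_weight_transfer(model, clients, rounds=1):
    """Train a single model by passing it sequentially from client to
    client, with no central server aggregating anything."""
    for _ in range(rounds):
        for local_train in clients:  # each client: model -> updated model
            model = local_train(model)
    return model
```

IsoFed instead keeps two server-side aggregates (one per client group) and broadcasts them back for local pretraining, so no single model ever cycles through the clients in sequence.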

  4. Missing training details (M1, R2): Further training details are provided in the Supp. Sec 1.3. The number of training rounds for each experiment is set to 200.

  5. Experimental comparison not convincing (R4): Our model outperforms the state-of-the-art algorithm RSCFed, which has been shown to outperform the earlier methods by Yang et al. [1] and Liu et al. [2] on medical imaging datasets. Hence, due to space constraints, we only compare our model with RSCFed in this paper.


