
Authors

Nanqing Dong, Michael Kampffmeyer, Irina Voiculescu

Abstract

Using decentralized data for federated training is one promising emerging research direction for alleviating data scarcity in the medical domain. However, in contrast to large-scale fully labeled data commonly seen in general object recognition tasks, the local medical datasets are more likely to only have images annotated for a subset of classes of interest due to high annotation costs. In this paper, we consider a practical yet under-explored problem, where underrepresented classes only have few labeled instances available and only exist in a few clients of the federated system. We show that standard federated learning approaches fail to learn robust multi-label classifiers with extreme class imbalance and address it by proposing a novel federated learning framework, FedFew. FedFew consists of three stages, where the first stage leverages federated self-supervised learning to learn class-agnostic representations. In the second stage, the decentralized partially labeled data are exploited to learn an energy-based multi-label classifier for the common classes. Finally, the underrepresented classes are detected with the learned energy and a prototype-based nearest-neighbor model is proposed for few-shot matching. We evaluate FedFew on multi-label thoracic disease classification tasks and demonstrate that it outperforms the federated baselines by a large margin.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_7

SharedIt: https://rdcu.be/cVRYK

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    This paper poses an under-explored problem: federated partially supervised learning. The authors also propose a framework to solve it, combining federated self-supervised learning, an energy-based loss, and prototype-based inference. The ablation study shows better results than several baselines.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The application is interesting and highlights real problems in the clinical setting.
    • The authors presented a strong evaluation of the efficiency of their method compared to the baselines.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The presentation of the method and the experiments is not clear; I had difficulty fully understanding them.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper provides sufficient details about the models/algorithms, datasets, and evaluation.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • Algorithm 1 is too simple to represent the method; I cannot understand the method from the pseudocode in Algorithm 1 alone. I suggest the authors include a flowchart/figure in the paper to better present their method.
    • In Table 2, is the prototype-based inference applied to all the methods, and is the energy calculated for all the methods during inference, or only for FedFew? What is the difference between ‘FedFew w/o EBM’ and ‘NN (MLC w/ FSSL)’, and which factor brings such a large improvement between these two methods?
    • Why is the (C_c + 1)-dimensional vector important?
    • ‘We sample 10 negative examples and 10 positive examples to simulate the class imbalance for UCs.’ What does this mean?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The topic is novel and the method looks novel, but I think some important details are missing.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    The paper proposes a federated learning framework for learning underrepresented classes under partial-label scenarios. The framework contains a self-supervised learning stage for warm-up, an energy-based partial-label learning stage for differentiating common classes from underrepresented classes, and a prototype-based inference stage for final testing. The results show good improvement compared with the baselines.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The problem under discussion, i.e., federated learning with partial labels and underrepresented classes, is indeed quite interesting and under-explored. The results show that the conventional baselines cannot solve the problem, while the proposed method provides a practical solution.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The SSL step seems effective for the common classes but not useful for the underrepresented classes, which are arguably the main focus of the paper.
    2. Doesn’t passing metadata from the local servers to the parameter server violate the privacy policy?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It would be good if the authors released the code as promised.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    major points:

    1. Section 3.3: a 0-th class is used to determine whether the image contains any CCs. How would one then know whether an image contains any CCs if the image comes from a UC client?
    2. Section 3.4: Doesn’t passing metadata from the local servers to the parameter server violate the privacy policy?
    3. Tables 2 & 3: The FSSL step seems effective for the common classes but not useful for the underrepresented classes, which are arguably the main focus of the paper.

    minor points:

    4. “We first train an MLC model for CCs C_c”: it is hard to understand what “CCs C_c” means.
    5. Table 1: what are RN and DN?
    6. Table 2: it is recommended to explain A, P, R, and F in the table caption.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The study overall is interesting. Some points need more explanation to help readers better understand the paper.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #4

  • Please describe the contribution of the paper

    This paper studies a new problem, where underrepresented classes have only a few labeled instances available and exist in only a few clients of the federated system. The authors propose a novel FL framework, FedFew, which consists of three stages and shows good results on multi-label classification tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed method is very effective in solving the under-explored problem in federated learning.

    The three-stage framework is novel: (1) use federated self-supervised learning to learn class-agnostic representations, (2) learn from the common classes, and (3) classify the uncommon classes.

    The empirical results on the ChestX-ray14 dataset are impressive, and the ablation study demonstrates that including the EBM in training improves performance.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Practical issues. It is not clear whether real-world applications need this method. The experiments are simulations, and the number of examples is so small that it is questionable whether such a complex algorithm is needed.

    2. Related work. The paper needs to cite more related work and compare against it on both general vision tasks and medical imaging tasks. Whether similar datasets exist in general computer vision and how the proposed FedFew performs on general image datasets are interesting questions to explore.

    3. Format. Fig 2 and Fig 3 are not quite clear.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    They could reproduce the results with some difficulty.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    As shown in the weaknesses.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    As shown in the weaknesses.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper studies an interesting FL problem in the clinical setting. The reviewers agreed that the work is novel. However, several concerns were raised. Therefore, I suggest the authors address, during the rebuttal, the points related to clarifying the description of the method, the privacy concerns, and the other comments raised by the reviewers.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8




Author Feedback

The authors thank the meta-reviewer for the invitation and all reviewers for their time and constructive comments. In particular, the authors thank all reviewers for the encouraging comments, e.g. “the problem formulation is interesting and underexplored” [R2, R3, R4], “novel” [R2, R4], “efficiency in solving a practical problem” [R3, R4], “strong evaluation” [R2, R4], “clarity” [R4].

We briefly summarize our pipeline and contributions: Existing federated learning methods for multi-label classification (MLC) on common classes (CCs) and underrepresented/rare classes (UCs) are inefficient. Due to extreme class imbalance and decentralization, improving the performance on UCs tends to decrease the performance on CCs, while maintaining the performance on CCs might lead to UCs being completely ignored, as shown in Table 2. Thus, the problem of interest can be rephrased as: how to leverage CCs to learn UCs without impairing the performance on CCs in a federated system. Our solution includes two novel components: we first learn an EBM to detect UCs, and then use prototypes to classify UCs. The two components can easily be modularized and implemented to complement standard MLC training on CCs. To the best of our knowledge, this is the first study of decentralized partially labeled rare classes.
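
A minimal sketch of the two-step inference described above, for illustration only (this is not the authors' code; the energy threshold `tau`, the feature dimension, and the helper names are assumptions):

```python
# Sketch: energy-based detection of UCs followed by prototype nearest-neighbor
# matching. Assumes a feature extractor yielding d-dim embeddings, a
# (C_c + 1)-way common-class head, and per-UC prototypes (mean embeddings).
import numpy as np

def energy(logits):
    # Free energy of the logits: E(x) = -log sum_c exp(logit_c);
    # lower energy suggests the sample is explained by the common classes.
    return -np.log(np.exp(logits).sum())

def predict(feature, cc_logits, uc_prototypes, tau):
    """Return common-class scores, or the nearest UC prototype if energy is high."""
    if energy(cc_logits) < tau:
        return ("CC", cc_logits)
    # High energy: treat as underrepresented and match to the closest prototype.
    dists = {c: np.linalg.norm(feature - p) for c, p in uc_prototypes.items()}
    return ("UC", min(dists, key=dists.get))

# Toy usage with random numbers, purely illustrative.
rng = np.random.default_rng(0)
feat = rng.normal(size=128)
logits = rng.normal(size=8)  # e.g. C_c = 7 common classes + 1 extra class
protos = {"uc_1": rng.normal(size=128), "uc_2": rng.normal(size=128)}
print(predict(feat, logits, protos, tau=0.0))
```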

Specific concerns:

[R2] Clarity of method presentation: 1. Thank you for the suggestion. Space constraints prevent adding new figures in the revision, but we will highlight the above summary and add comments to the algorithm to improve clarity.

  2. Tables 2 & 3: The NNs are hypothetical local baselines that do not use prototypes. In contrast to NN (MLC w/ FSSL), FedFew w/o EBM leverages prototypes for robustness and can be used in a federated system. The energy is computed for FedFew w/ EBM during inference. We will highlight these points in the revision.
  3. The (C_c + 1)-dimensional vector is important because it encodes all UCs as a single class when learning the EBM (similar to OOD detection); we can then use prototype-based inference to classify the UCs (see the label-encoding sketch after this list). Further, combining the UCs mitigates the class imbalance.
  4. UCs are assumed to be rare in the population, so we only use a few labeled examples in the training set.
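
A minimal sketch of the (C_c + 1)-dimensional target described in point 3, based on the rebuttal's wording rather than the paper's code (the position of the extra slot and the helper name `build_target` are assumptions):

```python
# Sketch: the first C_c entries hold the observed partial multi-label
# annotations for the common classes; the extra entry collapses all UCs into
# one label. Whether that slot is the 0-th or the last index is arbitrary here.
import numpy as np

def build_target(cc_labels, has_uc, num_cc):
    """cc_labels: dict class index -> 0/1 for the observed CCs; has_uc: bool."""
    target = np.zeros(num_cc + 1, dtype=np.float32)
    for c, y in cc_labels.items():
        target[c] = y              # observed common-class labels only
    target[num_cc] = float(has_uc)  # all UCs share the single extra slot
    return target

print(build_target({0: 1, 3: 0}, has_uc=True, num_cc=7))
```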

[R3] 1. Ambiguity on UC client: In our assumption, UC clients do not have labels for CCs. We thus ignore the gradients for missing labels in backpropagation. We will highlight this in the revised paper. Thank you for making us aware of the lack of clarity. Also, for an image from a UC client, no labels for CCs are involved in Eq. 7.

  2. Privacy concern: The main assumption is that, since it does not embody any individual patient information, metadata (statistics) does not violate the data regulations. The metadata in our work consists of the number of samples and the prototypes, which are simply mean feature representations from which reconstructing any patient information is impossible (see the sketch after this list). Besides, exchanging only metadata leads to efficiency in both computation and communication [4].
  3. FSSL is important because it improves the performance on CCs (especially under label scarcity for CCs), and we leverage the knowledge of CCs to learn the EBM and classify the UCs. Therefore FSSL is required by our method.
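
A minimal sketch of the metadata exchange described in point 2, under our reading of the rebuttal (the function names and the weighted-average aggregation are assumptions; only sample counts and mean embeddings leave each client):

```python
# Sketch: each client sends its per-class sample count and mean feature
# embedding (prototype); the server merges them with a weighted average.
import numpy as np

def local_metadata(features_by_class):
    """features_by_class: dict class -> (n_i, d) array of local embeddings."""
    return {c: (len(f), f.mean(axis=0)) for c, f in features_by_class.items()}

def aggregate_prototypes(all_metadata):
    """Server side: weighted average of client prototypes per class."""
    merged = {}
    for meta in all_metadata:                      # one dict per client
        for c, (n, proto) in meta.items():
            total_n, total_sum = merged.get(c, (0, 0.0))
            merged[c] = (total_n + n, total_sum + n * proto)
    return {c: s / n for c, (n, s) in merged.items()}

# Toy usage: two clients with 64-dim features, purely illustrative.
rng = np.random.default_rng(1)
client_a = local_metadata({"uc_1": rng.normal(size=(5, 64))})
client_b = local_metadata({"uc_1": rng.normal(size=(3, 64)),
                           "uc_2": rng.normal(size=(4, 64))})
protos = aggregate_prototypes([client_a, client_b])
print({c: p.shape for c, p in protos.items()})
```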

[R4] Practical concern on simulation and complexity: A major contribution of this study is to raise awareness of an underexplored problem. Thus, we simplify the problem and use simulations to provide an empirical study and demonstrate the impact of decentralized partial labels. In Table 2, we aim to show that extreme class imbalance (intentionally with only a few samples for UCs), along with decentralization, can have a significant negative impact on common methods, and that our solution can make a difference.

Citing more related work: Thank you for the suggestion. To the best of our knowledge, this is the first study of the problem of interest, and there are no similar datasets or tasks. In general computer vision the data regulations are less strict, making this a problem specific to the medical imaging domain.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    After carefully reading the rebuttal, I recommend acceptance. The authors should improve the presentation, emphasize the main focus of the paper, and address the missing details in their final version.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    11



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper proposes a federated learning framework for learning underrepresented classes under partial-label scenarios. Initially, the reviewers agreed on the novelty of the paper but had concerns about the clarity of the method presentation. These concerns were addressed in the rebuttal to a reasonable level, so I recommend acceptance of the paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    10



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper explores the problem of federated partially-supervised learning with class imbalance. Based on the reviews and rebuttal, the meta-reviewer recommends the acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7


