
Authors

Xinyu Liu, Wuyang Li, Yixuan Yuan

Abstract

Federated learning (FL), which trains a shared global model through collaboration among distributed clients (e.g., medical institutions) while preserving the privacy of local data, has been widely deployed in the medical field to benefit abnormality diagnosis. However, local data inevitably contain noise across clients, resulting in notable performance deterioration of the global model. To this end, this paper studies a practical yet challenging FL problem, namely Federated abnormality detection with noisy clients (FADN). This paper represents the first effort to formulate the FADN task as a structural causal model, and it identifies the main issue that leads to the performance deterioration, namely recognition bias. To tackle the problem, an Intervention & Interaction FL framework (FedInI) is proposed, comprising two key strategies: (1) Intervention: considering the data distribution heterogeneity caused by different noise levels within each client, we use the global model to intervene in the training of local models by shuffling and mixing features extracted from different models, gradually suppressing the noise; (2) Interaction: we devise an adaptive sample-wise weighting strategy that jointly considers the local training statuses and global noise levels with a shared interactive layer. Extensive experiments on class-conditional noise and instance-dependent noise settings show that FedInI outperforms state-of-the-art methods by a remarkable margin.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_30

SharedIt: https://rdcu.be/cVRZc

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposed an approach based on federated learning (FL) for abnormality detection with noisy clients. The novelty of the work lies in the way it addresses the noisy input given by local models. This is highly desirable in practice. Experimental results demonstrate the improvement.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The novelty of the work is in mixing the features/evidence provided by local models so that any noisy evidence is suppressed.

    The second important aspect is sharing information across local models during training.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Not much

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Looks promising

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    It is a very good approach; the computational complexity of the approach should be reported.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It is a very well-written paper, and the novelty of the work is explained very well.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper presents a causality-inspired method for federated abnormality detection with noisy clients and designs a debiasing solution, namely the Intervention & Interaction FL framework, to alleviate the client confounder effect. The experiments on class-conditional noise and instance-dependent noise demonstrate the efficacy of the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed method is well motivated and can be regarded as a representative effort to leverage federated learning to solve problems in computer-aided diagnosis. The proposed method designs two different strategies to solve the main issue, recognition bias. The experimental setting is reasonably designed, and the ablation studies and comparison experiments demonstrate the efficacy of the proposed method. Finally, the presentation is good and readers can easily follow the method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper leverages two strategies to handle the application of FL to abnormality detection. The motivation is good, and I appreciate this idea. However, I think the authors should provide more details on how the strategies are designed and incorporated into the FL model. Specifically, model training is crucial to employing the proposed model, so more details and discussion are necessary. For instance, what is the added computational cost compared to a common FL model?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I didn’t check it

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The paper represents the FADN task as a structural causal model and identifies the main issue of recognition bias. Specifically, two novel strategies are proposed to handle the application of FL to abnormality detection. The motivation is reasonable, and the extensive experiments designed on the benchmark datasets demonstrate the efficacy of the proposed method. To further improve the quality, the authors should provide more details on how the strategies are designed and incorporated into the FL model. Specifically, model training is crucial to employing the proposed model, so more details and discussion are necessary. For instance, what is the added computational cost compared to a common FL model? How is the training objective defined, and can the model be trained more efficiently?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a novel FL framework for computer-aided diagnosis applications and designs specific strategies to address the main issue of FL in this setting. The framework is reasonably designed and the results are also promising.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper presents a framework for abnormality detection in the setting of federated learning where a centralized global model is trained using decentralized, local data. The paper addresses the scenario where such a modeling scheme suffers from noisy labels across distributed clients, by using structural causal modeling (SCM) to identify the clients causing confounding bias. To resolve the bias, intervention and interaction approaches are used. Interaction adaptively estimates appropriate weights that balance local training status with global noise levels. Intervention uses these weights to shuffle and mix features from the local client and the global model to gradually reduce the detrimental effect of local noise.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The use of SCM in a federated setting for noisy clients is novel. By identifying and intervening on the causal factor (noisy client), the overall recognition bias is reduced. Unlike approaches where noisy labels are handled by robust losses, regularization, etc., this method intuitively adapts to the distributed learning paradigm. Instead of only improving the globally learned model, it naturally also identifies the confounder clients. This may help in a clinical setting by automatically identifying sites that may consistently differ in their labeling strategy or data quality.

    The approach addresses two problems simultaneously. First, how to use the global model to intervene and debias clients causing recognition bias. This, in turn, improves the overall global model as well. Second, how to find the optimal way to intervene, by using an adaptive weighting scheme for each client that best reflects the current noise status, locally and globally.

    Extensive experiments have been shown, including comparison with several related approaches. The paper is well written and nicely condenses many aspects of the problem and method within the page limit.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The following causality assumptions are a bit counter-intuitive and would benefit from clarification. Does the link between C (client) and M (image features) indicate that each client has its own direct impact on the ‘locally’ learned model (and therefore, features)? That is, by changing the client, we induce a change in the model and therefore in M. Or is M the ‘global’ feature set, affected by each client? Shouldn’t there also be a direct link between client C (e.g. the hospital labeling the images) and the noisy label Y_tilde, since changing C will directly affect the noisy label assigned by C’s diagnosis? This causal effect is independent of the fact that changing C changes M, and therefore Y_tilde during prediction. If so, how will that affect the modularity of the SCM and the proposed do-calculus? It is not clear why equation 1 doesn’t include P(Y_tilde|X). P(c|M) also seems counter-intuitive given the direction of causality between C and M.

    As per Sec 2.2, the stratification assumes that the global model is an approximation of the confounder set. So, the shuffling and mixing of features between global and local makes sense - using globally learned information to suppress local noise (equation 4). However, in this context, what is the specific advantage of extending the mixing across different images from the same local batch (M_c_hat_1 and M_c_hat_2)?

    Sect 2.4 says that all clients are used for the initial training. If so, the learned global model may still be biased by local clients’ feature distributions. How will this impact the ability to generalize to unseen test clients with a distribution-shift in the feature distribution?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper provides sufficient details for reproducibility. However, the authors have not clarified whether the code will be released upon acceptance. Some details could be added, e.g. for instance-dependent noise, the proportion of labels that were flipped based on the proposed criterion, and the statistical significance of the reported differences in mAP values.
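    Reviewer 3 asks for the proportion of labels actually flipped under the noise protocol. As a hedged illustration of how such a number could be reported, the sketch below injects label noise with a per-sample flip probability and returns the realized flip rate. The function name `inject_label_noise` and the uniform choice of the replacement class are assumptions for illustration, not the paper's protocol. Per-sample probabilities model instance-dependent noise; using one probability per class reduces it to class-conditional noise. For a detection dataset, the labels would be per-box class labels.

```python
import random


def inject_label_noise(labels, flip_probs, num_classes, seed=0):
    """Hypothetical helper: flip each label with its own probability.

    Returns the noisy labels and the realized fraction of flipped labels.
    """
    if len(labels) != len(flip_probs):
        raise ValueError("labels and flip_probs must have the same length")
    if num_classes < 2:
        raise ValueError("need at least two classes to flip a label")
    rng = random.Random(seed)  # seeded so the noise is reproducible
    noisy = []
    for y, p in zip(labels, flip_probs):
        if rng.random() < p:
            # replace the label with a different class chosen uniformly (assumption)
            noisy.append(rng.choice([c for c in range(num_classes) if c != y]))
        else:
            noisy.append(y)
    flipped = sum(1 for a, b in zip(labels, noisy) if a != b)
    rate = flipped / len(labels) if labels else 0.0
    return noisy, rate


labels = [0, 1, 2, 1, 0, 2, 1, 0]
# hypothetical setting: class 1 is mislabeled more often than the others
probs = [0.6 if y == 1 else 0.1 for y in labels]
noisy, rate = inject_label_noise(labels, probs, num_classes=3)
print(noisy, f"realized flip rate: {rate:.2f}")
```

    Reporting the realized rate, rather than only the target probability, is what makes the noise level of each client comparable across runs.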

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Ideas for an extended journal version of the paper:

    It will be interesting to see the effect of different data partition strategies used to distribute data among clients (in the training stage, where data is fed into both local and global models- Section 3.1). Can a smarter partition (such that the confounder client is pre-identified) mimic the advantage of the SCM and interaction steps?

    The authors may want to explore and clarify the difference in the source of noise: making the labels noisy versus making the underlying images noisy. While both will impact the final prediction accuracy, they may affect the structural causal model’s assumptions differently. They may alter the modularity or independence of each node via unobserved variables and add confounding association effects on top of causation effects.

    In the interaction strategy (equation 5), while a good regularization is achieved, truly exceptional (but legitimate) data points might be missed, and the overall differentiating power of the local client will be suppressed. In such cases, the lambda_local value may be low and the global features will take precedence in the mixing. It would be good to discuss this in the context of larger training datasets, more clients, or introducing semantic counterfactuals to increase the variety picked up and learned by the model.

    In the ablation experiments, doesn’t removing intervention effectively make the interaction redundant? If so, is this the same as a base federated model with no intervention-interaction component?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I liked reading this paper as it nicely establishes motivation, provides the required background and method details. While some assumptions may be weak or unclear, generally the proposed approach can encourage broader discussion and innovation on an important problem for the MICCAI community (learning from data while respecting data privacy and heterogeneity).

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper presents a causality-inspired method for federated abnormality detection with noisy clients. The reviewers were in agreement with the novelty of using structural causal modeling in a federated setting for noisy clients, extensive experiments, and good presentation. Minor concerns were raised regarding model training details, which I believe can be successfully addressed in the final version.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1




Author Feedback

We thank the Meta-Reviewer and Reviewers for their valuable comments, and we greatly appreciate their very positive feedback.

R2 Q1. The authors should provide more details of how to design and incorporate the strategies into FL model. A: The incorporation of the proposed strategies into an FL model (FedAvg) includes the following steps: (1) In each iteration of local training, we compute the cosine similarity between features extracted from the server model and the current client model to obtain lambda_local; (2) we feed the shuffled server feature and original client feature into the shared interactive layer to obtain lambda_global; (3) we compute the overall learning objective to update the client model.

R2 Q2. What is the added computational cost compared to the common FL model? A: The additional network components are three linear mappings in the shared interactive layer. We computed the FLOPs (reported in GMac) and the parameters: the proposed components require only 0.3 GMac and 0.37M parameters. Compared with the baseline client model FCOS (15.97 GMac and 50.78M parameters with a 400×400 input), the added computational cost is negligible, demonstrating the efficiency of the proposed design.

R3 Q1 Does the link between C (client) and M (image features) indicate that each client has its own direct impact on the ‘locally’ learned model/features? A: We sincerely appreciate the question. Yes, the effect C->M represents that each client model will extract distinct features for a specific image.
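The overhead figures quoted in the answer to R2 Q2 can be checked with simple arithmetic. The helper `linear_params` shows how a linear mapping's parameters are counted; the layer sizes passed to it below are hypothetical, since the dimensions of the interactive layer are not given on this page.

```python
def linear_params(in_dim, out_dim, bias=True):
    # parameters of one fully connected layer: weight matrix plus optional bias vector
    return in_dim * out_dim + (out_dim if bias else 0)


def relative_overhead(extra, base):
    return extra / base


# figures quoted in the rebuttal
added_params_m, base_params_m = 0.37, 50.78
added_gmac, base_gmac = 0.3, 15.97
print(f"params: {relative_overhead(added_params_m, base_params_m):.2%}")  # params: 0.73%
print(f"compute: {relative_overhead(added_gmac, base_gmac):.2%}")  # compute: 1.88%

# hypothetical dimensions for three stacked linear mappings (not the paper's)
print(sum(linear_params(i, o) for i, o in [(512, 256), (256, 256), (256, 1)]))  # 197377
```

Under these figures, the added components amount to under 1% of the parameters and under 2% of the compute of the FCOS client model.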

R3 Q2 Shouldn’t there also be a direct link between client C (e.g. hospital labeling the images) and the noisy label Y_tilde, since changing C will have a direct effect on the noisy label assigned by C’s diagnosis? A: In this paper, we assume that an image is mislabeled regardless of the client it is assigned to. Thus, the noisy-label effect is only transmitted through C->X->Y_tilde. However, if the noisy labels are caused by different labeling protocols or diagnoses among clients, there will be a direct effect from C to Y_tilde. We will explore this setting in future work.

R3 Q3 It is not clear why equation 1 doesn’t include P(Y_tilde|X). P(c|M) is also counter-intuitive given the direction of causality between C and M. A: We omit X in P(Y_tilde|M, X, c) because the effect on the image X caused by annotators cannot be directly alleviated by causal intervention. P(c|M) is derived from the law of total probability.
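For readers following R3 Q3, the usual way a term like P(c|M) enters is through the law of total probability over the confounder, and causal intervention then replaces it with the prior P(c) (backdoor adjustment). The sketch below uses the review's notation (M features, c client, Ỹ noisy label) and assumes that equation 1 follows this standard pattern; the paper's exact equation is not reproduced on this page.

```latex
\begin{aligned}
P(\tilde{Y} \mid M) &= \sum_{c} P(\tilde{Y} \mid M, c)\, P(c \mid M) && \text{(total probability)} \\
P(\tilde{Y} \mid do(M)) &= \sum_{c} P(\tilde{Y} \mid M, c)\, P(c) && \text{(backdoor adjustment over } C\text{)}
\end{aligned}
```

Under this reading, P(c|M) is not a causal claim that M causes C; it is the observational weighting that intervention removes, so clients that happen to co-occur with particular features no longer dominate the prediction.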

R3 Q4 What is the specific advantage of extending the mixing across different images from the same local batch (M_c_hat_1 and M_c_hat_2)? A: We found that sampling from the whole dataset is more time- and storage-consuming, while mixing across different images from the same local batch achieves performance comparable to sampling from the whole dataset.
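A minimal sketch of batch-level mixing, assuming a mixup-style convex combination: each client feature is blended with the server feature of another image from the same local batch. Because partners come only from the current batch, no dataset-wide feature bank has to be stored, which is the time and storage saving the answer mentions. The function name, the per-sample weight `lam`, and the single-cycle pairing are assumptions; this is not the paper's equation 4.

```python
import random


def mix_within_batch(client_feats, server_feats, lam, seed=0):
    """Hypothetical sketch: blend each client feature with a shuffled server feature.

    lam[i] in [0, 1] weights the local feature; (1 - lam[i]) weights the server feature
    of its partner image. Partners form a single random cycle, so for batches of size > 1
    every image is paired with a different image.
    """
    n = len(client_feats)
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    perm = [0] * n
    for j, i in enumerate(order):
        perm[i] = order[(j + 1) % n]  # next element in the shuffled cycle
    mixed = []
    for i in range(n):
        partner = server_feats[perm[i]]
        mixed.append([lam[i] * a + (1.0 - lam[i]) * b for a, b in zip(client_feats[i], partner)])
    return mixed, perm
```

Setting `lam` close to 1 keeps the local feature almost unchanged, while a low value lets the global model's feature dominate, which matches the behavior Reviewer 3 describes for a low lambda_local.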

R3 Q5 All clients are used for the initial training, so the learned global model may still be biased by local clients’ feature distributions. How will this impact unseen clients? A: Considering that the global model may be biased by local feature distributions, we conduct intervention and interaction from the first communication round. Therefore, as the number of communication rounds increases, the proposed strategies gradually suppress the negative impact on unseen clients.
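The answer refers to intervention and interaction running from the first communication round of FedAvg (named in the answer to R2 Q1). The skeleton below shows standard FedAvg aggregation, a sample-size-weighted average of client parameters, inside a round loop. `local_update` is a hypothetical callback standing in for local training, where the proposed strategies would act. It illustrates only the protocol structure, not the claimed suppression of bias.

```python
def fedavg(client_params, client_sizes):
    """Sample-size-weighted average of client parameter vectors (FedAvg aggregation)."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [sum(p[k] * n for p, n in zip(client_params, client_sizes)) / total for k in range(dim)]


def run_rounds(global_params, local_update, client_sizes, rounds):
    """Hypothetical loop: each round, every client starts from the current global model,
    trains locally (where intervention and interaction would act from round 1),
    and the server aggregates the returned parameters."""
    for r in range(rounds):
        updates = [local_update(list(global_params), cid, r) for cid in range(len(client_sizes))]
        global_params = fedavg(updates, client_sizes)
    return global_params


# toy usage: every client nudges each parameter by +1, so three rounds move 0.0 to 3.0
print(run_rounds([0.0], lambda p, cid, r: [x + 1.0 for x in p], client_sizes=[10, 30], rounds=3))  # [3.0]
```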


