
Authors

Naif Alkhunaizi, Dmitry Kamzolov, Martin Takáč, Karthik Nandakumar

Abstract

Collaboration among multiple data-owning entities (e.g., hospitals) can accelerate the training process and yield better machine learning models due to the availability and diversity of data. However, privacy concerns make it challenging to exchange data while preserving confidentiality. Federated Learning (FL) is a promising solution that enables collaborative training through the exchange of model parameters instead of raw data. However, most existing FL solutions work under the assumption that participating clients are honest and thus can fail against poisoning attacks from malicious parties, whose goal is to deteriorate the global model performance. In this work, we propose a robust aggregation rule called Distance-based Outlier Suppression (DOS) that is resilient to Byzantine failures. The proposed method computes the distance between local parameter updates of different clients and obtains an outlier score for each client using Copula-based Outlier Detection (COPOD). The resulting outlier scores are converted into normalized weights using a softmax function, and a weighted average of the local parameters is used for updating the global model. DOS aggregation can effectively suppress parameter updates from malicious clients without the need for any hyperparameter selection, even when the data distributions are heterogeneous. Evaluation on two medical imaging datasets (CheXpert and HAM10000) demonstrates the higher robustness of the DOS method against a variety of poisoning attacks in comparison to other state-of-the-art methods.
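A minimal sketch of the aggregation rule described in the abstract, assuming each client's update has been flattened into a 1-D array and using the COPOD detector from the PyOD library. The choice of Euclidean and cosine distances and the simple averaging of their scores are illustrative, not necessarily the authors' exact configuration.

```python
# Minimal sketch of Distance-based Outlier Suppression (DOS) aggregation.
import numpy as np
from scipy.spatial.distance import cdist
from pyod.models.copod import COPOD  # parameter-free, copula-based outlier detector


def dos_aggregate(client_updates, temperature=-1.0):
    """Return a weighted average of client updates, down-weighting outliers.

    client_updates: array of shape (n_clients, n_params)
    """
    W = np.asarray(client_updates, dtype=np.float64)
    n_clients = W.shape[0]

    # Pairwise distance matrices between client updates; each row is one
    # client's "distance profile" with respect to all other clients.
    dist_matrices = [cdist(W, W, metric="euclidean"),
                     cdist(W, W, metric="cosine")]

    # COPOD outlier score per client, averaged over the two distance views.
    scores = np.zeros(n_clients)
    for D in dist_matrices:
        detector = COPOD()
        detector.fit(D)
        scores += detector.decision_scores_
    scores /= len(dist_matrices)

    # Softmax with a negative temperature: larger outlier score -> smaller weight.
    logits = temperature * scores
    logits -= logits.max()          # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()

    return weights @ W              # aggregated (global) update
```

Because the weights are normalized, the aggregate remains a convex combination of the local updates; an honest client's influence is never amplified, only the suspected outliers' influence is suppressed.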

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_64

SharedIt: https://rdcu.be/cVVql

Link to the code repository

https://github.com/Naiftt/SPAFD

Link to the dataset(s)

https://stanfordmlgroup.github.io/competitions/chexpert/

https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000


Reviews

Review #1

  • Please describe the contribution of the paper

    In this paper the authors propose a new method for federated learning that aims to address “poisoning attacks”, i.e., it considers that some of the nodes contributing to the federated learning network are malicious. The proposed method is based on computing distances between the parameters communicated by each node and then weighting the nodes according to a copula-based outlier detection method. Empirical results illustrate the usefulness of the proposed method on two datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Well motivated and presented paper
    • Places the work in the context of other works and compares against them in empirical results
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The limitations of the proposed framework are not sufficiently discussed. Why is it necessary that the proportion of clients experiencing Byzantine failures is less than 50%? There is not sufficient discussion, or proof, to justify this.
    • The discussion of limitations should also cover the distribution of data/classes across the different nodes, i.e., in the non-iid data distribution case, what are the limitations depending on how the data and classes are distributed?
    • The E and C groups (referenced in some parts of the paper, such as Algorithm 1) should be more clearly defined throughout the text.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It seems feasible to reproduce the results with the provided information.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    This is a well-motivated and well-presented paper that proposes a new method for federated learning aimed at addressing poisoning attacks. The work is discussed in the context of related work, and empirical evidence is presented to support the usefulness of the proposed method.

    The limitations of the proposed framework are not discussed. There is a claim that the proposed method works when the proportion of clients experiencing Byzantine failures is less than 50%, but this is not sufficiently justified. Moreover, there is no discussion of the proposed method's limitations with respect to the data/class distribution among the nodes participating in the federated learning network. These issues make it hard to assess the practical impact of the proposed method.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • Well motivated and presented paper with empirical results comparing it to other approaches.

    • Very little discussion about the limitations of the proposed method.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper
    1. The paper proposes a general aggregation rule for federated learning.

    2. The proposed method is technically sound and simple.

    3. The experimental results show the effectiveness of DOS against several types of attacks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The key strength of this paper is developing a robust aggregation rule for FL against attacks from malicious clients. The aggregation rule employs a parameter-free outlier detection algorithm, namely COPOD, to detect abnormal values among the distance metrics of model parameters between the clients and the global model.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Some key references and baselines are missing, e.g., RFA (Robust Aggregation for Federated Learning) and SparseFed (SparseFed: Mitigating Model Poisoning Attacks in Federated Learning with Sparsification).

    2. Two differences between the proposed method and existing methods are: 1) using COPOD to score the clients, and 2) performing scoring on a distance space instead of the parameter space. The implementation details are well-described but lack theoretical analysis.

    3. Despite the variety, the experimental settings are not comprehensive and the evaluations are not very insightful.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The implementation details of the proposed method are well presented, as are the experimental settings.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. Federated learning is indeed a good solution for enabling a multi-center collaborative training process. However, in this paper, the only connection between medical imaging and FL is the datasets employed in the experiments. My suggestion is to revise the motivation so that it is closely related to the clinical aspect. For example, the authors could describe specific attacks that commonly appear in FL with medical images and then analyze why existing techniques may fail in such cases.

    2. Compared with SOTA works in this area, a major issue of this work is the lack of theoretical analysis. It would be better to discuss why the parameter-free algorithm is robust for anomaly detection, and why the distance space is more appropriate than the parameter space for detecting malicious clients.

    3. To make the experiments more comprehensive, it would be better to analyze the performance under different percentages of malicious clients (e.g., 10% to 40%) and different numbers of clients (e.g., up to 30 clients). Moreover, methods like RFA or SparseFed should be included as two strong baselines. The current version of Section 4.4 Results and Discussion is not very insightful, as it only demonstrates good performance without discussing why the baseline methods fail and why the proposed method works well.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes an effective aggregation rule for FL, and the experimental results support some of the authors’ claims. However, this work may bring limited contributions to the MICCAI community.

    1. The proposed method is developed on the existing FL framework and changes the client scoring function by using COPOD. Although the performance is good, the methodology lacks theoretical analysis.

    2. This work focuses on an important research topic, federated learning, but has a relatively limited impact on medical imaging and may interest only a narrow group of the MICCAI audience. It would be more interesting and impactful to strengthen the connection between the proposed method and medical images.

    Therefore, I think the weaknesses of this paper slightly outweigh its merits, and I suggest a weak reject.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #4

  • Please describe the contribution of the paper

    This paper proposes a Federated Learning framework with Distance-based Outlier Suppression (DOS), based on Euclidean and cosine distances and a softmax operation with temperature, for tackling client poisoning attacks. The proposed method is evaluated on classification tasks with two medical imaging datasets and achieves improved performance over previous methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed method achieves improved results over previous methods, indicating its effectiveness.

    2. The method leverages two distance metrics to achieve robust distance and outlier measurements and could be extended to other distance metrics.

    3. Rich experiments are conducted. Visualizations clearly show model performance under different situations.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The authors use -1 as the temperature parameter in the softmax computation to reduce the weights of clients with higher outlier scores and increase the weights of clients with lower outlier scores. Though -1 is a feasible choice, it changes the original relative distribution of the learned outlier scores after the exp function. It may be better to preserve this relative structure for easier learning of the network.

    2. The font size in the figures should be increased; the figures are currently hard to read.

    3. Why do the authors assume that 40% of the clients are malicious instead of, say, 50%, 60%, or 100%?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Please refer to the comments.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, I think this is an interesting paper with clear presentation. I hope the authors could address my questions.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors addressed my concerns. I will keep my ratings.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The work was found to be well motivated, and the results are encouraging. The reviewers’ criticisms concern the lack of a discussion of the limitations, in particular in the non-iid setting and in the presence of a large number of attackers, and the lack of a theoretical analysis to justify the proposed approach. Moreover, some methodological and experimental points require clarification, as does the positioning of the work with respect to the MICCAI community.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    9




Author Feedback

Proportion of malicious clients [MR, R1, R2, R4]: The Distance-based Outlier Suppression (DOS) method is designed for the honest-majority scenario, which is a common assumption in many consensus algorithms dealing with Byzantine failures. Since our approach treats malicious weight updates as outliers, the protocol will fail when a majority of clients are malicious; under the dishonest-majority scenario, malicious updates are no longer outliers because they begin to dominate the distance distribution. To address the reviewers' concerns, we conducted experiments by fixing the number of clients and increasing the proportion of malicious clients from 10% to 60%, in steps of 10%. As expected, the DOS approach remained robust as long as the proportion of malicious clients was at most 50% and failed when the proportion reached 60% (e.g., for the HAM10000 dataset, the AUC values were 0.695, 0.697, 0.696, 0.711, 0.710, and 0.554 for 10%, 20%, 30%, 40%, 50%, and 60% corruption, respectively; in comparison, the AUC without any attack was 0.70). This is why we chose a 40% proportion of malicious clients for most of our experiments.

Non-IID setting [MR, R1]: We conducted experiments under five different non-iid settings, where we randomly partitioned the HAM10000 dataset among 10 clients. In all five experiments, the convergence trends and the final accuracy were similar (between 0.69 and 0.71) to the iid case, showing that the DOS method can work well in the non-iid setting.

Comparison with RFA and SparseFed [R2]: We compared DOS against the RFA method, which is also a robust aggregation rule based on the geometric median. For most experiments (with different proportions of malicious clients and types of poisoning), the performance of DOS and RFA was comparable, with the DOS method having a marginal edge when the proportion of malicious clients was higher (40%). Only in the case where 40% of the clients transmitted Gaussian noise did the RFA method have significantly lower accuracy (0.625) than the DOS method (0.69). RFA also attempts to preserve the privacy of local updates, which is a possible future extension for DOS (secure computation of the COPOD score). SparseFed is an orthogonal approach to mitigating poisoning because it relies on gradient clipping and top-k sparsification of updates; therefore, it may be possible to apply SparseFed on top of DOS to achieve better communication efficiency.
Different number of clients [R2]: We conducted experiments by fixing the proportion of malicious clients to 40% and increasing the number of clients from 5 to 40. The AUC values were 0.725, 0.700, 0.692, and 0.674 for 5, 10, 20, and 40 clients, respectively. There is a minor degradation in accuracy as the number of clients increases. The distance computations also grow quadratically with the number of clients. Therefore, the DOS approach may be more suitable for cross-silo FL settings (with fewer than 100 clients).
Theoretical analysis [MR, R2]: Since the DOS method is based on a linear combination of local updates, it is possible to prove convergence theoretically following the same principles as FedAvg. A complete theoretical analysis is left for future work. Empirical results show that the weights assigned to malicious clients quickly converge to zero, which ensures convergence.

Temperature parameter [R4]: Larger distance scores correspond to stronger outliers and must be suppressed. Since the assigned weight should decrease as the distance score increases, a temperature parameter of -1 is appropriate.

Relevance and Motivation [R2]: Federated learning is being increasingly used in medical imaging applications [5,8,28], and the competitive nature of the participants makes them vulnerable to poisoning attacks. This makes the proposed method relevant to the MICCAI community. Detecting outliers (poisoned updates) directly in the parameter space is challenging due to its high dimensionality, which is why the distance space is more appropriate.
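To make the temperature argument concrete, here is a small numeric illustration of the weighting step; the outlier scores below are hypothetical, not values reported in the paper.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax.
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical COPOD outlier scores: the fourth client looks most anomalous.
scores = np.array([0.80, 0.90, 0.85, 3.00])

weights = softmax(-1.0 * scores)  # temperature of -1 flips the ordering
print(weights.round(3))           # -> [0.337 0.305 0.321 0.037]
```

The most anomalous client's contribution is suppressed to under 4% of the aggregate, while the remaining clients keep roughly equal weights.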




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal addressed most of the concerns raised by the reviewers. Additional results are provided to illustrate the proposed defense method in more comprehensive scenarios with varying proportions of malicious clients.

    Compared to SOTA approaches, this work is still empirical and lacks a proper theoretical analysis to justify the claims, which here are only demonstrated experimentally (e.g., rate of convergence, presence of bias). All in all, while the work is preliminary, the results are promising and may be interesting for the community.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The work is well motivated and the proposed solution is reasonable. The rebuttal further clarified the implementation and experimental details.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper discusses an interesting problem in FL and proposes a new method to detect outliers for defending against Byzantine poisoning attacks. Although the proposed method has many merits, I want to raise two concerns:

    1. Stronger baselines. The authors compared their results with a few baseline aggregation methods: FedAvg (2017), [31] (2018), [4] (2017). First, I wonder why Bulyan [13] was not selected as a baseline in the experiments. Also, many stronger defense strategies for the same attack problem were ignored. To align with the main point of this submission on Byzantine-robust aggregation methods, I list some baselines below:

    Pillutla, K., Kakade, S. M., & Harchaoui, Z. (2022). Robust aggregation for federated learning. IEEE Transactions on Signal Processing, 70, 1142-1154. (citation 121, based on google scholar)

    Cao, X., Fang, M., Liu, J., & Gong, N. Z. (2021, January). FLTrust: Byzantine-robust Federated Learning via Trust Bootstrapping. In ISOC Network and Distributed System Security Symposium (NDSS). (citation 49, based on google scholar)

    Li, L., Xu, W., Chen, T., Giannakis, G. B., & Ling, Q. (2019, July). RSA: Byzantine-robust stochastic aggregation methods for distributed learning from heterogeneous datasets. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 33, No. 01, pp. 1544-1551). (citation 141, based on google scholar)

    2. Justification of the medical setting. The authors present an FL setting across hospitals in Fig. 1. I doubt the plausibility of a setting in which up to 50% of the hospitals in the federation (based on the authors’ rebuttal) could (intentionally) perform Byzantine attacks. For example, in FL for IoT applications, a Byzantine attack is more reasonable, since individuals are the participating parties; hospitals, however, are more trustworthy institutions. Thus, the high malicious rates set by the authors in the experiments are questionable. I suspect the setting is more of a playground; it may not fit real medical imaging analysis scenarios or solve a real medical imaging problem. I am not sure whether Byzantine attacks are a real problem when FL is performed across hospitals and whether the malicious rates are reasonable.
  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    19/30


