
Authors

Xiaoming Qi, Guanyu Yang, Yuting He, Wangyan Liu, Ali Islam, Shuo Li

Abstract

Federated learning (FL) has shown value in multi-center multi-sequence cardiac magnetic resonance (CMR) segmentation, owing to the imbalanced distribution of CMR data and the need for privacy preservation in clinical practice. However, the large heterogeneity across multi-center multi-sequence CMR poses challenges to the FL framework: (1) Representation bias in model fusion. The FL server model, generated by average fusion of heterogeneous client models, is biased toward representations close to the mean distribution and away from long-distance distributions; hence the FL model has poor representation ability. (2) Optimization interruption in model replacement. When the heterogeneous server model replaces a client model in FL, long-distance clients have their original optimization overwritten by a worse one; each client must then recover from this worse initialization, so it lacks continuous optimization ability. In this work, a cross-center cross-sequence medical image segmentation FL framework (FedCRLD) is proposed, for the first time, to facilitate multi-center multi-sequence CMR segmentation. (1) The contrastive re-localization (CRL) module of FedCRLD enables correct representation from the heterogeneous model by embedding a novel contrastive difference metric based on mutual information into a cross-attention localization transformer, transferring client-correlated knowledge from the server model without bias. (2) The momentum distillation (MD) strategy of FedCRLD enables continuous optimization by conducting self-training on a dynamically updated client momentum bank, refining optimization with the correct local optimization history. FedCRLD is validated on 420 CMR images from 6 clients across 2 public datasets, scanned at different hospitals with different devices and contrast agents. FedCRLD achieves superior performance on multi-center multi-sequence CMR segmentation (average Dice 85.96%). https://github.com/JerryQseu/FedCRLD.
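As a reading aid, the combined client objective implied by the abstract can be sketched as follows. This is a minimal, hypothetical PyTorch-style illustration only: the function name, the output-level KL formulation, and the alpha/beta weights are our assumptions (the author feedback below states that the CRL loss L_m is a KL divergence, which motivates the KL terms here); the actual CRL additionally involves a cross-attention localization transformer and a mutual-information-based contrastive metric not reproduced in this sketch. See the GitHub repository above for the released implementation.

```python
import torch
import torch.nn.functional as F

def fedcrld_style_loss(client_logits, server_logits, momentum_logits, labels,
                       alpha=1.0, beta=1.0):
    """Hypothetical combined client objective (sketch, not the released code)."""
    # Local supervised segmentation loss on the client's own data.
    loss_seg = F.cross_entropy(client_logits, labels)
    log_p = F.log_softmax(client_logits, dim=1)
    # CRL-like term: pull the client output toward the (localized) server
    # knowledge via KL divergence; gradients do not flow to the server output.
    loss_crl = F.kl_div(log_p, F.softmax(server_logits.detach(), dim=1),
                        reduction="batchmean")
    # MD-like term: self-distillation from the client's momentum (history) model.
    loss_md = F.kl_div(log_p, F.softmax(momentum_logits.detach(), dim=1),
                       reduction="batchmean")
    return loss_seg + alpha * loss_crl + beta * loss_md
```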

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16443-9_25

SharedIt: https://rdcu.be/cVRyD

Link to the code repository

https://github.com/JerryQseu/FedCRLD

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The manuscript proposes an approach to the cross-center, cross-sequence cardiac segmentation problem within a federated learning framework. To deal with the distribution shift between clients and the server, a contrastive re-localization (CRL) module is applied, facilitated by a cross-attention transformer. The optimization of the local client models is assisted by a momentum distillation module that stores each client's own training history. An ablation study and a comparison with other federated learning approaches are conducted using the Dice similarity index as the accuracy metric.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Working on the distribution shift problem in federated learning is important.
    2. The introduction of CRL and MD modules to the solution combines novelty with a realistic problem.
    3. Results show substantial improvements compared with standard methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The descriptions of CRL and MD lack clarity.
    2. Results could be further refined.
    3. The writing could be improved, which would help reproducibility as well.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The work is conducted on public datasets but the code is not provided. Overall the description could be improved for users to reimplement the method.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. Distribution shift is one of the major challenges in federated/distributed learning which is even more important when working on medical images obtained from multiple centers using different imaging modalities/sequences. The authors are attempting to solve a meaningful problem.
    2. The description of CRL lacks clarity and the authors could have used equations to help describe the loss function etc. Also, the terms used in describing the cross-attention transformer are not defined clearly. The parameter alpha was never clearly introduced. Overall, the paragraph spanning pages 4 and 5 needs to be rewritten with the assistance of Fig. 3.
    3. The MD training procedure in Sec. 2.2 would be much easier to follow if it were turned into pseudocode/an algorithm (a hedged illustrative sketch follows this list).
    4. Overall the authors could follow the following paper for the description of methods: Li, Junnan, Ramprasaath Selvaraju, Akhilesh Gotmare, Shafiq Joty, Caiming Xiong, and Steven Chu Hong Hoi. “Align before fuse: Vision and language representation learning with momentum distillation.” Advances in Neural Information Processing Systems 34 (2021).
    5. The authors did not report structure-wise results and distance-based measures. Both could be put into supplementary materials if there is not enough space.
    6. From the ablation study, the mutual information-based loss seems to have the most impact to the performance of the approach. It could be further proved by removing two model features per experiment.
    7. The writing of the manuscript could be improved. Multiple statements are made without context or clear purpose, e.g. on page 4 “In cross-attention transformer, the localized server distribution benefit client”, on page 5 “The Seg_p is the regularization”. Also there are multiple typos, e.g. “wit weight” on page 4 should be “with weight”.
    8. The authors claim that “For the first time, our FedCRLD enables the cross-center cross-sequence medical image segmentation possible”, which is not 100% correct. For example, the following paper made an attempt at the cross-center, cross-sequence problem, on a different anatomical structure as well: Dani Kiyasseh, et al., “Segmentation of Left Atrial MR Images via Self-supervised Semi-supervised Meta-learning.” In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 13-24. Springer, Cham, 2021.
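Regarding point 3 above, below is a hedged sketch of what such an MD algorithm might look like. It assumes the momentum bank is an exponential-moving-average (EMA) copy of the client weights; the function names, the EMA rate m, and the loss_fn signature are illustrative assumptions, not the paper's confirmed procedure.

```python
import torch

@torch.no_grad()
def update_momentum_bank(momentum_model, client_model, m=0.99):
    # EMA refresh: w_bank <- m * w_bank + (1 - m) * w_client.
    # The bank thus stores a slowly-moving summary of the client's own
    # optimization history, usable as a local teacher.
    for w_bank, w in zip(momentum_model.parameters(), client_model.parameters()):
        w_bank.lerp_(w, 1.0 - m)

def md_training_round(client_model, momentum_model, loader, optimizer, loss_fn):
    # One hypothetical MD round: self-train against the momentum teacher,
    # then refresh the bank after every optimizer step.
    # (Initialize once with, e.g., copy.deepcopy(client_model).requires_grad_(False).)
    momentum_model.eval()
    for images, labels in loader:
        with torch.no_grad():
            teacher_logits = momentum_model(images)  # local history prediction
        loss = loss_fn(client_model(images), teacher_logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        update_momentum_bank(momentum_model, client_model)
```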
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The manuscript attacks a legitimate problem in federated learning and introduces innovative ideas. Results demonstrate superior performance compared with established approaches. The major flaw is the description of the method, which could be addressed in a proper rebuttal.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    The authors propose a federated learning framework specifically suited to datasets with imbalanced distributions, where some clients' data are more heterogeneous than the rest. Their method, FedCRLD, reduces representation bias in model fusion via a contrastive difference metric and overcomes optimization interruption during model replacement via a momentum distillation strategy.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. Clarity in problem introduction and motivation: The problem statement regarding heterogeneous datasets and bias in FL is explained well and the motivation is quite clear. The figures are clear and help in understanding the problem.
    2. Novelty of the method: The proposed method using momentum distillation is quite novel and the use of a cross-attention transformer for reducing representation bias in this application seems interesting (though attention-based algorithms have been attempted for fairer client selection recently in Chen et al., 2021, arXiv).
    3. Works across sequences, with validation on multiple datasets: The method is shown to work on cine-CMR and DE-CMR sequences obtained from different scanners/centres, with dataset sizes that also differ from each other.
    4. Results are shown to hold not only in the heterogeneous scenario but also in the homogeneous scenario (within the cine-CMR sequence), indicating the adaptability of the model.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Limited statistical evaluation: FedCRLD seems to perform better than the prior methods, data sharing, and individual clients, but improvements in certain datasets seem quite marginal. It would be quite interesting to see the statistical significance of the improvement in performance. The authors should therefore perform tests wherever necessary and report significance to better appreciate the effect of the method (the same goes for the ablation study). It would also be good to see the standard deviation in addition to the average.
    2. Computational complexity: the method seems quite complex and attention networks are generally computationally heavy - what is the training time? Was any measure taken specifically to reduce computational load? Comparison of such settings could be something to try for the journal article or future work.
    3. Repetition of aims and lack of details: The aims are explained in the intro section and repeated in the method section in great detail, while that space could have been used to better explain the cross-attention transformer (e.g., details such as number of layers and filter sizes are missing).
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    In general, the implementation details, dataset processing, and training parameters (including data augmentation) have been explained quite well. As I said earlier, a bit more information on the model architecture would be helpful. Also, certain practical information required for reproducibility would be useful: after the model training, how were the model parameters aggregated (the figure shows averaging, but was it the same after cross-attention in the subsequent rounds?), how do the authors handle differences in the labels between the M&Ms and Emidec datasets (see Table 1; any preprocessing done?), and how was the Dice metric calculated?
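For reference on the two implementation questions raised here, the sketches below show the standard FedAvg size-weighted aggregation rule and one common way a per-structure Dice score is computed. Both are illustrations of common practice under stated assumptions; whether FedCRLD aggregates exactly this way after cross-attention, or averages Dice this way across differing label sets, is precisely what the review asks the authors to specify.

```python
import torch

def fedavg_aggregate(client_states, client_sizes):
    # Size-weighted average of client state_dicts (the standard FedAvg rule).
    # Whether FedCRLD applies exactly this rule after cross-attention is one
    # of the open questions above.
    total = float(sum(client_sizes))
    return {k: sum(sd[k].float() * (n / total)
                   for sd, n in zip(client_states, client_sizes))
            for k in client_states[0]}

def dice_for_label(pred, target, label):
    # Dice for a single labeled structure; averaging such per-structure scores
    # within each client is one common way a mean Dice could be computed.
    p, t = (pred == label), (target == label)
    denom = p.sum().item() + t.sum().item()
    return 1.0 if denom == 0 else 2.0 * (p & t).sum().item() / denom
```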

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    In addition to my above remarks, here are my specific comments:

    1. Please condense the introduction for the methods section and add more details regarding the model architecture.
    2. Please add more details and steps involved in label preparation, data preprocessing and how evaluation metric is calculated.
    3. The authors should add statistical test results and standard deviation values for Dice (if Dice values are calculated over differing numbers of labels, as specified in Table 1, it is not a fair comparison).
    4. A brief comment on the computational complexity of the method would be helpful.

    For future work, the authors could extend their analysis to see whether the proposed method would be effective for CT vs. CMR. It would also be good to see the upper limit of model capacity (to know the maximum range of heterogeneity the model can handle without affecting the optimization).

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Representation bias is quite a big problem in FL, and the method aims to reduce this bias while ensuring that the optimisation is continuous and not affected by heterogeneity in the dataset characteristics. The method is novel, handles multiple functions simultaneously, and is interesting to see in this specific application. The comparative analysis is extensive across multiple datasets. The work is clearly presented with good figures.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    7

  • [Post rebuttal] Please justify your decision

    The paper was already quite good with minor details to be explained. The authors have agreed to explain these details, include statistical evaluation of results and add required implementation details towards better reproducibility. Hence I go with my previous review of strong accept.



Review #3

  • Please describe the contribution of the paper

    A CRL module that corrects the server bias in federated learning is proposed and supposedly validated.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Comparison with other federated learning strategies and ablation studies are conducted.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The quality of writing and the way of presenting the methods are unsatisfactory (see the detailed comments below). The experiment descriptions are flawed.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    A public dataset is used. However, the reproducibility is limited due to inadequate description of the methods (e.g., structures of the unet used).

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    I think the manuscript could be improved from the following aspects:

    1. A careful proof-reading is necessary to correct the grammatical errors present in the paper. To name a few: 1) “makes the segmentation is still a challenging task”; 2) “causes the multi-center multi-sequence CMR has larger heterogeneity than regular studies”; 3) “However, the heterogeneous server model replacing client model in FL directly causes the long-distance clients utilize worse optimization replace original optimization”; 4) “Through minimizing KL divergence wit weight…”; 5) “The data-sharing strategy makes the model are biased towards similar data…”
    2. Large chunks of text (or text expressing the same information) are repeated across the Abstract, Introduction, and Methodology sections, while the actual methods are not clearly described (in terms of Fig. 2 and Fig. 3). The authors could make better use of the space for the essentials (add a more detailed description of the methods and of the U-Net used for experiments).
    3. Qualitative comparisons among the methods are lacking. Besides quantitative comparisons (Table 2), qualitative comparisons (e.g., the output illustrated in Fig. 3) are helpful to visually comprehend the improvements of the proposed methods over previous ones.
    4. It is unclear whether the 3D U-Net used for comparison with the other methods (Table 2 [12,9,10,3]) is kept the same, with the FL strategies as the sole variable for comparison. It would be helpful if the authors could provide more detailed descriptions of the experimental settings.
    5. Visual results (appendix) are compared with traditional deep learning methods instead of other federated learning strategies.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The quality of writing and the way of presenting the methods are unsatisfactory. The experiments that support the claimed contributions are not described clearly and convincingly.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The manuscript proposes an approach to the cross-center, cross-sequence cardiac segmentation problem within a federated learning framework. It addresses the distribution shift problem in federated learning. All reviewers agreed that the work is well motivated. R2 mentioned insufficient evaluation, lack of model details, and issues with clarity of presentation. Please address these points in the rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8




Author Feedback

Thanks to the meta-reviewer and all reviewers for recognizing the novelty of our work (“all reviewers agreed that the work is well motivated”).

  1. Meaningful motivations (“The authors … solve a meaningful problem”-R1, “The problem … in FL is explained well and the motivation is quite clear”-R2).
  2. Novel innovations (“… combines novelty with a realistic problem”-R1, “The proposed method …is quite novel and…seems interesting”-R2).
  3. Substantial improvements (“substantial improvements … superior performance”-R1, “Results … indicating the adaptability of the model”-R2, “… corrects the server bias in federated learning”-R3).

All the constructive suggestions will be adopted and the writing will be checked carefully in the final version.

The questions raised are clarified below. Q1: About evaluation (“improvements in certain datasets seem quite marginal. It would be quite interesting to see the statistical significance”-R2). A1: The marginal improvement over the best baseline result indicates that certain clients are close to the average distribution. We have computed the p-value for each client (all < 0.001), which indicates that the improvement is statistically significant.
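The rebuttal does not state which statistical test was used. As an illustration of the kind of per-client test R2 requests, a paired Wilcoxon signed-rank test on per-case Dice scores is one standard choice; the values below are placeholders, not the paper's data.

```python
from scipy.stats import wilcoxon

# Placeholder per-case Dice scores (illustrative values only, not the paper's data).
dice_fedcrld  = [0.88, 0.86, 0.90, 0.84, 0.87, 0.89]
dice_baseline = [0.85, 0.84, 0.88, 0.81, 0.86, 0.86]

stat, p = wilcoxon(dice_fedcrld, dice_baseline)  # paired, non-parametric test
print(f"Wilcoxon signed-rank: W={stat}, p={p:.4f}")
```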

Q2: About model details (“details such as number of layers and filter sizes are missing”-R2, “the reproducibility is limited (e.g., structures of the unet used)”-R3). A2: Thanks for pointing out the missing details. The number of layers and filter sizes in the encoder and decoder modules are the same as in the standard U-Net (as indicated in the second paragraph of page 6 of the original text). There is no extra design in these modules; hence we did not repeat them in the manuscript.

Q3: About presentation (“The description of CRL lacks clarity and the authors could have used equations to help describe the loss function etc.”-R1, “The aims are explained in the intro section and repeated in the method section in great detail”-R2, “Large chunks of text … are repeated across the Abstract, Introduction, and Methodology sections”-R3). A3: The CRL uses KL divergence as the loss function, as indicated by the equation (L_m) in the second paragraph of Sec. 2.1. The repeated text will be replaced by model details.

Q4: About reproducibility (“the code is not provided”-R1, “details such as number of layers and filter sizes are missing”-R2). A4: The code will be made available on GitHub. Due to the double-blind requirement, the link is not included in this version.

Q5: About qualitative comparisons (“qualitative comparisons (e.g., the output illustrated in Fig. 3) are helpful to visually comprehend the improvements of the proposed methods over previous ones”-R3, “Visual results (appendix) are compared with traditional deep learning methods instead of other federated learning strategies”-R3). A5: The appendix includes not only Fig. 1 for qualitative comparisons among different federated learning strategies, but also Fig. 2 for comparison with traditional deep learning methods. These indicate the qualitative superiority of our method.

Q6: About the variable for comparison (“It is unclear whether the 3D U-Net used for comparison with the other methods (Table 2 [12,9,10,3]) is kept the same, with the FL strategies as the sole variable for comparison.”-R3). A6: Yes, the FL strategy is the sole variable. The 3D U-Net in the compared methods [12,9,10,3] is the same; the original papers of the compared methods differ only in their FL strategies. Hence, the compared methods and our method all use the standard 3D U-Net (as indicated in the second paragraph of page 6).

Q7: About hyperparameters (“The parameter alpha was never clearly introduced.”-R1). A7: Thanks for the suggestion. Alpha and beta are hyperparameters, both set to 1 (as indicated in the second paragraph of Sec. 2.2 on page 5).




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    FL is an interesting topic for MICCAI and the authors have addressed the concerns in the initial review. Therefore I am glad to accept the paper for MICCAI 2022.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work proposes a novel approach to account for distribution shift in federated learning, with application to multi-sequence cardiac magnetic resonance segmentation. The work received mostly positive comments from the reviewers, and the rebuttal addressed the questions well.

    As also pointed out by R1, a missing point of this work concerns the lack of a proper technical description and theoretical evaluation of the framework (beyond the description of the architecture given in the rebuttal). A proper formal presentation of the approach seems required, including a convergence analysis against the state-of-the-art methods compared in this work.

    Nevertheless, both the method and the results seem promising, and the paper can positively contribute to the conference.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors propose an interesting paper on federated learning and adequately address the reviewers' comments in their rebuttal. With the agreed revisions, the paper will be a relevant contribution to the conference programme.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2


