
Authors

Zheyao Gao, Lei Li, Fuping Wu, Sihan Wang, Xiahai Zhuang

Abstract

Distributed learning has shown great potential in medical image analysis. It allows the use of multi-center training data with privacy protection. However, data distributions in local centers can vary from each other due to different imaging vendors and annotation protocols. Such variation degrades the performance of learning-based methods. To mitigate this influence, two groups of methods have been proposed for different aims, i.e., global methods and personalized methods. The former aim to improve the performance of a single global model on all test data from unseen centers (known as generic data), while the latter train multiple models, one for each center (whose data are denoted as local data). However, little research has been done on achieving both goals simultaneously. In this work, we propose a new distributed learning framework that bridges the gap between the two groups and improves performance on both generic and local data. Specifically, our method decouples the predictions for generic data and local data via distribution-conditioned adaptation matrices. Results on multi-center left atrial (LA) MRI segmentation show that our method achieves superior performance over existing methods on both generic and local data.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16431-6_49

SharedIt: https://rdcu.be/cVD65

Link to the code repository

https://github.com/key1589745/decouple_predict

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    In this paper, the authors present a technique for performing image segmentation with federated learning while combining the two diverging tasks of global and local optimisation. Their approach draws inspiration from the probabilistic U-net and consists of a VAE architecture and a “DA net”, which in combination allow the network’s predictions to be tuned to the specific local distributions. The approach is evaluated for atrial image segmentation from MRI and seems to reliably outperform the implemented baselines.
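    For readers less familiar with the probabilistic U-net, the conditioning mechanism I refer to works roughly as follows; this is a minimal sketch under my own assumptions about names and sizes, not the authors' implementation:

    ```python
    import torch
    import torch.nn as nn

    class CondVAESketch(nn.Module):
        """Probabilistic-U-net-style conditioning: a prior net p(z|x) and a
        posterior net q(z|x,y) produce a low-dimensional latent z that
        conditions the segmentation head (all names/sizes are assumptions)."""
        def __init__(self, feat_ch=16, z_dim=6):
            super().__init__()
            self.prior = nn.Sequential(
                nn.Conv2d(1, feat_ch, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(feat_ch, 2 * z_dim))
            self.posterior = nn.Sequential(
                nn.Conv2d(2, feat_ch, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(feat_ch, 2 * z_dim))

        def forward(self, x, y=None):
            # training: sample z from q(z|x,y); inference: sample from p(z|x)
            stats = self.posterior(torch.cat([x, y], 1)) if y is not None else self.prior(x)
            mu, logvar = stats.chunk(2, dim=1)
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
            return z, mu, logvar  # mu/logvar feed a KL(q || p) term during training
    ```

    During training, a KL divergence between posterior and prior keeps the two distributions consistent; at test time only the prior is available, which is what makes the latent code usable on unseen data.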

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper addresses a very relevant and difficult problem, namely combining global and local optima in a federated learning setting.
    2. The methodology is novel, sound and interesting.
    3. The experimental set-up is (mostly) well-devised and the results are convincing.
    4. The paper is generally well-written and easy to follow.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Partially problematic experimental procedure: the authors “generated a test set of generic data, using 30 cases from Utah with no modification to the gold standard labels”. Since these images belong to the same distribution as those used for training one of the nodes, I don’t think they constitute a good proxy for generic data. This aspect should be at least discussed in the Discussion section.
    2. Lack of details about the network architectures: for instance, the authors state that “The personalized module was built with five convolution layers and SoftPlus activation”, but no other information is available. This must be fixed by presenting the full architectural details at least in the supplementary material.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I find the authors’ answers do not represent the reality of their submission in several aspects. Specifically, the following are missing in the paper:

    1. Description of the study cohort.
    2. Information on sensitivity regarding parameter changes.
    3. The exact number of training and evaluation runs.
    4. Details on how baseline methods were implemented and tuned.
    5. An analysis of statistical significance of reported differences in performance between methods.
    6. The average runtime for each result, or estimated energy cost.
    7. A description of the memory footprint.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. Some grammatical errors and unclear sentences: Page 1: “has shown great potential”, “the former”, “the latter” (no plural), “the latters target multiple models, of which each for one center (denoted as local data)” (unclear sentence), “been proposed to for”. Page 4: “the segmentation risk”.
    2. On page 2, the authors state that “Although the above methods have solved the problem of privacy and fairness”. I think that neither of these problems has been solved: issues about privacy can still arise, given that it has been shown that images from the training dataset can still be hallucinated from a trained model. As for fairness, the concept is very broad and should be better defined in this paper to better understand what the authors mean. Please edit accordingly.
    3. Page 5: “As q → 0, it is equivalent to CE loss which emphasizes more on uncertain predictions, and it degrades to MAE loss [3], which equally penalizes on each pixel, as q approaches 0.” This sentence is unclear. Please revise.
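    To make my reading of this explicit: assuming the segmentation risk is the generalized cross-entropy family (my assumption; the excerpt does not confirm which loss [3] denotes), the two limits that the quoted sentence conflates are

    $$\mathcal{L}_q(p) = \frac{1 - p^{\,q}}{q}, \qquad \lim_{q \to 0} \mathcal{L}_q(p) = -\log p \;\;(\text{CE}), \qquad \mathcal{L}_{q=1}(p) = 1 - p \;\;(\text{MAE}),$$

    i.e., the CE-like behavior arises as q → 0 and the MAE-like behavior as q → 1, so one of the two occurrences of “q approaches 0” is presumably a typo.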
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is highly interesting. It presents a novel solution to a very relevant problem in federated learning, which could become the state of the art for this task.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    The manuscript proposes a method extending the ideas from [1] to distributed learning for multi-center left atrial MRI segmentation. The authors use a conditional VAE [18] to model the latent representation of the joint data distribution, and propose a “distribution adaptation network” to generate adaptation matrices conditioned on the joint data distribution. This is used to decouple the global and local predictions and to adapt the prediction to be consistent with the local distribution during testing. The proposed method is evaluated on a dataset constructed from data collected at three centers, and the results show substantial performance improvement on both global and local tasks over its competitors.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Good clinical feasibility of distributed learning for handling learning-based analysis of privacy-sensitive data. Good results (outperforming competing methods) in Dice score across global and local tasks.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Differences from previous work are not clearly described. Result reliability may be affected by the dataset settings used in the experiments. See detailed comments.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Some details are not described clearly enough to reproduce the work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • The connection to [1] should be emphasized. Currently, [1] is only briefly touched upon. Please explain the main ideas of [1] and how it approaches the two seemingly contradictory objectives simultaneously. Then, please clearly explain how the proposed method follows the approach of [1] and list the proposed extensions / differences with respect to [1].
    • The method proposes a distribution adaptation network (DA net) that uses the learned latent representation to generate adaptation matrices. However, it is unclear how the DA net yields such high-dimensional matrices, and what the computational and memory costs are. It would be better to describe this clearly or provide a visualization (see the sketch after this list).
    • The method introduces a regularization term L_TR that minimizes the diagonal elements of W_k^I for each pixel. The authors claim that this forces the matrices to modify the prediction as much as possible, but the underlying mechanism is not clear (also addressed in the sketch after this list).
    • There are some inconsistent descriptions in the paper: Introduction, Para. 5, says “our method decouples the predictions and labels”, while the title of Sec. 2.2 is “Decoupling Global and Local Predictions”. Please clarify.
    • The method is evaluated on datasets constructed from three centers. Morphological operations are used to simulate the settings of centers C/D and the generic data (unseen center) from the Utah dataset. The method demonstrates significant superiority on centers C and D. However, it seems that most of the training data come from the Utah dataset (35 cases for C/D vs. 15 for A/B). This may be the main factor behind the obvious improvement of the proposed method for centers C and D, but not for centers A and B, as shown in Table 1. This makes the experimental results less convincing. Please clarify.
    • Please also clarify, in the introduction, the differences between the proposed method and the competitors.
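    To make points 2 and 3 above concrete, here is how I understand the mechanism; every layer, name, and shape in this sketch is my assumption, not the authors' code (2 classes give 4 values per pixel on a 256×256 slice):

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DANetSketch(nn.Module):
        """Hypothetical DA net: maps a latent code z to per-pixel C x C
        adaptation matrices (C=2 -> 4 values per pixel on a 256x256 slice)."""
        def __init__(self, z_dim=64, n_classes=2, size=256):
            super().__init__()
            self.c, self.size = n_classes, size
            # project z to a low-resolution map, then upsample and refine
            self.fc = nn.Linear(z_dim, n_classes ** 2 * 16 * 16)
            self.refine = nn.Conv2d(n_classes ** 2, n_classes ** 2, 3, padding=1)

        def forward(self, z):
            b = z.shape[0]
            w = self.fc(z).view(b, self.c ** 2, 16, 16)
            w = F.interpolate(w, size=(self.size, self.size), mode="bilinear",
                              align_corners=False)
            w = self.refine(w).view(b, self.c, self.c, self.size, self.size)
            # softmax over rows: each column is a distribution over flipped labels
            return torch.softmax(w, dim=1)

    def adapt_prediction(p_global, w):
        # local prediction = per-pixel matrix-vector product W @ p_global
        # p_global: (B, C, H, W); w: (B, C, C, H, W)
        return torch.einsum("bcdhw,bdhw->bchw", w, p_global)

    def trace_regularizer(w):
        # L_TR-like term: penalizing the diagonal pushes probability mass
        # onto off-diagonal (label-flipping) entries, forcing the matrices
        # to actually modify the prediction
        return torch.diagonal(w, dim1=1, dim2=2).mean()
    ```

    Under this reading, the high dimensionality comes only from broadcasting small per-pixel matrices over the image grid, and the trace penalty explains why the matrices are forced to change the prediction; whether this matches the paper should be confirmed by the authors.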
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The work is interesting and results are good. The novelty of the method and the reliability of the results need to be further clarified.

  • Number of papers in your stack

    3

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors addressed my concerns on the connection to [1], the results on centers C and D, as well as the regularization term.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The proposed federated VAE methodology for model adaptation in image segmentation was found interesting by the reviewers, and the problem relevant to the community. However, important information about the experimental setup is missing, affecting the clarity and reproducibility of the contribution. For example, the data partitioning seems quite arbitrary and based on a low sample size, and a natural question concerns the stability of the results under different partitioning and data selection. The method introduces several hyper-parameters (Eqs. 7 and 8) that must be tuned to the specific application. Hyper-parameter tuning is particularly hard in federated learning, and it is not clear whether this aspect may negatively affect the use of the framework in a real setting. Similarly, it is not clear how the hyper-parameters of competing methods such as FedProx were chosen.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6




Author Feedback

Dear Area-Chairs,

We would like to thank the meta-reviewer (M-R) and 2 reviewers (R2, R3) for their very constructive and thoughtful comments, which have greatly improved our manuscript. We have summarized several main comments and given corresponding responses.

  1. Regarding experiments: Q: M-R noted the small sample size and random data partition, and R3 further stated that the improvement shown in centers C and D, rather than centers A and B, was due to centers C and D containing more training data, which could undermine the contribution of this paper. A: (1) Regarding sample size, we agree that a large sample size is desired. However, curation of a large dataset for medical image analysis is difficult. For example, for the evaluation of federated segmentation methods, the well-established multi-center prostate dataset (from NCI-ISBI 2013, I2CVB, PROMISE12) used in [Zhang et al., MICCAI 2021] has a similar size (97 cases from 6 centers) to ours (100 cases from 3 centers), and the authors evaluate their distributed learning methods in a similar way. (2) Regarding data partition, as Table 1 shows, the improvements in centers A (2.2%) and B (5.3%) are similarly evident to those in C (2.3%) and D (2.3%) when compared with the global method (FedProx). The difference was indeed more obvious when compared to the personalized methods, which were severely affected by the segmentation bias and thus achieved worse results. We will add further clarification to the last sentence of the second paragraph of Section 3.3.

Q: M-R expected more discussion on the hyper-parameter tuning and settings of the proposed method and the compared methods. A: Thanks for the suggestions. We will add the related information to the Supplementary Material.

Q: R2 is concerned about the reasonableness of the constitution of the generic data, which appears to have the same image distribution as the training data. A: Sorry for the confusion. The images of centers C and D were indeed from the same distribution, but their labels (gold standard segmentations) were different in our experimental setting, which therefore constitutes the non-IID situation called label skew in this study. Nevertheless, we do agree that having images from unseen centers as generic data would be a better choice if a larger dataset were available.

  2. Regarding methodologies: Q: R3 expected explanations of the connection between our method and the method in [1]. A: The method in [1] uses two predictors for local and global predictions, and it estimates the label distribution by the proportion of each class in the training data, which is not applicable to image segmentation tasks. Our method uses adaptation matrices to modify the global predictions based on the distribution modeled by a variational Bayesian framework. We will further clarify this in the revised manuscript.

Q: R3 wondered about the dimensionality of the adaptation matrices and the underlying mechanism of the regularization term L_TR that facilitates changes in the prediction. A: (1) The dimensionality is 4×256×256, which is far less than that of the intermediate feature maps, resulting in little additional computation. (2) As L_TR minimizes the diagonal elements of the adaptation matrices, the non-diagonal elements, which denote label-flipping probabilities, become dominant [25] and thus facilitate the changes. Due to limited space, we will add more details to the Supplementary Material in the revision.
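As a back-of-the-envelope check added here for concreteness (assuming float32 values and a 2×2 matrix per pixel on a 256×256 slice), one set of adaptation matrices occupies only about 1 MiB, supporting the claim of negligible overhead:

```python
# Memory footprint of one set of adaptation matrices (4 x 256 x 256, float32)
elements = 4 * 256 * 256                # 262,144 values (a 2x2 matrix per pixel)
print(elements * 4 / 2 ** 20, "MiB")    # -> 1.0 MiB
```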




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors’ rebuttal clarifies some aspects of the experimental setup that were raised by the reviewers and meta-reviewer.

    Concerning the question on the data partitioning and size:

    • The question was not about the sample size, but rather about the specific choice of the partitioning, which is arbitrary. The reported results may essentially be due to the specific split applied to the data, and with such low numbers the variability of the output may be quite high. Cross-validating the experiments with respect to different train/test splits still seems necessary to confirm the reported improvement.

    Concerning the hyperparameters:

    • The rebuttal does not give any clarification, since the authors say that they will clarify this aspect in the supplementary material. The core of the method relies on the proper tuning of the different cost functions of formulas (7) and (8). The contribution of this method cannot be fully appreciated without clarifying this aspect.

    • Similarly, it is not possible to appreciate the improvement over the competing approaches if no detail is given on how they were parameterized for these experiments.

    For these reasons, although the paper seems interesting and the other reviewers appreciated the novelty of the idea, my feeling is that important details are missing to fully recommend acceptance to the conference.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    10



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    All in all, this paper addresses questions that are highly relevant for the community. The weaknesses are discussed and should not prevent acceptance at MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    upper



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presents a framework for image segmentation using federated learning while integrating the two divergent objectives of global and local optimization. The solution is based on the probabilistic U-net and comprises a VAE architecture. The system enables tuning the network’s prediction to specific local distributions. The method is tested for atrial image segmentation from MRI and appears to consistently outperform other methods. This is a strong paper with minor weaknesses. The AC voted to accept this paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR


