
Authors

Jingyang Zhang, Peng Xue, Ran Gu, Yuning Gu, Mianxin Liu, Yongsheng Pan, Zhiming Cui, Jiawei Huang, Lei Ma, Dinggang Shen

Abstract

In clinical practice, a segmentation network is often required to continually learn on a sequential data stream from multiple sites rather than a consolidated set, due to the storage cost and privacy restriction. However, during the continual learning process, existing methods are usually restricted in either network memorizability on previous sites or generalizability on unseen sites. This paper aims to tackle the challenging problem of Synchronous Memorizability and Generalizability (SMG) and to simultaneously improve performance on both previous and unseen sites, with a novel proposed SMG-learning framework. First, we propose a Synchronous Gradient Alignment (SGA) objective, which not only promotes the network memorizability by enforcing coordinated optimization for a small exemplar set from previous sites (called replay buffer), but also enhances the generalizability by facilitating site-invariance under simulated domain shift. Second, to simplify the optimization of SGA objective, we design a Dual-Meta algorithm that approximates the SGA objective as dual meta-objectives for optimization without expensive computation overhead. Third, for efficient rehearsal, we configure the replay buffer comprehensively considering additional inter-site diversity to reduce redundancy. Experiments on prostate MRI data sequentially acquired from six institutes demonstrate that our method can simultaneously achieve higher memorizability and generalizability over state-of-the-art methods. Code is available at https://github.com/jingyzhang/SMG-Learning.
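As a rough intuition for the gradient-alignment idea behind the SGA objective (this is a toy illustration, not the authors' actual formulation): the objective rewards a positive inner product between the gradient computed on the replay buffer (previous sites) and the gradient computed on the current site, so that a single update step benefits both. A minimal sketch, with a hypothetical weighting hyperparameter `lam`:

```python
import numpy as np

def sga_style_loss(grad_buffer: np.ndarray, grad_current: np.ndarray,
                   lam: float = 0.1) -> float:
    """Toy surrogate for a gradient-alignment objective.

    grad_buffer:  gradient of the loss on the replay buffer (previous sites)
    grad_current: gradient of the loss on the current site's batch
    lam:          weight of the alignment term (hypothetical hyperparameter)
    """
    # Alignment term: the inner product is positive when the two gradients
    # point in similar directions, so subtracting it rewards alignment and
    # penalizes conflicting update directions.
    alignment = float(np.dot(grad_buffer, grad_current))
    base = float(np.linalg.norm(grad_buffer) ** 2
                 + np.linalg.norm(grad_current) ** 2)
    return base - lam * alignment
```

In this sketch, two identical gradients yield a lower value than two opposing ones, mirroring how coordinated optimization across previous and current sites is preferred over conflicting updates.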

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16443-9_37

SharedIt: https://rdcu.be/cVRyR

Link to the code repository

https://github.com/jingyzhang/SMG-Learning

Link to the dataset(s)

https://liuquande.github.io/SAML/


Reviews

Review #1

  • Please describe the contribution of the paper

    Authors propose a new Synchronous Gradient Alignment objective and associated dual meta objective. The paper also provides some technical and heuristic details on replay buffers in order to reduce redundancy and improve model generalisability.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Good description of the asymmetry and problems of joint minimisation.
    • The proposed SGA approach is interesting and similar to the idea behind cosine losses in a multi-task setup.
    • Very interesting heuristics and insights into how to set up the replay buffer for the SGA approach.
    • There is a quasi-ablation study, since competing methods implement subsets of the proposed features.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper is very limited in scope to the domain of continual learning, but fails to appreciate that there are other approaches to train on data from multiple sites, such as federated learning. While continual learning is necessary in multiple-task setups and when transferring knowledge between tasks without forgetting the previous task, it is not necessary when the task is ultimately the same across sites/datasets.
    • Comparison to a simple FL approach is necessary.
    • The paper would have benefited from statistical comparison between methods (e.g. statistical tests) or at least providing some confidence bounds.
    • The ablation study is incomplete and limited to the features implemented by competing methods. We don’t actually know the performance improvement caused by each contribution.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Data is open source. The code will be made available at the time of publication, and the authors have provided an anonymous GitHub link. Note that no code is available at this time.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Most of the comments are already expressed above in the section highlighting the limitations of the work. I add below a couple of minor comments that the author might also like to address.

    Minor comments:

    • Training on sequential data streams is not necessarily privacy preserving. Differentially private mechanisms would be needed to demonstrate this. Also, streaming data often poses more of a privacy risk (data-not-at-rest) than centralising the data at rest.
    • Authors fail to recognise the existence of Federated learning as an approach. While Continual Learning is an important area of research, the justification for CL vs FL in a medical setting is not provided or asserted.
    • It would be interesting to see the behaviour of the method when the task actually changes between sites, e.g., if the model first learns to segment prostates and then learns to segment livers, does it still remember how to segment prostates?
    • Would be good to see a comparison to the model performance if all the data was co-localised as the optimal performance. In this setup it is hard to know if the demonstrated performance is good or not.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    There are some flaws, such as the lack of reference to FL and a corresponding comparison, and some of the motivations are ill-founded, but it is technically a good paper whose merits slightly outweigh its weaknesses.

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper proposes a continual learning + domain generalization framework for a series of multi-site prostate segmentation datasets. A dual-meta algorithm aligns the gradients between the previous and new sites, and also between the train and test sites (for the given sites). A seven-site dataset on prostate segmentation was used for evaluations.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is well-written
    • It tackles an important problem of continual learning + domain generalization which is a relatively new task.
    • The performances are promising, showing improvements on new unseen sites while minimizing the performance degradations on the old sites.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper seems to assume that all DG methods consist of pseudo-train + pseudo-test splits, which is not true. Gradient alignment is also only one of many DG methods. The authors need to clarify that the proposed method has chosen a particular DG method to solve the DG problem. This also makes the “relationship with CL and DG methods” argument a little weak; it would be “relationship with CL and a meta-learning DG method”.
    • There seems to be little connection between the DG solution and the CL solution. The CL solution could be combined with other DG methods.
    • In L_SGA, are the losses for the first and second meta-objectives weighted equally?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reasonably reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The paper is overall solid, with a slight concern about its novelty (little connection between the DG and CL solutions; they seem to be independent methods combined to solve their own problems). Please see the weaknesses section for detailed concerns.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper still has value and is well-written overall. The authors try to make a connection between CL and DG methods, but given that the proposed method handles a specific family of DG methods (meta-learning with pseudo-train/test splits), the argument sounds a little weak.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    In their study, the authors deliver a learning technique (SMG-Learning) that aims to provide memorizability and generalizability of a network on domain-shifted datasets. They utilise a Synchronous Gradient Alignment (SGA) objective, a Dual-Meta algorithm to optimize the SGA objective without expensive computation overhead, and a replay buffer to ensure efficient rehearsal and reduce redundancy.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    By using the SGA learning method (objective, Dual-Meta algorithm and replay buffer), the authors deliver both memorizability and generalizability of the network on unseen datasets. Their method outperformed existing state-of-the-art learning approaches (Continual Learning and Domain Generalization) and the baseline fine-tuning technique.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The manuscript has no important weaknesses.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This study is easily reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Very nice work: well written and organised, with a novel approach. Some suggestions for further work: i) apply the idea to further internal and external datasets to capture the behaviour of the learning approach under different domain-shift effects of external cohorts; ii) test the learning approach with different segmentation networks to capture the variation of the results compared with SOTA learning approaches (CL, DG, etc.).

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors developed a well-organised and clearly written study, with an appropriate approach to training, validation and testing of the hypotheses and datasets involved. They compare their learning method with state-of-the-art approaches, and their method outperforms the existing learning approaches. This justifies a well-organised and well-delivered study.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors propose a new Synchronous Gradient Alignment objective and associated dual meta objective. The paper also provides some technical and heuristic details on replay buffers in order to reduce redundancy and improve model generalisability. The reviewers agreed that this is an interesting work for the MICCAI audience and well presented.

    In the rebuttal, please address the comments of R1 on FL and comparison experiments with the case when all data were co-localized.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6




Author Feedback

We sincerely thank all reviewers and the area chair for their positive and constructive comments. They acknowledged that our work is “technically a good paper” (R1), “overall solid” (R2), and “well-written and organized with a novel approach” (R3). Here, we respond to the main comments:

  1. Justification for Federated Learning (FL) (R1): We acknowledge that federated learning (FL) is a practical privacy-preserving approach that trains a global model (on a central server) from multiple decentralized datasets (on local clients). However, the FL scenario is quite different from our continual learning (CL) setting, despite their shared goal of reducing privacy leakage. Specifically, FL operates on static local data and requires each local client to remain available and reliable for communication with the central server. It transfers only model parameters between local clients and the server to protect user privacy, but incurs a high cost to maintain consistent connections to all local clients, especially for clinical sites. Unlike FL, our CL setting operates on dynamic local data, where the model is trained only on the data from the newly arriving site, without access to older ones. It aims to improve model memorizability and relax the constraint of data co-localization, though it may pose other privacy risks (i.e., data-not-at-rest, as pointed out by R1). Therefore, FL and CL are complementary for privacy preservation in real-world applications. In this work, we discuss only the CL setting on a sequential data stream; it would be of interest to integrate it with FL into a so-called federated continual learning framework [1].

  2. Comparison to the FedAvg method with data co-localized (R1): We implemented FedAvg [2], a classical FL method that effectively co-localizes the data of all sites, to provide an upper performance bound. It achieves an average DSC of 90.75% on sites A to E, and 89.78% DSC on site F. Learning on a data stream inevitably weakens the network's memorizability and generalizability, reducing performance on previous sites A to E and unseen site F. For example, the baseline JM method obtains 73.98% DSC (-16.77%) on previous sites and 83.85% DSC (-5.93%) on the unseen site. Our method achieves the best result among all CL and DG methods and the smallest performance drop relative to FedAvg, i.e., 83.80% DSC (-6.95%) on previous sites and 87.18% DSC (-2.60%) on the unseen site. Such a slight performance drop is acceptable given the improved memorizability and generalizability, compared with FedAvg's expensive requirement of co-localizing data from all sites.
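  For context on the FedAvg baseline mentioned above: its core aggregation step averages each client's locally trained parameters, weighted by that client's number of training samples. A minimal sketch of one aggregation round (site parameters and sample counts here are hypothetical toy values, not the paper's experimental setup):

```python
import numpy as np

def fedavg_aggregate(site_params, site_sizes):
    """One FedAvg aggregation round: a weighted average of each site's
    parameter vector, with weights proportional to that site's sample count.

    site_params: list of 1-D parameter vectors, one per site
    site_sizes:  list of training-sample counts, one per site
    """
    total = sum(site_sizes)
    weights = [n / total for n in site_sizes]
    # Weighted sum of the per-site parameter vectors.
    agg = np.zeros_like(site_params[0])
    for w, p in zip(weights, site_params):
        agg += w * p
    return agg
```

  With equal sample counts this reduces to a plain average; sites with more data pull the global model proportionally toward their local solution.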

  3. Clarification of the novelty and connection between CL and DG methods (R2): Our major novelty is two-fold: 1) a specific formulation of the SMG-Learning problem, and 2) a well-designed solution that combines two popular CL and DG methods built upon gradient-based meta-learning. We would like to clarify that our combined solution is promising yet not unique for SMG-Learning; other solutions may exist by combining other CL (e.g., regularization-based) and DG (e.g., augmentation-based) methods. Traversing all possible combinations is beyond the scope of this work and is left to future work. Here, we focus on finding a potential connection between two particular meta-learning-based CL and DG methods, which are formulated under a unified setting of gradient alignment yet with different alignment orientations. More explanation of their connection can be found in Sect. 2.1. The proposed synchronous alignment strategy thus enables a reasonable combination of these particular CL and DG schemes.

[1] Yoon J, et al. Federated continual learning with weighted inter-client transfer. ICML 2021.
[2] McMahan B, et al. Communication-efficient learning of deep networks from decentralized data. AISTATS 2017.


