
Authors

Yixiao Zhang, Xinyi Li, Huimiao Chen, Alan L. Yuille, Yaoyao Liu, Zongwei Zhou

Abstract

The ability to dynamically extend a model to new data and classes is critical for multiple organ and tumor segmentation. However, due to privacy regulations, accessing previous data and annotations can be problematic in the medical domain. This poses a significant barrier to preserving the high segmentation accuracy of the old classes when learning from new classes because of the catastrophic forgetting problem. In this paper, we first empirically demonstrate that simply using high-quality pseudo labels can fairly mitigate this problem in the setting of organ segmentation. Furthermore, we put forward an innovative architecture designed specifically for continuous organ and tumor segmentation, which incurs minimal computational overhead. Our proposed design involves replacing the conventional output layer with a suite of lightweight, class-specific heads, thereby offering the flexibility to accommodate newly emerging classes. These heads enable independent predictions for newly introduced and previously learned classes, effectively minimizing the impact of new classes on old ones during the course of continual learning. We further propose incorporating Contrastive Language–Image Pretraining (CLIP) embeddings into the organ-specific heads. These embeddings encapsulate the semantic information of each class, informed by extensive image-text co-training. The proposed method is evaluated on both in-house and public abdominal CT datasets under organ and tumor segmentation tasks. Empirical results suggest that the proposed design improves the segmentation performance of a baseline model on newly-introduced and previously-learned classes along the learning trajectory.
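
For concreteness, below is a minimal PyTorch sketch of how a single output layer might be replaced by lightweight, class-specific heads conditioned on CLIP text embeddings on top of a shared encoder-decoder. The module names, feature dimensions, and parameter-generating fusion scheme are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of CLIP-conditioned, class-specific segmentation heads (PyTorch).
# Names, dimensions, and the fusion scheme are illustrative assumptions,
# not the authors' exact implementation.
import torch
import torch.nn as nn


class ClassSpecificHead(nn.Module):
    """One lightweight head per class: combines decoder features with a fixed
    CLIP text embedding of the class name to predict a per-voxel foreground logit."""

    def __init__(self, feat_dim: int, clip_dim: int = 512):
        super().__init__()
        # A small linear controller maps the CLIP text embedding to the head's
        # parameters (a 1x1x1 convolution weight plus a bias).
        self.controller = nn.Linear(clip_dim, feat_dim + 1)

    def forward(self, feats: torch.Tensor, clip_emb: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, D, H, W) decoder features; clip_emb: (clip_dim,)
        params = self.controller(clip_emb)                 # (feat_dim + 1,)
        weight = params[:-1].view(1, -1, 1, 1, 1)          # (1, C, 1, 1, 1)
        bias = params[-1]
        return (feats * weight).sum(dim=1, keepdim=True) + bias  # (B, 1, D, H, W)


class ContinualSegmenter(nn.Module):
    """Shared encoder-decoder backbone with one small head per class.
    Adding a new class only adds a new head; old heads stay untouched."""

    def __init__(self, backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.backbone = backbone        # e.g., a 3D U-Net returning decoder features
        self.feat_dim = feat_dim
        self.heads = nn.ModuleDict()    # grows as new classes arrive
        self.clip_embs = {}             # class name -> frozen CLIP text embedding

    def add_class(self, name: str, clip_emb: torch.Tensor) -> None:
        self.heads[name] = ClassSpecificHead(self.feat_dim, clip_emb.numel())
        self.clip_embs[name] = clip_emb.detach()

    def forward(self, x: torch.Tensor) -> dict:
        feats = self.backbone(x)
        # Each class is predicted independently with a sigmoid, so predictions
        # for old classes are not renormalized when new classes are added.
        return {name: torch.sigmoid(head(feats, self.clip_embs[name].to(feats.device)))
                for name, head in self.heads.items()}
```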

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43895-0_4

SharedIt: https://rdcu.be/dnwxM

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #3

  • Please describe the contribution of the paper

    This paper proposes a method for continual multiple organ and tumor segmentation in 3D abdominal CT images. It first empirically verifies the effectiveness of pseudo-labeling in retaining previous knowledge. Then, the paper proposes a new network design that uses organ-specific heads for segmentation, which allows easy extension to new classes while adding little computational cost. The segmentation heads are further strengthened by utilizing CLIP text embeddings that encode the semantics of organ and tumor classes.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The method outperforms the continual learning baseline methods in the challenging multiple organ and tumor segmentation tasks.
    2. The topic raised in this paper is interesting and will have a wide impact on medical applications.
    3. The paper is well written and organized.
    4. The framework-related figures and plots are clear and easy to understand; they help greatly in understanding the proposed framework.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. It would be better to add a sensitivity experiment to the evaluation.
    2. There is more related work on continual learning and meta learning in this area, for example: “Co-representation learning framework for the open-set data classification”, “SIM: Open-world multi-task stream classifier with integral similarity metrics”, “CML: A contrastive meta learning method to estimate human label confidence scores and reduce data collection cost”, and “Adaptive margin based deep adversarial metric learning”. These papers should also be discussed in the related work.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It can be reproduced; the authors provide data and code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The method outperforms the continual learning baseline methods in the challenging multiple organ and tumor segmentation tasks.
    2. The topic raised in this paper is interesting and will have a wide impact on medical applications.
    3. The paper is well written and organized.
    4. The framework-related figures and plots are clear and easy to understand; they help greatly in understanding the proposed framework.

    5. It would be better to add a sensitivity experiment to the evaluation.
    6. There is more related work on continual learning and meta learning in this area, for example: “Co-representation learning framework for the open-set data classification”, “SIM: Open-world multi-task stream classifier with integral similarity metrics”, “CML: A contrastive meta learning method to estimate human label confidence scores and reduce data collection cost”, and “Adaptive margin based deep adversarial metric learning”. These papers should also be discussed in the related work.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The method outperforms the continual learning baseline methods in the challenging multiple organ and tumor segmentation tasks.
    2. The topic raised in this paper is interesting and will have a wide impact on medical applications.
    3. The paper is well written and organized.
    4. The framework-related figures and plots are clear and easy to understand; they help greatly in understanding the proposed framework.
  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper focuses on the continual segmentation of abdominal multi-organ and tumors. First, the paper leverages the output prediction from the previous learning step as the pseudo label for the current step’s old classes. The paper also incorporates CLIP embeddings into organ-specific heads.
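
    As a rough illustration of this pseudo-labeling step (not the paper's code), the sketch below assumes the model produces per-class binary probability maps and that a frozen copy of the previous-step model is available; the hard 0.5 threshold is an illustrative choice rather than a detail from the paper.

```python
# Minimal sketch of building training targets from old-model pseudo labels.
# Assumes per-class binary probability maps and a frozen previous-step model;
# the 0.5 threshold (hard pseudo labels) is an illustrative choice.
import torch


@torch.no_grad()
def build_targets(old_model, image, new_class_gt, old_classes, threshold=0.5):
    """Combine ground truth for newly introduced classes with pseudo labels
    for previously learned classes, so the current step needs no old annotations."""
    old_model.eval()
    old_preds = old_model(image)      # dict: class name -> (B, 1, D, H, W) probabilities
    targets = dict(new_class_gt)      # ground truth is available only for the new classes
    for name in old_classes:
        targets[name] = (old_preds[name] > threshold).float()
    return targets
```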

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper tackles an interesting problem, namely continual segmentation of medical images, with a novel network architecture built on organ-specific heads. It also provides a detailed analysis of computational complexity, which remains constant during continual learning for segmentation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The improvement is marginal in some cases. For example, in Tab. 1, on JHH_organ (7) dataset, the proposed method is only slightly better than other baseline methods, and even worse in the first step.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The source code is available online.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    There are some language and formatting issues in the paper, such as “[12,13] extended the distillation loss…” on page 2.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is novel, with decent improvement on multiple datasets.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    The authors have proposed a strategy to handle catastrophic forgetting in a continual learning paradigm for medical image segmentation, particularly abdominal multi-organ and tumor segmentation. They do so by first implementing pseudo-label-based learning of the previously learned classes to validate the effectiveness of soft pseudo labels, and further incorporating image-aware segmentation heads for each class on top of the encoder output, which is passed to a decoder; the encoder and decoder are shared. Next, they incorporate an image-text co-trained model, CLIP, to further assist the decoder in aligning features for a given class. Their main contribution is a novel strategy to handle continual learning in this paradigm with little computational overhead.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strengths of the paper are the following:

    1. Pseudo-rehearsal-based mechanisms have shown improvement in the case of continual learning, and the authors have incorporated that into this work to strengthen their approach.
    2. The proposed approach outperforms the compared relevant SOTA approaches.
    3. Their proposed approach is computationally less heavy compared to their baseline approach and other approaches that use additional decoder networks for each new class/domain, and I would consider this their main strength.
    4. They have performed an ablation study to showcase the effectiveness of the different strategies they incorporated.
    5. The method is completely reproducible.
    6. The paper is relatively easy to follow and enjoyable to read.
    7. This work is relatively easy to incorporate into clinical applications for identifying tumors and organs.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The weaknesses of the paper are the following:

    1. The idea of using pseudo labels for continual learning or incremental learning has been utilized in previous works that involve segmentation, e.g., [1, 2]. This affects the novelty of their step 1, as well as the significance of the experiments they performed using the pseudo labels.
    2. The main contribution involves using additional MLP heads for each new class/domain being added, which makes the method computationally less expensive. It would have been interesting to observe a comparison of its complexity with SOTA approaches other than the baseline used in the paper, SwinUNETR, for example [1, 3].
    3. Minor: typos, Baselines and Metrics section, line 5

    References:

    1. Douillard, A., Chen, Y., Dapogny, A. and Cord, M., 2021. Plop: Learning without forgetting for continual semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 4040-4050).
    2. Cha, S., Yoo, Y. and Moon, T., 2021. SSUL: Semantic segmentation with unknown label for exemplar-based class-incremental learning. Advances in neural information processing systems, 34, pp.10919-10930.
    3. Michieli, U. and Zanuttigh, P., 2019. Incremental learning techniques for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The approach is reproducible in a lab setting with the resources available. On top of that, they have used a couple of publicly available datasets alongside publishing their code. Therefore, it would be a good contribution to medical science and is relatively easy to incorporate even into clinical applications for identifying and segmenting organs and tumors.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    ● The paper is written nicely and is easy to follow.
    ● The execution is very simple yet effective in this area.
    ● Having more experiments and comparisons with more relevant approaches, for accuracy as well as computational complexity, would make the contribution even stronger.
    ● Experimenting with task-specific learning based on individual classes, rather than categorizing the dataset into their relevant groups, would further solidify the effectiveness of the approach.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors have approached the problem of continual learning with a very simple yet effective method. Even though the paper lacks extensive experiments and complexity analysis against other works, the integration of CLIP with pseudo labels has improved the performance over one of the SOTA methods for tumor segmentation, SwinUNETR. With further analysis on these details, this would be a good contribution to the medical field.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The reviewers have acknowledged the work’s strengths but have identified some shortcomings. Specifically, Reviewer 1 and Reviewer 3 expressed concern about the lack of comparisons with, and discussion of, the mentioned SOTA approaches, especially regarding the complexity analysis. Reviewer 2 commented that the improvement is marginal in some cases.

    Besides, since most Reviewers have lower confidence, I read this paper carefully and have some comments: (a) The crucial aspect that needs attention in this paper is the experimental part. The steps of continual learning presented in the experiments are limited, which makes it difficult to ascertain the effectiveness of the proposed method when applied to subsequent datasets. This limitation casts doubt on the scalability and adaptability of the method. (b) The ablation studies provide evidence of performance improvement when using CLIP embeddings. However, the same studies also reveal a potential drawback. If CLIP embeddings are not used, the method’s performance on new datasets might even deteriorate, possibly underperforming compared to ILT. (c) Additionally, the usage of CLIP embeddings in the method seems to be oversimplified. While it is stated that CLIP embeddings contribute to enhancing performance, a clear and detailed explanation or discussion as to why and how they contribute to performance gains is missing. This leaves a gap in understanding the specific role and benefit of incorporating CLIP embeddings in the method. Without this understanding, it becomes challenging to determine how integral the CLIP embeddings are to the method and how their use could be optimized.

    In light of these comments, I kindly ask the author to address these issues and all reviewers’ concerns in the Final Version.




Author Feedback

We thank Reviewer 1 for listing 7 strengths of our work and for the weak-accept rating. Regarding the concerns: (1) The idea of pseudo labels has been used in previous works: yes, and we want to clarify that we are not claiming pseudo labels as our novelty. Nevertheless, we observe that good pseudo labels can deliver fair performance in retaining old knowledge in the abdominal organ segmentation task, and we build our method on this observation. (2) Compare the complexity with more SOTA models: since both ILT and PLOP require feature distillation, they need to run inference with the old model while training the updated model, thus incurring 2x659.4 GFLOPs, and they are computationally less efficient than the proposed method. (3) Thank you for pointing out the typo; we will fix it in the camera-ready version.

We thank Reviewer 2 for the praise and the accept rating. Regarding the concerns: (1) The improvement is marginal in some cases: on the JHH_organ dataset, the targets are easy to segment, therefore the pseudo labels are of high quality and all compared methods benefit from them. In this case, the advantage of the proposed method is better reflected in the adaptation to new, difficult targets such as the gastrointestinal tract and cardiovascular system. (2) Some language and formatting issues: thank you for pointing these out; we will correct them in the camera-ready version.

We thank Reviewer 3 for the praise and the accept rating. Regarding the concerns, (1) Add a sensitivity evaluation and (2) More related work: we will add the sensitivity evaluation and discuss the mentioned related work in the camera-ready version.

We thank the meta-reviewer for the early accept decision. Here are our responses to the concerns. (1) Limited continual learning steps in the experiments: we conducted another experiment in which the model first trained on BTCV and then on LiTS is transferred to the JHH dataset. The segmentation targets are three organs present in JHH but not in BTCV or LiTS: colon, intestine, and celiac trunk. The proposed method achieves average Dice of 0.699 and 0.581 on the old and new targets, outperforming the best baseline, which achieves 0.573 and 0.555. (2) Removing CLIP deteriorates the performance: if CLIP embeddings are not used, we are essentially only using pseudo labeling for distillation, while other baseline methods also use feature distillation. We maintain that the CLIP embedding is a vital part of the proposed method. (3) Missing discussion about the contribution of CLIP embeddings: with vision-language contrastive learning, CLIP learns the semantic correlation between visual features and text. It has been observed by the community that, in the CLIP embedding space, similar concepts are mapped close to each other. We also observed that when using CLIP embeddings, the decoder features show a better clustering effect. For example, features of “left kidney” and “right kidney” are close, as are those of “liver” and “liver tumor”. This is not observed with one-hot embeddings. We believe CLIP-based encoding helps the model capture anatomical relationships. In addition, we believe the proposed method will benefit from stronger vision-language foundation models developed in the future. We will make this clear in the camera-ready version.
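
As a small, self-contained illustration of this observation (not the paper's code), the snippet below compares cosine similarities of CLIP text embeddings for class-name prompts using the Hugging Face transformers library; the checkpoint and prompt template are assumptions.

```python
# Illustration only: cosine similarity of CLIP text embeddings for class names.
# The checkpoint and the prompt template are assumptions, not the paper's setup.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

names = ["left kidney", "right kidney", "liver", "liver tumor", "pancreas"]
prompts = [f"a computerized tomography of a {n}" for n in names]

inputs = processor(text=prompts, return_tensors="pt", padding=True)
with torch.no_grad():
    emb = model.get_text_features(**inputs)        # (5, 512) text embeddings
emb = emb / emb.norm(dim=-1, keepdim=True)

sim = emb @ emb.T                                  # pairwise cosine similarities
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]:>12} vs {names[j]:<12}: {sim[i, j]:.3f}")
# Related concepts (e.g., left/right kidney, liver/liver tumor) are expected
# to score higher than unrelated pairs, consistent with the clustering observation.
```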


