
Authors

Sutanu Bera, Vinay Ummadi, Debashis Sen, Subhamoy Mandal, Prabir Kumar Biswas

Abstract

Medical image segmentation is critical for accurate diagnosis, treatment planning, and disease monitoring. Existing deep learning-based segmentation models can suffer from catastrophic forgetting, especially when faced with varying patient populations and imaging protocols. Continual learning (CL) addresses this challenge by enabling the model to learn continuously from a stream of incoming data without the need to retrain from scratch. In this work, we propose a continual learning-based approach for medical image segmentation using a novel memory replay-based learning scheme. The approach uses a simple and effective algorithm for image selection to create the memory bank, ranking and selecting images based on their contribution to the learning process. We evaluate our proposed algorithm on three different problems and compare it with several baselines, showing significant improvements in performance. Our study highlights the potential of continual learning-based algorithms for medical image segmentation and underscores the importance of efficient sample selection in creating memory banks.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43901-8_49

SharedIt: https://rdcu.be/dnwDX

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors of this paper address the challenge of Continual Medical Image Segmentation. They introduce a new learning approach that utilizes memory replay, which involves selecting images for the memory bank based on their contribution to the learning process. The algorithm used for image selection is both straightforward and effective. Extensive experiments and ablation studies validate the effectiveness of the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The authors make an attempt to address a new and interesting Continual Medical Image Segmentation task.

    • The proposed approach is technically sound. To strengthen the Memory Replay-based continual learning approach for medical image segmentation, a simple and effective image selection strategy is developed.

    • Extensive experiments and ablation studies demonstrate the effectiveness of the proposed approach.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • It is evident that a larger memory size generally contributes to better segmentation performance. However, it is important to quantify how the memory size impacts model performance; the authors should conduct additional ablation studies to validate this.

    • Are there any failure cases for the PCR- and GBR-based image selection mechanisms?

    • I’m wondering if the performance of the model would be affected by the order of the training datasets.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have shared implementation details, but reproducing their work without their source code would be difficult.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Please see the comments in Weaknesses.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors tackle a new and interesting task and the proposed method is technically sound.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper
    1. It proposes a novel way to select replay samples in the continual medical image segmentation task. Two scores are designed to select more useful samples for replay. The first score (PCR) is higher for samples with a higher proportion of old-class pixels, so that more old-class pixels are kept to counter the class imbalance during replay (new-class pixels often outnumber old-class pixels). The second score (GBR) is higher when a sample accumulates a larger gradient during training, indicating that it is more difficult than the others. The authors combine these two scores to select replay samples; a minimal sketch of this scheme is given below. Moreover, to further increase the proportion of old-class pixels, part of the replay samples are cropped to exclude some background regions.
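
    For concreteness, a minimal sketch of this half-and-half selection, assuming a binary mask per sample and a per-sample accumulated gradient norm; all names are illustrative, not the paper's implementation:

        import numpy as np

        def pcr_score(mask):
            # Positive Class Ratio (PCR): fraction of old-class (foreground)
            # pixels in a binary mask; higher means more old-class content.
            mask = np.asarray(mask)
            return mask.sum() / mask.size

        def select_replay_samples(samples, grad_norms, buffer_size):
            # `samples` is a list of (image, binary_mask) pairs; grad_norms[i]
            # is the gradient magnitude accumulated for sample i during
            # training (the GBR criterion). Half the buffer is filled from
            # the PCR ranking, the rest from the GBR ranking; duplicate picks
            # are replaced by the next-ranked sample, as the rebuttal states.
            pcr = [pcr_score(m) for _, m in samples]
            by_pcr = sorted(range(len(samples)), key=lambda i: -pcr[i])
            by_gbr = sorted(range(len(samples)), key=lambda i: -grad_norms[i])
            chosen = by_pcr[:buffer_size // 2]
            for i in by_gbr:
                if len(chosen) >= buffer_size:
                    break
                if i not in chosen:
                    chosen.append(i)
            return [samples[i] for i in chosen]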
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper focuses on continual medical image segmentation, which aims to let the network learn knowledge continuously rather than requiring all knowledge to be trained at the same time. This continual learning setting is close to real-world needs.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    In the comparison, all three selected baselines are continual image classification methods; none of them is specifically designed for continual medical image segmentation or continual semantic segmentation, making the comparison not so convincing.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility is good, readers can basically reproduce the work by reading the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Two things could be added to make this paper better: 1) The use of half of the replay samples for PCR and the other half for GBR is not motivated. The most intuitive approach would be to average the two scores and sort by the result, so explaining why you use this half-half strategy would be better (a sketch of the averaged-rank alternative is given below). 2) As mentioned above, add more baselines for continual medical image segmentation or continual semantic segmentation tasks.
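
    As an illustration of the averaging suggestion above, the scores could be merged by converting each to a rank first, which also sidesteps the scale mismatch between a ratio (PCR) and a gradient magnitude (GBR); a hypothetical sketch, not the paper's method:

        def combined_rank_selection(pcr_scores, gbr_scores, buffer_size):
            # Rank 0 = best under each criterion; select by average rank.
            n = len(pcr_scores)
            order_pcr = sorted(range(n), key=lambda i: -pcr_scores[i])
            order_gbr = sorted(range(n), key=lambda i: -gbr_scores[i])
            pcr_rank = {i: r for r, i in enumerate(order_pcr)}
            gbr_rank = {i: r for r, i in enumerate(order_gbr)}
            ranked = sorted(range(n), key=lambda i: pcr_rank[i] + gbr_rank[i])
            return ranked[:buffer_size]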

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The task of continual medical image segmentation is useful in real-world applications, and the proposed score ranking strategy is innovative and reasonable.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors present a novel framework to continually train a deep learning network as new sources of data become available. The novelty lies in a sampling method, based on sample ranking during training, to store “memories” of previous tasks, and in cropping those “memories” to save storage. To evaluate the proposal, three experimental setups are used: two of them with “incremental domains” (same task, different datasets) and one that focuses on a task-incremental problem (new datasets and problems per task). The results show improvement over other well-known continual learning approaches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) As the authors point out, continual learning is rarely explored in medical imaging. There are some recent works focusing on some applications, but in general, this is an overlooked field.

    2) While not stated by the authors, sampling/ranking strategies could be used for more than continual learning. The paper presented here is essentially a sampling strategy to determine “difficult examples” during the training process. Determining which examples are “easy” and “hard” is an open question, and the solution proposed here is cost-efficient and simple.

    3) The authors present an exhaustive set of results with 3 different experiments focusing on multiple datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) There is no clear motivation or justification for the use of continual learning. Yes, deep learning networks are susceptible to catastrophic forgetting, but why would it be necessary to use a sequential stream of data instead of using the whole data? This is the basis for the paper, so the lack of motivation hurts it.
    1.1) “Where imaging protocols and patient populations can vary significantly over time”. How? Is there any way to back up that claim? This seems dependent on the problem, centre and population. This once again links to the lack of a clear motivation for continual learning as opposed to “joint” learning.

    2) The literature review on continual learning is lacking. This is a rapidly growing field, with new preprints almost every month and several image-based literature reviews. Some of the classical methods (GDumb or Gradient Episodic Memory) are missing, together with more modern sampling strategies (gradient-based sample selection [GSS]). In fact, one could argue that most current methods rely heavily on replay strategies, as opposed to the “competition” presented in the manuscript. This paints the proposal in a different light.

    3) Not necessarily a weakness but a limitation: the method addresses binary segmentation (background vs. foreground region) and is dependent on it; the PCR score (cr) in particular relies heavily on this. It could easily be extended, but this is not addressed in the paper.

    4) Nomenclature and writing could be improved in the methodology. Notation becomes obfuscated by the abuse of subindices and a large number of symbols to keep track of. In that sense, PCR could be easily defined as the ratio of positive voxels instead of the long and confusing paragraph that starts section 2.1.

    5) The naming of the scores used to rank the samples is misleading and confusing. Are they scores or ranking? According to equations 1 and 2 one would think they are actually scores.

    6) I have a couple of concerns regarding section 2.3:
    6.1) What happens if the set of samples selected with PCR overlaps with the set from GBR? Would the samples be repeated? Why not combine both rankings?
    6.2) The way the cropping is done is a bit counterintuitive. The PCR score depends on the “image dimensions”, so why not crop everything first and then compute the PCR score? Furthermore, how are the negative samples (those with no region of interest present) cropped?

    7) I am also concerned about the hyperparameters:
    7.1) There seems to be a large number of tunable parameters.
    7.2) There are a lot of arbitrary decisions. Why use SGD only for the “naive” (sequential) version? Adam is usually avoided in continual learning for a reason, and having different optimisers invalidates the comparison.
    7.3) Why are the buffer sizes different per method?
    7.4) Why are the crop shapes different?
    7.5) What is the rationale behind the task-incremental experiment?

    8) The discussion is non-existent. The results section only lists the numbers without any insights.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    All the datasets used on the paper are public and clearly documented and referenced.

    While no mention is made in the paper, the authors said they would make the code publicly available upon acceptance. Even without the code, the methods are clearly explained and it should be easy to reproduce the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    1) First of all, I think the motivation needs to be better justified. This is my biggest concern with the paper. As someone who has done some experiments with continual learning and medical imaging, using such a strategy is usually hard to justify to other researchers and stakeholders. Unless the results are better by a long margin, there is no justification for not using all the training data available for the problem. Current trends in large language models point in that direction. Why limit the model’s performance by enforcing an unnecessary artificial requirement in model training?

    2) Keeping the motivation aside, my next biggest concern is the lack of acknowledgement of the state-of-the-art literature on continual learning (CL). This is truly a big field, which, in my opinion, will keep on growing. There are multiple methods that go from simple sampling strategies (GDumb), to complicated sampling mechanisms (GSS, based on the gradients of the samples) and complicated constraint-based updates (GEM). In general, any new CL method relies heavily on its coreset / memory buffer (the name varies per method). The method needs to be better contextualised within that literature, especially when GDumb and the simplest version of GSS (the greedy sampler) are in direct competition with the proposed method.

    3) Regarding the ranking systems, while not necessary for the rebuttal (if it comes to that), I would strongly advise the authors to find a better way to combine the two scores. Furthermore, how can the method deal with the fact that gradient-based scores between “tasks” might not be scaled properly? How is the GBR score normalised? I would also advise the authors to polish and refine the experimental setup. Unify the training regime (same buffer size, same cropping strategy and same optimiser); otherwise the comparisons become rather meaningless.

    4) Related to my first point on better motivation, I would advise choosing a better set of experiments (closer to real practice). What is the rationale behind the task-incremental approach? I cannot help but feel that this is a toy example. It serves the purpose of illustrating the results, but it does not match any real clinical problem and it might, as any toy example, be irrelevant.

    5) How is the buffer used? As the literature shows, there are multiple ways to use the set of memories: they can be mixed with the new task examples; they can be used for training after all the batches for the new task are seen; only the memory buffer can be used for training (new samples go directly to the bank); the gradients of the samples can be used as a constraint on the weight updates; a distillation loss can be used on the logits of the samples… I assume from the text that the examples are included in each epoch update, but this should be clarified, as the figure and text are ambiguous in that sense.

    6) What follows is just a list of smaller things to be addressed or corrected:
    6.1) Introduction, page 1: “Furthermore, a segmentation model trained on MRI […] may not generalize well to another population […]”. While true, this is not necessarily related to the proposal. There is a tenuous link between generalisation, continual learning and making sure that all “tasks” are “remembered”. The link is not explicitly made. I would avoid talking about it.
    6.2) Introduction, pages 1-2: “Continual learning algorithms have garnered significant interest lately […]”. This is a great place to drop a few references and show knowledge of the state of the art.
    6.3) Section 2.1: “y_{k,i}^{h,w,d} \in [0, 1]”. This is both an example of obfuscated notation (two separate levels of subindices) and a typo. The [ ] keys are missing (a common LaTeX mistake).
    6.4) Section 2.2: “A sample D_{k,i}”. I believe that using the pair “(x_{k,i}, y_{k,i})” would be better. D is used for a few different things with and without the extra second subindex, which can lead to confusion.
    6.5) Section 3.1: For the task-incremental scenario, is the task known? I assume the answer is yes; otherwise it could be considered a class-incremental scenario. However, the reason I am asking is that I would be curious to know whether training independent segmentation models is better than the task-incremental approach. I would suspect that trying to learn quite different tasks with different types of images would prove more of a hindrance.
    6.6) Section 3.3, start of page 6: “to train a Residual Unet”. This is a small nitpick, but why not call it a residual U-Net from the start? It seems inconsistent.
    6.7) Section 3.3: Why is fbr set to 0.7 only for the incremental task? How was the value chosen?
    6.8) Section 4: “Atypical Sample Selection”. The name does not come up before. Would it not be better to focus the paper around that idea, drop the name in the methodology when explaining the sampling mechanism, and then have it in that section as a callback?
    6.9) Section 4: This is another major nitpick, but “average accuracy” refers to the average accuracy over all tasks after the last task is learned. As obvious as that sentence sounds, my point is that the results presented focus on the Dice similarity coefficient. “Final average DSC” is probably a better name for it.
    6.10) Any insights on why joint training performs better only for the first experiment? I find that odd, although it makes some sense in the task-incremental scenario.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is a hard decision. On one hand, I believe there is good work and an exhaustive set of experiments with multiple datasets. On the other hand, there are clear shortcomings and no clear rationale for the proposal. If that motivation were addressed in a satisfactory manner, and if there were fewer arbitrary choices and a more systematic approach to the experiments, I would be open to considering acceptance. As it is, the weaknesses weigh over the merits.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    I am happy with the feedback provided by the authors and their willingness to better motivate the paper (I understand that the rebuttal and original submission were more restrictive than a camera-ready version). While I still think that some weaknesses will remain, I feel that the comments from the rebuttal make the merits weigh over them.



Review #4

  • Please describe the contribution of the paper

    The paper presents a continual learning method for image segmentation that constructs a memory bank of selected samples.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The presented memory replay-based continual learning paradigm is simple and can continuously learn from a stream of incoming data without retraining from scratch.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The experiments are built on a weak UNet, so the improvements are not convincing. Please use the well-known nnU-Net (https://github.com/MIC-DKFZ/nnUNet) as the baseline. The paper states: “Note that although we perform the experiments using UNet network, any sophisticated network like VNet, or DeepMedic can also be trained continually using our method.” This is true, but such a claim is weak without solid experiments.

    2. The selected targets are very easy to segment, e.g., spleen and prostate. Please use challenging datasets, e.g., tumor datasets, to validate the method.

    3. Many claims are not rigorous. For example, “a network usually learns more from examples that are challenging to segment.” This claim is not rigorous; please add references. Networks have a large loss for hard samples at the beginning of training, but this does not mean that they learn more from these samples.

    “we suggest the cropping of the images around the organ of interest in order to minimize the size of the memory bank.” In a real-world setting, it is non-trivial to automatically crop the targets without relying on other tools (e.g., a pre-trained model).

    4. It seems that the authors changed the LNCS template to obtain additional space (for example, Tables 1-3). The default font size should be larger.

    5. The CT images are not presented with a proper window width and level, suggesting that the authors may lack basic knowledge of medical image visualization. Please see this page for an introduction: https://radiopaedia.org/articles/windowing-ct

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provide a detailed algorithm pipeline; thus, the reproducibility is good.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Please use the well-known nnUNet as the backbone and validate the method on unsolved and challenging segmentation tasks, such as tumor segmentation.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The selected backbone network is weak, and the target organs have been well solved. The authors may argue that they used a continual setting, but it is also important to select challenging segmentation targets. Here are related references: https://github.com/MIC-DKFZ/nnUNet https://decathlon-10.grand-challenge.org/ https://amos22.grand-challenge.org/

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    2

  • [Post rebuttal] Please justify your decision

    Thanks for the rebuttal.

    1. The authors claim that they reduce the memory cost by cropping the images. However, this is not the common setting if you look at the whole spectrum of MICCAI/TMI/MedIA papers. Algorithms should directly handle real-world data rather than adding manual intervention. For example, if you crop the data during the training phase, you also need to crop it during inference or clinical deployment, which requires human intervention. Additionally, it would be a better choice to reduce the memory cost through algorithm design rather than by cropping the data.

    2. The authors claim that nnU-Net is resource-intensive. However, this is not true. nnU-Net also works in typical settings (e.g., 11 GB GPUs) and even low-resource settings if you specify the allowed GPU memory during planning. For a proof of concept, using the open-source and user-friendly nnU-Net is a better choice than using a very weak baseline.

    3. I did not find responses to many of my other concerns, e.g., some loose claims in the paper.

    Since the rebuttal does not address my main concerns and shows that the authors misunderstand the well-known baseline method (e.g., nnU-Net), I reduce my score.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This work tackles a current and interesting problem with real-world applicability. The evaluation is strong, with ablations and multiple datasets. The impact of the memory size on performance should be discussed and presented, along with a detailed evaluation of the presented experimental results. It would also be useful to compare the results with more recent approaches and to explain the experimental design in more detail. The motivation should be discussed further, along with a more thorough review of the literature. The potential extension of the work to non-binary segmentation and to more challenging segmentation tasks should also be discussed.




Author Feedback

We thank all reviewers for their constructive comments and appreciation of our novel and simple yet effective continual learning (CL) framework. We will implement the suggested revisions to strengthen our paper.

Reply to R1: We agree with the reviewer that including more samples will improve performance. However, our experiments demonstrate that we achieve good results even with only three samples (see Table 1). Further, the cropping strategy minimises memory usage, enabling optimal performance with a small buffer.

Reply to R2: Existing CL frameworks are mainly designed for classification, so we could only provide a few baselines for comparison. However, we included different types of CL frameworks to show the efficacy of our framework for medical image segmentation.

Reply to R3: We focus this response on clarifying the rationale behind our work and justifying various choices, including the experiments.
W1&2: CL can offer significant advantages in medical image analysis. Joint training may not always be possible, since training data may be unavailable from the beginning, and training networks from scratch every time is hugely resource-intensive. CL mitigates these issues by updating models with minimal resources while preserving prior knowledge. Space limitations prevented us from elaborating on the motivation of the CL framework; we recognise this limitation and will add the motivation and the necessary literature review to the main manuscript using the additional space, adhering to MICCAI guidelines.
W3: Extending the framework to non-binary segmentation is simple, as you mention. No changes are needed for GBR; in PCR, all objects of interest can be treated as positive (see the sketch after this feedback).
W4&5: We understand your concern and will improve the writing of the camera-ready version. We call them ranks because we use the scores for ranking; however, we agree that the appropriate name should be score rather than rank.
W6: In the current framework, if two samples overlap, we discard one of them and select the next-ranked sample. The absolute value of PCR, a ratio-based score, varies when calculated after cropping; however, the relative ranking under PCR remains consistent. To simplify the framework, we first sample images using both scores and then crop them. Combining both scores might be beneficial, and we will include this suggestion in our extended work.
W7: Yes, the Adam optimiser is avoided in some CL works (but not all; see Refs [5], [20], [17]). In our case, both Adam and SGD led to similar performance when trained for enough epochs; in some experiments a slight underperformance is observed with SGD. In the last experiment, because of the complexity of the task-incremental CL problem, a larger buffer size is required to provide enough memory regularisation; in the first two experiments, the same buffer size is used. The crop dimensions are mostly relative to the full image dimensions and the segmentation target size. Rationale behind the task-incremental learning experiment: using a single model to segment multiple organs can be beneficial, and factors like limited annotation may not allow joint training. Task/class-incremental learning is valuable in such cases, since new organs can be added as segmentation targets without training from scratch; we included an experimental section to explore this concept. We recognise the need to convey our thoughts better in the discussion, which we will provide in the camera-ready version.
Reply to R4: W1: Our method prioritises the training strategy and sample selection, and their effect on the replay CL framework. We opted for a widely accepted, computationally efficient backbone instead of a resource-intensive option like nnU-Net; this allowed us to prove our concept effectively without heavy computational resources. W2: Our key contribution is not centred around developing an advanced segmentation model for challenging tasks, but rather on presenting a CL framework that maintains performance without degradation. W3: See Ref [2].
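
For illustration, a minimal sketch of the non-binary PCR extension described in the reply to R3's W3 (every old-class label is treated as positive), assuming integer-labelled masks; the function and variable names are hypothetical, not the authors' code:

    import numpy as np

    def pcr_score_multiclass(mask, old_classes):
        # Treat every old-class label as "positive"; `mask` holds one
        # integer class label per voxel.
        positive = np.isin(mask, list(old_classes))
        return positive.sum() / mask.size

    # Usage sketch: labels 1 and 2 are the previously learned classes.
    # score = pcr_score_multiclass(mask, old_classes={1, 2})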




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    While some weaknesses were identified in this paper, the authors have made efforts to address the issues raised. As long as these issues are fixed in the paper according to the rebuttal, I would accept this paper.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presents a novel data selection scheme for the development of a memory bank tailored for continual learning. The authors propose two ranking schemes and thoroughly evaluate their performance, comparing them to some existing methods. While the reviewers raised concerns regarding the influence of memory bank size and the comparison with more recent approaches, the authors have effectively addressed these issues in the rebuttal. However, the reviews remain divided between strong rejection and acceptance, with one reviewer expressing dissatisfaction with the rebuttal. Taking into account the comprehensive assessments from all four reviewers and recognizing the potential impact of this work, I believe that the paper possesses significant merits, and thus, I recommend accepting it.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    In this paper, the authors propose a continual learning method for a segmentation task using gradient-based and positive-class-based ranking. The paper is well written and intuitive, and the algorithm should be straightforward to implement. CL is an important task, and the authors explain the motivation very well. Using cropped patches in the memory bank is not an issue, since they are the inputs to the network and are sufficient, or even better, for CL than whole volumes. Also, the algorithm should be easy to extend to other networks. It is recommended to evaluate it on more advanced networks in a journal version (if there is a plan), but this is not critical for making the claims of this paper. Finally, please also update the table formats in the final version if there is extra space.


