
Authors

Mohammad Mozafari, Adeleh Bitarafan, Mohammad Farid Azampour, Azade Farshad, Mahdieh Soleymani Baghshah, Nassir Navab

Abstract

Few-shot segmentation (FSS) models have gained popularity in medical imaging analysis due to their ability to generalize well to unseen classes with only a small amount of annotated data. A key requirement for the success of FSS models is a diverse set of annotated classes as the base training tasks. This is a difficult condition to meet in the medical domain due to the lack of annotations, especially in volumetric images. To tackle this problem, self-supervised FSS methods for 3D images have been introduced. However, existing methods often ignore intra-volume information in 3D image segmentation, which can limit their performance. To address this issue, we propose a novel self-supervised volume-aware FSS framework for 3D medical images, termed VISA-FSS. In general, VISA-FSS aims to learn the continuous shape changes that exist among consecutive slices within a volumetric image to improve the performance of 3D medical segmentation. To achieve this goal, we introduce a volume-aware task generation method that utilizes consecutive slices within a 3D image to construct more varied and realistic self-supervised FSS tasks during training. In addition, to provide pseudo-labels for consecutive slices, a novel strategy is proposed that propagates pseudo-labels of a slice to its adjacent slices using flow field vectors to preserve anatomical shape continuity. At inference time, we then introduce a volumetric segmentation strategy to fully exploit the inter-slice information within volumetric images. Comprehensive experiments on two common medical benchmarks, including abdomen CT and MRI, demonstrate the effectiveness of our model over state-of-the-art methods.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43895-0_11

SharedIt: https://rdcu.be/dnwxT

Link to the code repository

https://github.com/sharif-ml-lab/visa-fss

Link to the dataset(s)

https://www.synapse.org/#!Synapse:syn3193805/wiki/217789

https://chaos.grand-challenge.org/


Reviews

Review #4

  • Please describe the contribution of the paper

    This paper presents a method for few-shot segmentation (FSS) in 3D medical images. To utilize the shape information in consecutive slices, the method proposes a self-supervised FSS task based on the consecutive slices and proposes to propagate pseudo labels across adjacent slices. During testing, a volumetric segmentation strategy is proposed to utilize the inter-slice information. Experimental evaluations are conducted on two public medical datasets, outperforming previous FSS methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The motivation of each method component has been well presented.
    • The introduced self-supervised FSS task, 2.5D loss and the volumetric segmentation strategy work together to utilize the shape information in consecutive slices to improve FSS in 3D medical images.
    • The contribution of each method component has been clearly demonstrated.
    • The method shows superior performance on the two evaluation datasets.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The loss functions lack clarity. The formulation of each loss item is not clearly presented.
    • As shown in Table 2, the introduced self-supervised FSS task mainly improves performance on spleen, but not the other three organs.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The overall pipeline is complicated and some parts lack clarity; not all implementation details are provided, nor is the code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • The formulation of each loss item needs to be clearly provided to make the paper self-contained.
• The proposed method, without using any manual annotation during training, significantly outperforms methods that utilize manual annotations. Some explanation of this observation needs to be provided; otherwise it is a bit confusing.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is fairly novel with clear motivation. Experiments show the method obtains consistent improvements over previous methods and each proposed component contributes to the performance improvements.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

The authors propose a few-shot segmentation method for volumetric images based on self-supervised learning, capable of exploring inter-slice similarities in order to better exploit the third dimension of images such as MRIs and CT scans. VISA-FSS is based on inter-slice superpixel consistency and leverages both intra- and inter-volume data augmentations. The method is trained and evaluated on multiple public volumetric medical imaging datasets and compared against multiple strong baselines, showing consistent performance gains over the SOTA.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is very well-written, with only a few very small concerns from the reviewer.

    The proposed method is well motivated, novel and the methodology is sound and intuitive.

    Ablation and comparisons with the literature are well designed and presented in the manuscript. The proposed method yields compelling results in comparison to multiple very strong baselines in few-shot segmentation for medical imaging for multiple tasks.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

My only major concern with this manuscript regards the replication of this research by other teams, which is detailed further in my review. Apart from this, the paper is very well written, motivated, and executed.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

My only major concern with the manuscript is regarding reproducibility, as the neural network architecture and general framework proposed by the authors are composed of multiple non-trivial blocks. As no code is made available, a faithful reconstruction of the experimental procedure in this work would likely require a lot of hyperparameter tuning and coding expertise, hampering the replication of this research mainly for researchers with access to fewer computational resources (i.e. GPUs).

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    “especially for those of volumetric [16]”

    • volumetric images*

    “Fig. 1. An overview of the proposed VISA-FSS framework during training, where m = 2. SPPS is a pseudo-label generation module for consecutive slices.”

    • The variable $m$ is presented in the caption before definition in the main text.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My only major concern with the manuscript is regarding reproducibility. Thus, if the authors publicize code and pretrained models by the rebuttal phase, I will very strongly argue for the acceptance of this manuscript.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

This paper proposes a novel self-supervised training and inference approach for few-shot segmentation, based on SSL-ALPNet [18]. First, realistic task generation, which propagates pseudo-labels to adjacent slices in 3D volumes using image registration. Second, a Dice loss between adjacent slices to ensure label smoothness. Third, an inference strategy that takes adjacent slices as support-query pairs inside the same volume, instead of inter-volume support-query pairs. Ablation studies demonstrate the performance improvement from each modification, and the final result outperforms SSL-ALPNet as well as other self-supervised learning methods on two datasets of different modalities.

    [18] https://arxiv.org/abs/2007.09886

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Using the connection between adjacent slices inside one volume, the authors propose three modifications, to the training data, the training loss, and the evaluation. Each component is well connected to the others: the additional “realistic tasks” provide additional training data, while the additional 2.5D loss ensures prediction consistency between adjacent slices. The proposed volumetric segmentation strategy, which tasks the model to first predict the mask at the center of each group and then propagate it to the other slices within the group (see the illustrative sketch after this answer), further leverages the auxiliary training tasks and losses. Overall, it is a well-designed and comprehensive method.

The experimental results are strong and complete, demonstrating the benefit of each proposed component. The authors also compared with RPS, which is a registration-based strategy. In the supplementary material, the authors also studied the impact of m, which determines the range of adjacent slices.

    Finally, the paper is very well written and easy to understand.
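
    As a rough illustration only (not the authors' code), the sketch below shows one possible reading of the volumetric segmentation strategy described above, where the centre slice of each group is segmented first from the annotated support slice and its prediction is then propagated slice by slice within the group. The helper `few_shot_segment` and the grouping details are hypothetical assumptions.

    ```python
    import numpy as np

    def segment_volume(query_volume, support_slice, support_mask, few_shot_segment, m=2):
        """Illustrative sketch of group-wise volumetric inference.
        query_volume: (Z, H, W); groups consist of (2*m + 1) consecutive slices.
        few_shot_segment(support_img, support_mask, query_img) -> binary mask (hypothetical)."""
        Z = query_volume.shape[0]
        pred = np.zeros(query_volume.shape, dtype=np.uint8)
        group_size = 2 * m + 1
        for start in range(0, Z, group_size):
            centre = min(start + m, Z - 1)
            # Segment the centre slice of the group using the annotated support slice.
            pred[centre] = few_shot_segment(support_slice, support_mask, query_volume[centre])
            # Propagate outwards: each slice uses its already-segmented neighbour as support.
            for z in range(centre + 1, min(start + group_size, Z)):
                pred[z] = few_shot_segment(query_volume[z - 1], pred[z - 1], query_volume[z])
            for z in range(centre - 1, start - 1, -1):
                pred[z] = few_shot_segment(query_volume[z + 1], pred[z + 1], query_volume[z])
        return pred
    ```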

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

One potential limitation of the proposed method is its complexity, as training additionally requires image registration. The authors also mention two-stage training. The technical details of the registration method used and of how the two-stage training is performed are not disclosed. It would be valuable if the authors could discuss the compute cost of the proposed method compared to baselines.

In addition, it would be more insightful if the authors could report other metrics such as surface Dice and Hausdorff distance.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors answered no to all questions in the reproducibility checklist, which means they would not release any code. Although the authors claim all hyper-parameters and protocols follow [18], it would be difficult to fully reproduce the results; in particular, the registration module is not thoroughly discussed. However, the idea is well explained, which means it would not be difficult to test a similar idea in other applications.

    [18] https://arxiv.org/abs/2007.09886

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

As the main motivation of this paper is to leverage the information between consecutive adjacent slices, 3D neural networks that ingest the volume directly are technically capable of addressing the same issue. For a more complete comparison, it would be interesting to benchmark these 2D and 2.5D methods against 3D methods.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper is well written and easy to understand. The motivation behind the proposed method is clear. The modifications are simple and reasonable. The results are strong. However, some technical details about the training are not clear, and the code would not be released.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper proposed a novel self-supervised training and inference approach for few-shot segmentation. The paper received positive feedback from all reviewers who found it well-written and only had a few minor concerns. The proposed method is considered well-motivated, novel, and methodologically sound. Ablation and comparison studies were well-designed, yielding compelling results for medical image segmentation tasks. The modifications proposed for training data, training loss, and evaluation are well-connected to each other, forming a comprehensive and well-designed approach. Experiments showed superior performance on two evaluation datasets, and the paper is easy to understand.

One reviewer suggests that the manuscript should provide more information about the registration method used and the two-stage training, as well as report additional metrics such as surface Dice and Hausdorff distance. Another point raised is the need to present the formulation of each loss item more clearly. Furthermore, the proposed method’s complexity is considered a potential limitation due to the requirement for image registration and two-stage training, and it would be beneficial to discuss its computational cost compared to baselines.




Author Feedback

We thank all reviewers for their constructive comments and suggestions. We appreciate that all reviewers recognize the novelty of our method, and find our paper well-written and well-designed.

All hyper-parameters, protocols, and implementations follow [18]. However, based on all reviewers’ suggestions, we have released the code for reproducing the results at https://mega.nz/file/FbtVhITR#J0CB6d6BiPW48kaodHKHoWkk1Y0c9R64IiWFJ7mFAxM.

To generate realistic tasks, we employ the deformable registration method presented in [2] (R3 and MR). It estimates the deformation between two images by optimizing a similarity metric that measures the degree of correspondence between them. In our work, this registration network is trained in an unsupervised manner (further details can be found in [2]). The trained registration network enables us to compute the flow field between a pair of support and query images. This computed flow field is then applied to the superpixel-based pseudo-label of the support image to produce the pseudo-label for the query image.
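
The following is a minimal sketch, not the authors' code, of how a pseudo-label can be warped from a support slice onto an adjacent query slice with a dense flow field. It assumes PyTorch, a flow field expressed as per-pixel displacements (dx, dy) mapping query coordinates back to the support slice, and nearest-neighbour sampling to keep labels discrete; the actual output convention of the registration network in [2] may differ.

```python
import torch
import torch.nn.functional as F

def warp_pseudo_label(pseudo_label: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Sketch: warp a support pseudo-label to a query slice with a displacement field.
    pseudo_label: (B, 1, H, W) binary/integer mask of the support slice.
    flow: (B, 2, H, W) displacements (dx, dy), assumed to map query -> support."""
    B, _, H, W = pseudo_label.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0).expand(B, -1, -1, -1)
    coords = base + flow  # where each query pixel samples from in the support slice
    # Normalize to [-1, 1] for grid_sample (x first, then y).
    coords_x = 2.0 * coords[:, 0] / (W - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (H - 1) - 1.0
    grid = torch.stack((coords_x, coords_y), dim=-1)  # (B, H, W, 2)
    # Nearest-neighbour sampling keeps the warped pseudo-label discrete.
    return F.grid_sample(pseudo_label.float(), grid, mode="nearest", align_corners=True)
```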

For the training of VISA-FSS, we adopt a two-stage approach (see Section 2.2). During the first stage, we train the few-shot segmenter on both synthetic and realistic tasks using the segmentation loss employed in [18] and the regularization loss defined in [24], both of which are based on the standard cross-entropy loss (the formulations will be provided in the final version of our paper; R4 and MR). Specifically, the segmentation loss is applied to a query image to predict its segmentation mask, while the regularization loss is applied to the support image to segment the same class in that support image. In the second stage of training, we aim to leverage information beyond 2D image slices in volumetric images by employing realistic tasks. We construct multi-query tasks comprising multiple adjacent slices and fine-tune our model on these tasks. In this stage, we use a lower learning rate and a Dice loss between queries (see Equation 1).
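
Since Equation 1 itself is not reproduced in this feedback, the snippet below is only a hedged sketch of a soft Dice-style consistency term between the predicted foreground masks of adjacent query slices; tensor shapes and names are illustrative assumptions, not the exact formulation from the paper.

```python
import torch

def dice_consistency_loss(pred_a: torch.Tensor, pred_b: torch.Tensor,
                          eps: float = 1e-5) -> torch.Tensor:
    """Soft Dice-based consistency between two adjacent query predictions.
    pred_a, pred_b: (B, H, W) foreground probabilities of neighbouring slices."""
    inter = (pred_a * pred_b).sum(dim=(1, 2))
    denom = pred_a.sum(dim=(1, 2)) + pred_b.sum(dim=(1, 2))
    dice = (2.0 * inter + eps) / (denom + eps)
    return 1.0 - dice.mean()  # lower when adjacent predictions agree
```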

At inference time, the computational cost of our proposed model is comparable to the baseline method, SSLALPNet [18] (R3 and MR). During training, our method involves two additional steps compared to SSLALPNet. First, we train the registration network to generate realistic tasks. However, this registration network is trained independently before training VISA-FSS, and its results are used in the subsequent training of VISA-FSS, so there is no increase in the computational cost of training VISA-FSS itself. Second, we employ a second-stage training process to fine-tune the few-shot segmenter on multi-query tasks. However, the number of iterations in this stage is limited to 10,000, which is about 90 percent less than the number of iterations in the first-stage training phase. Consequently, although training VISA-FSS takes about 10 percent more compute than other methods due to its second-stage training, it leads to notable improvements in the results.

To provide a deeper understanding of VISA-FSS performance, we evaluated our proposed method using the Hausdorff and SurfaceDice metrics, as suggested by reviewers R3 and MR. The results demonstrate the effectiveness of our approach compared to the SSLALPNet method. Specifically, we observed an average Hausdorff value of 24.40 across all organs, outperforming the average value of 28.39 obtained by SSLALPNet. Additionally, VISA-FSS achieved an average SurfaceDice score of 90.59% across all organs, outperforming the average value of 89.34% achieved by SSLALPNet. Note that these results are derived from a single fold of the CT dataset; due to time constraints, we were unable to conduct evaluations across multiple folds. In the final version of our paper, however, we will present the comprehensive average results obtained from all folds, encompassing both datasets.
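
For reference, below is a rough, self-contained sketch of how such surface metrics can be computed from binary masks using distance transforms. The exact implementation, voxel-spacing handling, and surface-Dice tolerance behind the reported numbers are not specified in this feedback, so the snippet assumes non-empty masks and isotropic unit spacing purely for illustration.

```python
import numpy as np
from scipy import ndimage

def surface_distances(mask_a: np.ndarray, mask_b: np.ndarray) -> np.ndarray:
    """Distances from the surface voxels of mask_a to the nearest surface voxel of mask_b."""
    mask_a, mask_b = mask_a.astype(bool), mask_b.astype(bool)
    surf_a = mask_a ^ ndimage.binary_erosion(mask_a)
    surf_b = mask_b ^ ndimage.binary_erosion(mask_b)
    # Euclidean distance of every voxel to the nearest surface voxel of mask_b
    # (unit voxel spacing assumed for simplicity).
    dist_to_b = ndimage.distance_transform_edt(~surf_b)
    return dist_to_b[surf_a]

def hausdorff_and_surface_dice(mask_a, mask_b, tolerance: float = 1.0):
    """Symmetric Hausdorff distance and surface Dice at a given tolerance (in voxels)."""
    d_ab = surface_distances(mask_a, mask_b)
    d_ba = surface_distances(mask_b, mask_a)
    hausdorff = max(d_ab.max(), d_ba.max())
    # Fraction of surface points lying within the tolerance of the other surface.
    surface_dice = (np.sum(d_ab <= tolerance) + np.sum(d_ba <= tolerance)) / (len(d_ab) + len(d_ba))
    return hausdorff, surface_dice
```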


