
Authors

Runze Wang, Qin Zhou, Guoyan Zheng

Abstract

Despite the great progress made by deep convolutional neural networks (CNNs) in medical image segmentation, they typically require a large amount of expert-level accurate, densely annotated images for training and are difficult to generalize to unseen object categories. Few-shot learning has thus been proposed to address these challenges by learning to transfer knowledge from a few annotated support examples. In this paper, we propose a new prototype-based few-shot segmentation method. Unlike previous works, where query features are compared with the learned support prototypes to generate segmentations over the query images, we propose a self-reference regularization where we further compare support features with the learned support prototypes to generate segmentations over the support images. By this, we argue that the learned support prototypes should be representative of each semantic class and, at the same time, discriminative across classes, not only for query images but also for support images. We additionally introduce contrastive learning to impose intra-class cohesion and inter-class separation between support and query features. Results from experiments conducted on two publicly available datasets demonstrate the superior performance of the proposed method over the state-of-the-art (SOTA).
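The prototype-matching idea described in the abstract can be sketched roughly as follows. This is a minimal numpy illustration, not the authors' implementation: the function names, feature shapes, and the toy episode are all hypothetical, and the self-reference term is shown only as "re-segment the support image with its own prototypes."

```python
import numpy as np

def masked_average_pool(feats, mask):
    """Class prototype: mean of the feature vectors under a binary mask.
    feats: (H, W, C) feature map; mask: (H, W) with values in {0, 1}."""
    denom = mask.sum() + 1e-8
    return (feats * mask[..., None]).sum(axis=(0, 1)) / denom

def cosine_segment(feats, prototypes):
    """Label each pixel with the prototype of highest cosine similarity.
    prototypes: (K, C) stacked class prototypes (background first)."""
    f = feats / (np.linalg.norm(feats, axis=-1, keepdims=True) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-8)
    sim = f @ p.T                # (H, W, K) similarity to each prototype
    return sim.argmax(axis=-1)   # (H, W) predicted labels

# Toy 1-way episode: support features + mask define the prototypes ...
rng = np.random.default_rng(0)
H, W, C = 4, 4, 8
support = rng.normal(size=(H, W, C))
fg_mask = np.zeros((H, W)); fg_mask[:2] = 1.0
protos = np.stack([
    masked_average_pool(support, 1.0 - fg_mask),  # background prototype
    masked_average_pool(support, fg_mask),        # foreground prototype
])
# ... a query image would be segmented with cosine_segment(query, protos);
# the self-reference term additionally re-segments the support image itself:
self_pred = cosine_segment(support, protos)
```

In this sketch the self-reference loss would penalize disagreement between `self_pred` and `fg_mask`, pushing the prototypes to be representative of the support image as well as the query.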

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_49

SharedIt: https://rdcu.be/cVRwA

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

The authors propose a regularization method to improve a prototype-based few-shot segmentation model for abdominal organs. Their contributions consist of self-reference and contrastive learning. Self-reference regularizes a class prototype to be representative of the entire organ in the support image. Contrastive learning helps in learning the similarity between foreground and background features.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The main idea of this paper is clear and easy to understand. The authors properly incorporate contrastive learning techniques into an existing few-shot segmentation method and successfully improve performance. They also show relevant quantitative and qualitative results.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The existing comparison methods are primitive and need to be supplemented: after they were published, several more advanced models were proposed; please see references [1, 2, 3] below. The novelty of this paper is limited because the model seems to be a simple combination of existing methods. For example, the idea of self-reference has already been introduced in few-shot segmentation of natural images, and cross-reference models have also been proposed [4, 5]. The authors should compare their model with these and clarify their contributions. In addition, contrastive learning has been applied in several few-shot segmentation papers [6, 7]. Considering the improvements in few-shot segmentation models for natural images, the improvement achieved by the proposed method is not that surprising.

    [1] Tang, H., Liu, X., Sun, S., Yan, X., & Xie, X. (2021). Recurrent mask refinement for few-shot medical image segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 3918-3928).

    [2] Sun, L., Li, C., Ding, X., Huang, Y., Wang, G., & Yu, Y. (2020). Few-shot Medical Image Segmentation using a Global Correlation Network with Discriminative Embedding. arXiv preprint arXiv:2012.05440.

    [3] Kim, S., An, S., Chikontwe, P., & Park, S. H. (2021, May). Bidirectional RNN-based Few Shot Learning for 3D Medical Image Segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, No. 3, pp. 1808-1816).

    [4] Zhang, B., Xiao, J., & Qin, T. (2021). Self-guided and cross-guided learning for few-shot segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8312-8321).

    [5] Liu, W., Zhang, C., Lin, G., & Liu, F. (2020). CRNet: Cross-reference networks for few-shot segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 4165-4173).

    [6] Liu, W., Wu, Z., Ding, H., Liu, F., Lin, J., & Lin, G. (2021). Few-shot segmentation with global and local contrastive learning. arXiv preprint arXiv:2108.05293.

    [7] Liu, C., Fu, Y., Xu, C., Yang, S., Li, J., Wang, C., & Zhang, L. (2021, May). Learning a few-shot embedding model with contrastive learning. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, No. 10, pp. 8635-8643).

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors checked “Yes” for most questions on the reproducibility checklist.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    I do not understand why the performance of PANet and SENet is much lower than that of the proposed method. As far as I understand, the performance of PANet should at least be similar to that of their model without any regularization.

    The evaluation setting is limited. The authors use only large organs, which are relatively easy to segment. In addition, they use 2D slices from the support and query volumes after careful matching. To make a general framework applicable to various organs, the entire process needs to be performed on 3D volumes; other organs could then also be used for evaluation.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Though the proposed regularization improves the performance of the model, the novelty and the comparisons are limited.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    I have changed my rating because several issues were clarified by the authors’ responses. It would be good if the responses were added to the camera-ready version, if accepted.

    Also, it would be good if the following points were considered.

    • Regarding , I think that their low performance originates from different experimental settings, i.e., superpixel-based self-supervision (SSL) vs. supervised episodic training.

    In [10], the performance of SSL-PANet significantly surpasses that of vanilla PANet, which was one of their main contributions. Similarly, if SSL-SENet and other methods were trained in the self-supervised setting, their performance could be comparable to the proposed method. Therefore, it is unfair to compare the proposed method with the others, because it is unclear where the performance gain comes from.

    • The authors noted that C1 requires a registration process, but that is not true. C1 uses the same data processing, following the work of [10], as in their paper. The source code is available at https://github.com/uci-cbcl/RP-Net. As C1 reports higher segmentation performance in its paper, a comparison with C1 seems necessary.



Review #2

  • Please describe the contribution of the paper

    The paper proposes a novel method for few-shot semantic segmentation using class prototypes. These prototypes are created locally. The authors incorporate self-reference regularization, contrastive learning, and self-supervision in order to train their models. For evaluation, they use two publicly available abdominal datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of the paper is the use of contrastive learning to generate more discriminative prototypes. They also use this technique simultaneously with other techniques, such as the novel self-reference loss and recently proposed ones such as local prototyping and self-supervision. The use of contrastive learning paired with prototypes for segmentation seems to be a fully novel idea that yielded consistently better results than the baselines.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • One could argue about the novelty of the self-reference used, since it is similar to the one proposed by the PANet model¹. However, the PANet model uses the query features to compute prototypes while the authors use the support-set features, and this suffices to claim novelty.

    • Additionally, the manuscript would be greatly improved if authors included proper statistical significance testing (hypothesis tests or confidence intervals) in Tables 1 and 2. It would also be very informative to include 5- and 10-shot experiments on the same datasets, as well as experiments on other areas of the body that have abundant public CT/MRI datasets, such as head/neck, thorax and pelvis.

    ¹ Wang, K., Liew, J.H., Zou, Y., Zhou, D., Feng, J.: Panet: Few-shot image semantic segmentation with prototype alignment. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. (2019) 9197–9206

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors concisely describe the implementation details and dataset preprocessing. Code and pretrained models will also be made public. This is enough information to reproduce the experiments with little effort.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • The paper is well written and organized; however, the heavy notation used in Section 2 could be improved to ease the reading. For example, there are too many indexing terms: in Sections 2.2, 2.3, and 2.4 the indices k and j are not used in any of the equations but are still present in many of the variables, and omitting them could reduce clutter. Also, the equations for the background class and the target class c are the same; the authors could present only one of them and simply treat the background as another class.

    • “Specifically, our method achieved the best segmentation performance on each abdominal organ which were significantly better than the second-best method (SSL-ALPNet) (77.65% vs. 73.02% in terms of average DSC)” — no statistical significance tests are provided to support this claim. The reviewer strongly suggests adding a marker (e.g., \dagger) alongside results that are significantly better than the baselines in Table 1.

    • The authors could also discuss the application of the proposed method to interactive image segmentation as future work. This could be a great addition to the conclusion of the paper.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a novel solution to the few-shot semantic segmentation that relies on the use of several techniques to improve on existing methods. Using only two datasets of the same body region hinders the generalization of the proposed method and consequently slightly diminishes the impact of the work.

  • Number of papers in your stack

    1

  • What is the ranking of this paper in your review stack?

    5

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    This work applies self-supervision and contrastive learning to few-shot semantic segmentation, where the support image is also used as a query image and the model is asked to segment the support image itself. The proposed method outperforms the SOTA on two datasets of 30 and 20 3D images, respectively.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed methods are simple and effective on the two tested datasets. The self-supervision method is simple and straightforward, as it can be easily applied to other datasets and network architectures.

    The paper is well written overall.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Potentially limited method novelty as there are multiple works (Liu et al. 2021, Learning a Few-shot Embedding Model with Contrastive Learning; Liu et al. 2021, Few-Shot Segmentation with Global and Local Contrastive Learning) that have applied contrastive learning to image semantic segmentation tasks.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors did not mention whether the code will be released. However, considering that the methods are straightforward and the datasets are public, it should be possible to reproduce the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    In Table 2, it would be better to provide another ablation study where the contrastive learning method is used but without self-reference.

    Regarding the contrastive learning, it would be better to cite some other works using contrastive learning and compare with the proposed method, especially if the proposed contrastive learning method differs from existing works.

    This would be a good paper if the authors can demonstrate the following: the paper proposes the novel self-reference setting; as a self-supervised method, it outperforms the setting that uses contrastive learning without self-reference (the required additional ablation study); and, moreover, by combining the self-supervised method and contrastive learning, better performance can be achieved.

    Otherwise, if using contrastive learning alone can achieve better performance than using self-reference, then, considering the limited novelty of the contrastive learning approach and the limited added value of the self-reference, the paper would be weak as a conference paper.

    Given the additional ablation study, I would be willing to adjust the score to accept.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method includes two contributions. Both are simple and effective, but the contrastive learning part is not novel. The self-reference is novel, as no paper seems to have made exactly the same modification for the few-shot semantic segmentation task.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper is clearly written and well motivated, but the reviewers question the true methodological novelty brought in by the paper. Combined with a sub-optimal evaluation setting (only larger organs considered, no statistical analysis to assess differences across methods), only a strong demonstration of the novelty would make this paper suitable for the conference. However, presentation of this work at a workshop (such as DART) would be meaningful for this paper.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR




Author Feedback

We thank the meta-reviewer (MR) and all reviewers for their comments.

MR,R1&3 Limited novelty We argue that the proposed self-reference regularization (SRR) and the incorporation of contrastive learning (CL) for few-shot image segmentation (FSIS) are novel, as reflected by R2’s comments. Below we refer to the seven references given by R1 as C1-C7.

R1 argues that both self-reference and cross-reference were already proposed in C4&C5. This, however, is not true. Local prototype-based cross-reference was originally introduced in [10] and is not regarded as our contribution; it has fundamental differences from the cross-reference networks in C5. C4 introduced self-guided and cross-guided learning for FSIS, but these are designed to handle the loss of critical information caused by global pooling when generating global prototypes. In contrast, our SRR is designed to learn better local prototypes that are representative for both support and query images. Thus, both the purpose and the underlying methodology of C4 are fundamentally different from ours.

Both R1 and R3 argue for limited novelty because CL was previously used in C6&C7. We acknowledge that there are many studies that use CL. However, there are fundamental differences in the details of how CL is used. Image-level CL [C6] or patch-level CL [C6&C7] is used there. C7 used PatchMix augmentation to generate patches, which is different from our class-wise CL. C6 only applied CL to patches generated from query images, whereas our class-wise CL is applied to both support and query images to improve feature discrimination by imposing intra-class cohesion and inter-class separation.
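As a rough illustration of the class-wise CL discussed here (not the authors' exact loss — the function, its temperature, and the use of per-class mean features are assumptions), a supervised InfoNCE over support/query class prototypes could look like:

```python
import numpy as np

def classwise_info_nce(s_protos, q_protos, tau=0.1):
    """Supervised InfoNCE over per-class mean features: the support
    prototype of class k is pulled toward the query prototype of the
    same class (intra-class cohesion) and pushed away from the other
    classes' query prototypes (inter-class separation).
    s_protos, q_protos: (K, C) class-mean feature vectors."""
    s = s_protos / np.linalg.norm(s_protos, axis=1, keepdims=True)
    q = q_protos / np.linalg.norm(q_protos, axis=1, keepdims=True)
    logits = s @ q.T / tau                        # (K, K) cosine / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # matched classes on diagonal
```

When support and query prototypes of the same class agree, the diagonal of the similarity matrix dominates and the loss is near zero; mismatched classes drive it up.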

MR,R1 Only big organs Actually, most SOTA FSIS methods [C1-C3, 8, 10, 11] segment big organs. Specifically, [C1, C2, 10, 11] segment exactly the same types of organs as we do. [8] and C3 additionally segment the psoas muscle and the bladder, but both of these are big organs. For a fair comparison, we follow the experimental protocols of [10].

MR,R2 Statistical significance We conducted paired t-tests to compare the differences between our method and the second-best method (SBM). On the Abd-CT dataset we obtained a p-value of 2.0E-9, while on the Abd-MRI dataset we obtained a p-value of 1.6E-5. Thus, the differences between ours and the SBM are statistically significant.
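For reference, the paired t-test behind such p-values can be sketched as below; the per-case Dice scores are made up for illustration and are not the paper's data (the p-value itself would then come from Student's t-distribution with n-1 degrees of freedom, e.g. via scipy.stats.ttest_rel).

```python
import math

def paired_t_statistic(scores_a, scores_b):
    """t statistic of a paired t-test: per-case differences are
    tested against a zero-mean null hypothesis."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

# Hypothetical per-case Dice scores for two methods on the same cases:
ours = [0.78, 0.80, 0.76, 0.79, 0.81]
sbm  = [0.73, 0.74, 0.72, 0.74, 0.75]
t = paired_t_statistic(ours, sbm)  # large |t| => small p-value
```

Pairing matters here: each case is compared with itself across methods, so the test is insensitive to per-case difficulty.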

MR,R1 More comparison with SOTA methods C1 requires image registration, which is itself a challenging problem and is not desirable for FSIS; therefore, we did not compare with C1. As C2 used the same dataset and the same 5-fold cross-validation study, we compared our results with their reported results. In terms of mean Dice, our method is much better than theirs: 67.94% (ours) vs. 61.73% (theirs) on the Abd-CT dataset, and 77.65% (ours) vs. 67.30% (theirs) on the Abd-MRI dataset. C3 did not release source code; we therefore tried to implement their method, but our reproduced results are far from their reported ones.

R1 Low performance of PANet and SENet The results of PANet and SENet were reported in [10]. Since we used the same dataset and the same experimental protocol as in [10], in Table 1 we directly compared our results with the results of PANet and SENet reported in [10].

R1 Carefully matched 2D slices Matching 2D slices prevents test organs from appearing in the training set, so as to meet the unseen-organ segmentation setting of FSIS; this follows the same protocol as [10] and C1.

R1 Volume-based instead of slice-based In FSIS, most SOTA methods [C1, C2, 8, 10, 11], if not all, are slice-based. C3 used 5 adjacent slices and thus can still be regarded as a slice-based method.

R2 5/10-shot & other areas of the body & interactive segmentation We will sincerely consider it in future work.

R3 Additional ablation study We conducted an additional ablation study in which only CL is used, without SRR. An average Dice of 74.9% was obtained. The p-values from paired t-tests are as follows: 1.0E-3 (no regularization vs. only CL), 1.8E-2 (only CL vs. only SRR), and 9.6E-3 (SRR&CL vs. only SRR).




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal clearly answered the various concerns raised during the review, establishing the relevance of the method and demonstrating its novelty, making it suitable for publication provided these additional elements are included in the camera-ready version.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    15



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The main strengths of the paper are: 1) it is well written and motivated; and 2) the novelty of the proposed solution to few-shot semantic segmentation. The main weaknesses identified are the limited novelty (the contrastive learning part is not novel, but the self-reference is), the limited comparison, and the uncompelling experiments on just two datasets of the same body region, where only larger organs are considered and no statistical analysis is provided. The rebuttal presents convincing arguments about the method’s novelty, statistical significance, and some experimental details. Since I believe that the main issues have been addressed, I recommend acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes a few-shot-learning-based medical image segmentation method, where self-reference regularizes a class prototype to be representative of the entire organ in the support images and contrastive learning helps in learning the similarity between foreground and background features. The main idea of the paper is clear and easy to understand. The proposed method has some novelty, though it has close connections with existing methods, and it outperforms the SOTA on two datasets. The rebuttal addressed the major concerns of Reviewer 1, who raised their final score from negative to positive. Since all three reviews are positive, I recommend accepting this work.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3


