
Authors

Xiangyi Yan, Junayed Naushad, Chenyu You, Hao Tang, Shanlin Sun, Kun Han, Haoyu Ma, James S. Duncan, Xiaohui Xie

Abstract

Recent advancements in self-supervised learning have demonstrated that effective visual representations can be learned from unlabeled images. This has led to increased interest in applying self-supervised learning to the medical domain, where unlabeled images are abundant and labeled images are difficult to obtain. However, most self-supervised learning approaches are modeled as image-level discriminative or generative proxy tasks, which may not capture the finer-level representations necessary for dense prediction tasks like multi-organ segmentation.

In this paper, we propose a novel contrastive learning framework that integrates Localized Region Contrast (LRC) to enhance existing self-supervised pre-training methods for medical image segmentation. Our approach involves identifying local regions with similar semantic meaning and performing local contrastive learning using a novel contrastive sampling loss. Through extensive experiments on three multi-organ segmentation datasets, we demonstrate that integrating LRC into an existing self-supervised method in a limited-annotation setting significantly improves segmentation performance. Moreover, we show that LRC can also be applied to fully-supervised pre-training methods to further boost performance.

Our proposed framework provides a promising direction for enhancing self-supervised learning for medical image segmentation. By incorporating LRC, we enable the learning of more localized and fine-grained representations that are crucial for accurate segmentation of complex structures. Our approach can be readily incorporated into existing self-supervised learning frameworks and has the potential to improve segmentation accuracy in real-world clinical applications.



Link to paper

DOI: https://doi.org/10.1007/978-3-031-43895-0_44

SharedIt: https://rdcu.be/dnwyX

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    A technique is proposed for medical image segmentation that involves contrastive learning-based pre-training, followed by supervised finetuning. The main idea is to do two types of contrastive learning: (1) global contrastive learning, where an encoder is trained to represent similar images in close proximity and dissimilar images further apart, and (2) local contrastive learning, which utilizes an encoder-decoder framework to map representations of pixels within the same superpixel closer together and those of pixels in different superpixels further apart. A separate algorithm is used to produce the superpixels.
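
    For concreteness, below is a minimal PyTorch sketch of such a superpixel-driven local contrastive loss. This is an illustrative reading of the setup described above, not the authors' implementation: the function names, the per-region pixel sampling, and the InfoNCE-style loss form are all assumptions.

        # Illustrative sketch (not the authors' code) of a superpixel-based
        # local contrastive loss: region embeddings are average-pooled over
        # pixels sampled from each superpixel, then contrasted across two
        # augmented views sharing one superpixel map.
        import torch
        import torch.nn.functional as F

        def region_embeddings(feat, superpixels, n_regions, n_samples=16):
            """feat: (C, H, W) decoder feature map; superpixels: (H, W) label
            map with every label in [0, n_regions) present. Returns one
            L2-normalized embedding per region, shape (n_regions, C)."""
            C = feat.shape[0]
            flat_feat = feat.reshape(C, -1)        # (C, H*W)
            flat_lbl = superpixels.reshape(-1)     # (H*W,)
            embs = []
            for r in range(n_regions):
                idx = (flat_lbl == r).nonzero(as_tuple=True)[0]
                # Sample up to n_samples pixels; small regions use all pixels.
                sel = idx[torch.randperm(len(idx))[:n_samples]]
                embs.append(flat_feat[:, sel].mean(dim=1))
            return F.normalize(torch.stack(embs), dim=1)

        def local_contrastive_loss(emb_a, emb_b, tau=0.1):
            """InfoNCE over regions: the same region under two augmented views
            is a positive pair; all other regions serve as negatives."""
            logits = emb_a @ emb_b.t() / tau       # (R, R) similarity matrix
            targets = torch.arange(emb_a.shape[0])
            return F.cross_entropy(logits, targets)

        # Toy usage: two views' feature maps, one shared superpixel map.
        feat_a, feat_b = torch.randn(32, 64, 64), torch.randn(32, 64, 64)
        sp = torch.randint(0, 8, (64, 64))
        loss = local_contrastive_loss(region_embeddings(feat_a, sp, 8),
                                      region_embeddings(feat_b, sp, 8))
        print(loss.item())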

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • For dense prediction tasks such as image segmentation, pre-training both encoders and decoders is a better strategy than pre-training only encoders. The proposed strategy for such pre-training is promising.
    • The method is validated on multiple datasets.
    • Good writing and paper organization.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • No comparisons with existing works of local contrastive representation learning.
    • Several experimental details are missing.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have agreed to make their code publicly available upon acceptance. Publicly available datasets are used for validating the proposed method.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The authors seem to be unaware of several previous attempts at local contrastive learning for medical image segmentation. There is neither any mention of these closely related works nor any comparison with them. A local contrastive loss for image segmentation is also proposed in (a) Ouyang et al., “Self-supervision with superpixels: Training few-shot medical image segmentation without annotation,” ECCV 2020; (b) Hu et al., “Semi-supervised contrastive learning for label-efficient medical image segmentation,” MICCAI 2021; (c) Wu et al., “Dual Contrastive Learning with Anatomical Auxiliary Supervision for Few-Shot Medical Image Segmentation,” ECCV 2022; and (d) Chaitanya et al., “Local contrastive loss with pseudo-label based self-training for semi-supervised medical image segmentation,” Medical Image Analysis 2023. (a) and (c) even use superpixels to construct contrastive learning losses. Please cite these papers, and clearly point out the similarities and differences of the proposed method with respect to these works. Also, please conduct a thorough literature review and cite any other papers that use local contrastive learning for image segmentation or other dense prediction tasks.

    2. To add to the previous point, it is said in the paper that “However, most self-supervised pre-training strategies are image [2,3,5,13,17] or patch [1,4] level, which are not capable of capturing the detailed feature representations required for accurate medical segmentation.” In my view, the distinction between patch-level and pixel-level is quite subjective, and depends on implementation details such as the patch size. Thus, comparing with prior work that considers local (patch level or pixel level) representations is necessary.

    3. I wonder if there is a problem with constructing the local contrastive loss from superpixels, which are local regions of pixels with similar intensities. A potential problem arises when two regions in an image share the same segmentation label but are separated in space, causing them to be assigned to separate superpixels. The proposed local contrastive loss then treats pixels in one of these regions as “negatives” relative to pixels in the other region, despite the segmentation task requiring them to have the same label. I suggest discussing this issue in the paper, citing Chuang et al.’s work on “Debiased contrastive learning” (Advances in Neural Information Processing Systems, 2020), and providing examples of scenarios where this problem does and does not arise.

    4. It is unclear to me exactly which experiments were conducted. What is meant by “we select 9 self-supervised pre-trained with 1 ImageNet supervised pre-trained networks and combine our proposed localized region contrast”? Were the 9 SSL methods used for pre-training global encoders with ImageNet data? Were these methods used for pre-training with the Abdomen data? Are the 9 SSL methods doing global or local contrastive learning? If they are doing global contrastive learning only, how was this combined with the global contrastive learning described in Sec 2.1 under “Global contrast”?

    5. The ablation study “effect of additional parameters” is not clearly explained. Specifically, what is meant by “adding the same number of MoCo pre-trained network parameters to the above global pre-trained methods”? Also, the claim that more parameters can somehow be better for pre-training is slightly strange. Can you refer to other works whose results support this conjecture?

    6. It is said that “when the number of samples N is large, the sampling bias can be high, since the number of pixels can be smaller than N.” This assertion appears to be incorrect. Since the feature representations used in the local contrastive loss are averaged over N pixels, it is unclear why increasing the number of sampled pixels would decrease performance. Instead, it is possible that increasing N exacerbates the false-negatives problem described in point 3.

    7. Why are different optimizers (SGD and Adam) used for the global and local pre-trainings? If this was important for obtaining good performance, please mention this clearly.

    8. Please consider replacing the words “Felzenszwalb’s algorithm” with “graph-based superpixel detection” or something like that, and cite [12] whenever this is mentioned.
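
    For reference, here is a minimal sketch of such graph-based superpixel detection using scikit-image's implementation of Felzenszwalb's algorithm; the input slice and parameter values are illustrative assumptions, not the paper's settings.

        # Minimal sketch: graph-based superpixel detection (Felzenszwalb's
        # algorithm) via scikit-image. Input and parameters are toy values.
        import numpy as np
        from skimage.segmentation import felzenszwalb

        # Hypothetical normalized 2D CT slice, shape (H, W), values in [0, 1].
        ct_slice = np.random.rand(256, 256).astype(np.float32)

        # scale controls typical segment size; min_size merges tiny fragments.
        superpixels = felzenszwalb(ct_slice, scale=100, sigma=0.8, min_size=64)

        print(superpixels.shape)      # (256, 256) integer label map
        print(superpixels.max() + 1)  # number of superpixels found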

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Discussion of very relevant prior work is missing, and there are no comparisons with it. This makes it hard to judge the merit and novelty of the paper’s most important technical contribution (the proposed local contrastive loss). That limitation aside, I like the paper: the idea of local representation learning makes sense, and the experimental validation on three datasets is impressive.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    My main concerns have been addressed by the rebuttal. Also, I did not see any major weaknesses in the other reviews that I had missed. I have therefore raised my rating of the paper. I encourage the authors to include discussions of the related works and of the bias of using superpixels in the paper.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a novel contrastive learning framework that integrates Localized Region Contrast (LRC) to enhance existing self-supervised pre-training methods for medical image segmentation. The approach involves identifying superpixels with Felzenszwalb’s algorithm and performing local contrastive learning using a novel contrastive sampling loss. Through extensive experiments on three multi-organ segmentation datasets, the authors demonstrate that integrating LRC into an existing self-supervised method in a limited-annotation setting significantly improves segmentation performance. Moreover, they show that LRC can also be applied to fully-supervised pre-training methods to further boost performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well organized and written. The idea of combining Localized Region Contrast (LRC) with existing self-supervised pre-training methods for medical image segmentation is sound. The paper provides extensive experimental results and analysis to support the proposed approach’s effectiveness and generalizability.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The proposed method seems interesting. The authors should point out the advantages of the proposed method when compared to existing supervised and unsupervised pre-training approaches. What is the training time of the proposed method?

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility of the paper is acceptable.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The proposed method is interesting and the paper is well organized.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well organized and written. The idea of combining Localized Region Contrast (LRC) with existing self-supervised pre-training methods for medical image segmentation is sound.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    7

  • [Post rebuttal] Please justify your decision

    Thanks for the feedback. The authors provide more clarification and comparison results, further verifying the effectiveness of the proposed method compared to state-of-the-art methods. The main concerns and most of the minor points have all been addressed. Therefore, I would like to increase my score to strong accept.



Review #3

  • Please describe the contribution of the paper

    This paper designs a Localized Region Contrast (LRC) module to enhance the segmentation performance of existing self-supervised pre-training frameworks, e.g., RelativeLoc and MoCo. Extensive experiments demonstrate the generality and effectiveness of this method on multi-organ segmentation tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Self-supervision is a valuable research topic.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The same idea has been studied in the AAAI 2022 paper “Separated Contrastive Learning for Organ-at-Risk and Gross-Tumor-Volume Segmentation with Limited Annotation” (SepaReg). Why not survey this study and make a comparison?
    2. Simply concatenating the features trained by existing self-supervised methods with those trained by LRC, which requires additional computation and memory resources, lacks novelty. In addition, the raw low-level LRC features are directly integrated into the original features to obtain the final segmentation masks through only one convolution layer, which may not be very convincing.
    3. In the fine-tuning stage, it is unclear how the framework is trained with and without LRC.
    4. The method is designed for 2D slices, so I doubt whether it can be extended to 3D.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No code is provided, but the method should be easy to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    See 6.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The organization and writing are good. In terms of technical novelty, this paper is somewhat limited. Since the most closely related study, SepaReg, is not discussed, I strongly suggest the authors compare with and discuss SepaReg and highlight the differences.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision
    1. Although the authors have added experiments to discuss the novelty, the motivation and underlying principle are still unclear. For example, in radiological images, organs are characterized not by the surrounding background but by their HU values and shapes. Hence, enclosing each organ in a bounding box will not produce false positives or negatives.
    2. Moreover, the fine-tuning stage is still confusing: the authors seem to concatenate the global and local features for decoding the mask. Since deep learning models with more feature channels usually perform better, and the traditional fine-tuning approach uses only one set of features, I doubt whether the performance boost comes from the increased number of parameters.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The reviewers criticized the paper for its limited novelty, lack of understanding of related works, limitation to 2D, and no code released for reproducibility. It might be challenging to win over reviewers, but I still would like to grant you a rebuttal opportunity to carefully address each of the weaknesses indicated by all three reviewers.




Author Feedback

We thank all reviewers for their valuable feedback and are grateful for the affirming reviews (R1, R2). We now address each of the raised comments.

Q1. Limited Novelty (R3, Meta-R) We respectfully disagree that our work lacks novelty: 1) R3’s novelty concern stems from our decision to simply concatenate global and local features. This choice was made deliberately to show the efficacy of LRC: Fig. 3 shows that even naive K-means can accurately cluster LRC’s embeddings, highlighting LRC’s ability to capture abundant low-level information. We then intentionally adopted a naive convolutional layer to stress this point. 2) Regarding R3’s comparison to SepaReg, LRC is significantly distinct from SepaReg; please see Q2 for further clarification.

Q2. Related work comparison (R1, R3, Meta-R) To address the comments on prior local contrast works, we respond in two ways. First, we present experiments demonstrating LRC’s superiority. Using Thorax-85 as a benchmark dataset with 10 labeled scans, the table below shows: 1) our method significantly outperforms the others (last row, Dice 88.6); and 2) incorporating LRC further boosts the prior works’ performance.

Method          vanilla   w/ LRC
R1 (a)          83.9      N/A
R1 (b)          84.0      N/A
R1 (d), inter   85.4      86.0
R1 (d), intra   86.1      86.8
R3 SepaReg      87.0      87.9
Ours            86.6      88.6

Next, we highlight how LRC differs from each prior work:

R1: (a) Both LRC and Ouyang et al. use superpixels for local cluster proposals, but they adopt a prototypical approach, mapping support labels onto query predictions. (b) Hu et al. propose using local contrast, but they count same-position points as positive pairs and all others as negatives, causing false negatives. (c) Wu et al. suggest a dual contrastive objective in a prototypical approach, akin to (a); however, the lack of code and of medical dataset experiments precludes a direct comparison. (d) Chaitanya et al. start with some labeled data for initial training and then use predicted pseudo-labels for further training. LRC lifts its performance from 85.4 to 86.0 (inter) and 86.1 to 86.8 (intra).

R3: Like LRC, SepaReg uses superpixels for local proposals. However, it creates local region patches, while LRC uses a local sampling loss for positive/negative pair creation. SepaReg’s local patches, enclosed in box shapes, contain unrelated background; this can cause both false negatives and false positives, which can impact the final contrastive learning result. With LRC, our method’s DSC improves from 86.6 to 88.6.

We thank R1 and R3 for their input. All experiments will be added in the final version, and each paper will be cited appropriately.

Q3. Extendability to 3D (R3, Meta-R) LRC readily extends to 3D: forming 3D superpixels, sampling local clusters, and forming pairs to pre-train LRC are all feasible. Our 2D focus stems from GPU limits, as 3D superpixel clustering requires excessive memory. We expect hardware improvements to mitigate this limitation.
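
As a feasibility illustration of this point (not the paper's pipeline), 3D supervoxels can be produced with off-the-shelf tools; the sketch below uses SLIC from scikit-image, a different algorithm from Felzenszwalb's, with toy data and parameters.

    # Feasibility sketch: 3D supervoxel generation with SLIC (scikit-image).
    # Not the paper's method; skimage's Felzenszwalb implementation is 2D-only.
    import numpy as np
    from skimage.segmentation import slic

    volume = np.random.rand(64, 128, 128).astype(np.float32)  # toy (D, H, W) CT volume

    # channel_axis=None treats the array as a 3D grayscale volume;
    # compactness trades intensity similarity against spatial compactness.
    supervoxels = slic(volume, n_segments=200, compactness=0.1, channel_axis=None)

    print(supervoxels.shape, supervoxels.max())  # label volume and supervoxel count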

Q4. Code for reproducibility (Meta-R) We’ve agreed to share code upon acceptance.

Q5. Bias from superpixels (R1) Yes, we observed during our study that superpixels can lead to incorrect assignments. Balancing pixel-intensity similarity against spatial unity is key: emphasizing intensity similarity causes false negatives, while emphasizing spatial unity causes false positives. We thank the reviewer for suggesting DCL; we will cite it and include reducing superpixel bias among our future directions.

Q6. Experimental clarification (R1, R2, R3) 1) All 9 SSL models are treated as global contrast methods. Weights are pre-trained on Abdomen-1K for 200 epochs, while LRC is pre-trained for 30. Both are then fine-tuned on a dataset with few labels; this takes 6 hours on an RTX A6000 GPU. 2) ‘ImageNet’ denotes features learned through supervised training on ImageNet, showing how natural-image features can transfer to the medical domain and stressing LRC’s effectiveness in various scenarios. 3) Ablation Study 1 addresses the concern that LRC’s gain may simply be due to extra parameters. 4) Our optimizer choices were made without specific rationale and were not tuned for performance.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Based on the rebuttal, R1 changed their review from “weak accept” to “accept”; R2 changed from “accept” to “strong accept”; and R3 remained at “weak reject”. Please address all remaining criticisms in your camera-ready version.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The concerns raised by the reviewers have been mostly addressed in the rebuttal, resulting in two reviewers revising their scores positively. Despite R3’s insistence on maintaining the “Weak Reject”, the authors have effectively highlighted the distinctive aspects of their approach in terms of novelty and resolved some doubts. In my opinion, incorporating the points presented in the rebuttal into the manuscript would make it suitable for acceptance.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Looking at the reviews, the questions raised by the reviewers, and the authors’ response, it seems most of the major concerns have been addressed.


