Authors

Jun Wei, Yiwen Hu, Shuguang Cui, S. Kevin Zhou, Zhen Li

Abstract

Limited by expensive pixel-level labels, polyp segmentation models are plagued by data shortage and suffer from impaired generalization. In contrast, polyp bounding box annotations are much cheaper and more accessible. Thus, to reduce labeling cost, we propose to learn a weakly supervised polyp segmentation model (i.e., WeakPolyp) completely based on bounding box annotations. However, coarse bounding boxes contain too much noise. To avoid interference, we introduce the mask-to-box (M2B) transformation. By supervising the outer box mask of the prediction instead of the prediction itself, M2B greatly mitigates the mismatch between the coarse label and the precise prediction. But, M2B only provides sparse supervision, leading to non-unique predictions. Therefore, we further propose a scale consistency (SC) loss for dense supervision. By explicitly aligning predictions across the same image at different scales, the SC loss largely reduces the variation of predictions. Note that our WeakPolyp is a plug-and-play model, which can be easily ported to other appealing backbones. Besides, the proposed modules are only used during training, bringing no computation cost to inference. Extensive experiments demonstrate the effectiveness of our proposed WeakPolyp, which surprisingly achieves a comparable performance with a fully supervised model, requiring no mask annotations at all. Codes are available at https://github.com/weijun88/WeakPolyp.
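The mask-to-box idea in the abstract can be illustrated with a minimal sketch. It assumes a projection-based formulation (row-wise and column-wise maxima, recombined with an element-wise minimum) and a plain L1 loss. Both choices, and the names `mask_to_box` and `box_l1_loss`, are illustrative assumptions rather than the authors' released implementation.

```python
def mask_to_box(pred):
    """Turn a 2D soft mask (list of rows) into its outer-box mask."""
    row_max = [max(row) for row in pred]        # projection onto the vertical axis
    col_max = [max(col) for col in zip(*pred)]  # projection onto the horizontal axis
    # A pixel lies inside the outer box only if both its row and its column are active.
    return [[min(r, c) for c in col_max] for r in row_max]


def box_l1_loss(pred, box_label):
    """Mean absolute error between the prediction's box mask and the box label.

    The L1 form is an assumption made for this sketch.
    """
    box_pred = mask_to_box(pred)
    total = sum(
        abs(p - t)
        for pred_row, label_row in zip(box_pred, box_label)
        for p, t in zip(pred_row, label_row)
    )
    return total / (len(pred) * len(pred[0]))


# A diagonal two-pixel prediction and the box label that encloses it.
pred = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
box_label = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
```

In this toy case the prediction does not fill the box, yet its box mask matches the label exactly. That is the sense in which supervising the outer box avoids penalizing a precise prediction for differing from a coarse label.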

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_72

SharedIt: https://rdcu.be/dnwB6

Link to the code repository

https://github.com/weijun88/WeakPolyp

Link to the dataset(s)

N/A

Reviews

Review #3

Please describe the contribution of the paper

The authors propose a WeakPolyp model for polyp segmentation based on bounding box annotations instead of expensive pixel-level labels. They introduce a mask-to-box transformation to reduce noise interference from coarse bounding boxes. They also use a scale consistency loss to align predictions across different scales. The WeakPolyp model achieved comparable performance to fully supervised models without requiring pixel-level annotations.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The paper proposes a weakly supervised polyp segmentation model (WeakPolyp). It also introduces two novel components for polyp segmentation: the mask-to-box (M2B) transformation and the scale consistency (SC) loss. Experiments demonstrate that the proposed method advances the SOTA results.
2. The study is conducted on multiple datasets with a large number of samples.
3. The study is well motivated and demonstrates the clinical relevance of the task.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
(1) BoxPolyp is a similar type of study. It would be interesting to see the performance improvement over the BoxPolyp approach. The authors could also include more related works. (2) The authors should provide information about the easy and hard cases in their Polyp-SEG dataset.

Minors:
1. References are not consistent. This can be corrected easily by the authors.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors have provided some of the details. However, some information is missing, such as the number of testing cases and hard testing cases in the Polyp-SEG dataset. Moreover, how do they define whether a case is easy or hard?
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
(1) BoxPolyp is a similar type of study. It would be interesting to see the performance improvement over the BoxPolyp approach. The authors could also include more related works. (2) The authors do not compare their method with recent state-of-the-art methods such as SSFormer-S, SSFormer-L, UNeXt, LDNet, TGANet and PVT-Cascade. These papers are from MICCAI 2022 and WACV 2023. (3) More description of the authors' datasets is required, for example the number of samples in the easy cases and in the hard cases. The authors should also explain how they define easy and hard cases. (4) There is no comparison in terms of model size, operational efficiency, processing speed, and FLOPs. (5) The authors do not provide 5-fold cross-validation. The mean and standard deviation of scores over multiple trials are necessary, as the difference between the proposed method and the SOTA approach is not very large. (6) No statistical significance test is provided. (7) The study should also highlight its limitations, such as failure cases.

Minors:
1. References are not consistent. This can be corrected easily by the authors.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The study is well written and innovative. It focuses on clinical translation rather than just improving performance. However, my major concern with the paper lies in the comparisons with the recent state of the art.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #4

Please describe the contribution of the paper

The paper proposes a bounding-box-based weakly supervised method, named WeakPolyp, for polyp segmentation. It has two losses. The first transforms the predicted segmentation into a bounding box region for supervision, using the proposed M2B transformation. The second, called the SC loss, penalizes differences between outputs at various scales. The mechanism is simple but performs well. It is mainly validated on two datasets, one public and one private.
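As a rough illustration of the second loss described above, the sketch below compares predictions for the same image at two resolutions. It assumes nearest-neighbour resizing and a mean absolute difference. Both choices, and the function names, are hypothetical and not taken from the paper.

```python
def resize_nearest(mask, out_h, out_w):
    """Nearest-neighbour resize of a 2D list to out_h x out_w."""
    in_h, in_w = len(mask), len(mask[0])
    return [
        [mask[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def scale_consistency_loss(pred_a, pred_b):
    """Mean absolute difference after resizing pred_b to the size of pred_a."""
    h, w = len(pred_a), len(pred_a[0])
    pred_b_resized = resize_nearest(pred_b, h, w)
    total = sum(
        abs(a - b)
        for row_a, row_b in zip(pred_a, pred_b_resized)
        for a, b in zip(row_a, row_b)
    )
    return total / (h * w)


# Predictions for one image at two scales (toy values).
pred_small = [
    [0.0, 1.0],
    [0.0, 1.0],
]
pred_large_consistent = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
pred_large_shifted = [[0.0, 0.0, 0.0, 1.0]] + [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]
```

The loss is zero when the two scales agree and grows with every pixel that disagrees, so it supervises every pixel rather than only the box extent.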
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper is easy to follow and clearly organized. The M2B transformation, which bridges the predicted masks and the bounding box labels, seems novel to me. The SC loss does not: it is a simplified version of existing contrastive learning models that penalize differences between outputs for various views of the same image.

The performance is remarkably good. Specifically, if the results are correct, bounding box supervision performs even better than ground truth supervision when the same model is trained, as shown by the testing results in Table 1 (average testing result, box vs. GT: 78.6% vs. 77.7% using the Res backbone).
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The good performance mentioned in the "strengths" section, namely that bounding box supervision sometimes performs better than ground truth supervision, seems somewhat unreasonable to me. Were the hyperparameters of the model supervised by ground truth carefully tuned? Is it possible to reproduce this result on public datasets and off-the-shelf models, such as SANet on ColonDB?

Are the bounding box labels of the SUN-SEG dataset generated from the ground truth? In practical applications, the height and width of bounding boxes could be larger than those of the ground truth, which could introduce considerable noise into the proposed method. How would this affect the performance?
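One way to probe this question is to derive a tight box from each ground truth mask and then enlarge it by a margin to simulate loose annotations. The sketch below is hypothetical: the source does not state how the SUN-SEG boxes were produced, and `tight_box`, `enlarge_box`, and `box_to_mask` are illustrative names.

```python
def tight_box(mask):
    """Return (top, left, bottom, right) of the foreground, or None if the mask is empty."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    cols = [j for j, col in enumerate(zip(*mask)) if any(col)]
    if not rows:
        return None
    return rows[0], cols[0], rows[-1], cols[-1]


def enlarge_box(box, margin, height, width):
    """Grow the box by `margin` pixels on each side, clipped to the image bounds."""
    top, left, bottom, right = box
    return (
        max(top - margin, 0),
        max(left - margin, 0),
        min(bottom + margin, height - 1),
        min(right + margin, width - 1),
    )


def box_to_mask(box, height, width):
    """Rasterize an inclusive box into a binary mask."""
    top, left, bottom, right = box
    return [
        [1 if top <= i <= bottom and left <= j <= right else 0 for j in range(width)]
        for i in range(height)
    ]


# Toy 5x5 ground truth with a two-pixel polyp.
gt = [[0] * 5 for _ in range(5)]
gt[2][2] = 1
gt[2][3] = 1

box = tight_box(gt)                   # tight annotation
loose = enlarge_box(box, 1, 5, 5)     # simulated loose annotation
loose_mask = box_to_mask(loose, 5, 5)
```

Training once with the tight boxes and once with the enlarged ones would show how sensitive the method is to annotation looseness.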
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The bounding box generation strategy is not given and some results are achieved on a private dataset. To guarantee reproducibility, code release may be required.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

The performance is good, where the bounding box supervision seems superior to ground truth supervision to some extent. The superiority may require more explanations.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The performance is good but more details and explanations are required to validate the correctness. So, I suggest a weak acceptance.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

The paper focuses on a new weakly supervised polyp segmentation problem, in which coarse bounding box annotations are used to train segmentation models. The authors propose a novel mask-to-box (M2B) transformation to minimize interference. This technique supervises the outer box mask of a prediction rather than the prediction itself, substantially reducing the discrepancy between the rough label and the accurate prediction. They further propose a scale consistency (SC) loss for more comprehensive supervision. By deliberately aligning predictions across different scales of the same image, the SC loss significantly decreases prediction variations.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is well organized and gives a clear motivation for each part of the proposed method.
- The proposed module is a plug-and-play option and is adaptable to different backbones.
- The paper introduces one of the first weakly supervised polyp segmentation tasks, which could have a significant impact in this direction.
- The proposed method is evaluated on publicly available benchmarks and achieves SOTA results with a large performance gain over other SOTA methods.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The novelty of the SC loss is questionable, because enforcing consistency between two differently sized images has been heavily used in previous research, for example in segmentation and super-resolution.
- According to Table 3, the proposed method performs significantly better than previous supervised baselines that use finer-grained pixel-level labels. Could the authors provide more detailed motivation and analysis of why this happens with much coarser labels?
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The comments
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

See above.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper presents a great method for weakly supervised polyp segmentation. The novel idea is simple and effective. The authors also present a comprehensive experimental analysis. They further show that the approach can be adapted to supervised segmentation to boost performance. Hence, I lean toward accepting the paper.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This work proposes a weakly supervised polyp segmentation system that learns to transform predicted segmentations into boxes while also maximizing consistency across scales. All reviewers agree that the paper has enough merits to receive early acceptance, so I will support the consensus.

I agree with R4 that weakly supervised training outperforming its fully supervised counterparts can seem a bit unintuitive, and the experiments appear to be partly on a private dataset. I would therefore also like to encourage the authors to share their code publicly with the community, if that is an option.

Author Feedback

N/A

WeakPolyp: You Only Look Bounding Box for Polyp Segmentation