Authors

Jun Wei, Yiwen Hu, Guanbin Li, Shuguang Cui, S. Kevin Zhou, Zhen Li

Abstract

Accurate polyp segmentation is of great importance for colorectal cancer diagnosis and treatment. However, due to the high cost of producing accurate mask annotations, existing polyp segmentation methods suffer from severe data shortage and impaired model generalization. Conversely, coarse polyp bounding box annotations are more accessible. Thus, in this paper, we propose a boosted BoxPolyp model to make full use of both accurate mask and extra coarse box annotations. In practice, box annotations are applied to alleviate the over-fitting issue of previous polyp segmentation models, which generate fine-grained polyp area through the iterative boosted segmentation model. To achieve this goal, a fusion filter sampling (FFS) module is first proposed to generate pixel-wise pseudo labels from box annotations with less noise, leading to significant performance improvements. Besides, considering the appearance consistency of the same polyp, an image consistency (IC) loss is designed. Such IC loss explicitly narrows the distance between features extracted by two different networks, which improves the robustness of the model. Note that our BoxPolyp is a plug-and-play model, which can be merged into any appealing backbone. Quantitative and qualitative experimental results on five challenging benchmarks confirm that our proposed model outperforms previous state-of-the-art methods by a large margin.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16437-8_7

SharedIt: https://rdcu.be/cVRsT

Link to the code repository

N/A

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper
The authors present a polyp segmentation framework. Their main goal is to achieve accurate polyp masks leveraging datasets with mainly polyp box annotations and few pixel level annotations. In particular, the paper proposes:
- Fusion filter sampling (FFS) as a preprocessing module to (i) convert box annotations into pixel level annotations, (ii) exclude difficult/wrong training samples, and (iii) ignore uncertain regions of the image during training. A pretrained model on a segmentation dataset is needed to perform this task.
- Mixture of Annotated and Pseudo labels (MAP): a variation of CutMix in which polyp regions with pixel-level segmentations are pasted onto images with pseudo labels (arising from FFS). This is used to (i) suppress the negative effects of errors in the pseudo labels and (ii) upsample the fully annotated polyps.
- Inter-image consistency (IIC) loss
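As a rough illustration of the MAP idea in the list above, the following NumPy sketch pastes the tight crop around a fully annotated polyp onto an image that only carries a pseudo label. The helper name `paste_annotated_polyp`, the caller-supplied paste position, and the array layout are assumptions made for illustration; the paper's actual procedure may differ.

```python
import numpy as np

def paste_annotated_polyp(src_img, src_mask, dst_img, dst_label, top, left):
    """Paste the annotated polyp crop from (src_img, src_mask) into
    (dst_img, dst_label) at (top, left).

    Hypothetical helper, not the authors' code. Images are HxWx3 arrays;
    masks/labels are HxW arrays with 1 = polyp and 0 = background.
    """
    out_img, out_label = dst_img.copy(), dst_label.copy()
    ys, xs = np.nonzero(src_mask)
    if ys.size == 0:
        # No annotated polyp in the source: nothing to paste.
        return out_img, out_label

    # Tight bounding region of the annotated polyp.
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1

    # Clip the crop so it stays inside the destination image.
    h = max(0, min(y1 - y0, dst_img.shape[0] - top))
    w = max(0, min(x1 - x0, dst_img.shape[1] - left))

    # Inside the pasted region the accurate mask replaces the pseudo label.
    out_img[top:top + h, left:left + w] = src_img[y0:y0 + h, x0:x0 + w]
    out_label[top:top + h, left:left + w] = src_mask[y0:y0 + h, x0:x0 + w]
    return out_img, out_label
```

In this sketch the pasted region carries the accurate mask, so any pseudo-label error underneath it is overwritten, and every paste adds one more fully annotated polyp to the batch.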
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The amount of data present in recent datasets (such as LDPolypVideo) is needed for robust translatability of polyp detection models, but the labels are “softer” and contain noise. The paper focuses on achieving state-of-the-art results (usually obtained on smaller, curated datasets) on these larger and noisier datasets.

The authors employ well-thought-out techniques that are easy to implement and can be applied to a variety of situations.

The experiments show the benefits of the proposed improvements in a consistent, detailed and thorough way.

The methods are implemented using open datasets. In particular, the results are evaluated on 5 available datasets and improve on the counterpart networks on all of them.

The proposed architecture is compared to 9 state-of-the-art networks, and implemented on top of 2 of them. The authors provide additional ablation studies on 2 datasets.

Qualitative examples are also provided, showing the benefits of their proposals.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Why are accurate pixel-level segmentations needed clinically? Even if a polyp is detected perfectly, the resection techniques are very coarse. In my opinion, it is important to find polyps, but an exact boundary prediction is not necessary clinically. Perhaps it would be good to put the clinical context in the paper to balance the technical contribution, e.g., could this be needed in an autonomous robot setting?

The models were only trained and tested on polyp frames, so specificity has not been evaluated. A pretrained model on a segmentation dataset is needed to perform FFS.
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Some clarification needed.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
1. Please state clearly what data was used for training (particularly in section 4.1). I assume the models were trained with LDPolypVideo, but this is not clearly stated. Additionally, a dataset with segmentation masks is needed for the “pipeline with masks annotations”. What data is used for this?
2. Please state whether the baselines (from other methods) were retrained by the authors and whether the same data was used. If so, please also state whether randomness was accounted for (i.e., the same seed was used so that the same augmentations and data loading order were kept).
3. Figure 4 is unclear when the authors refer to “Dice values of the above models under different thresholds.” What threshold is modified? Is this showing the Dice score when using different thresholds over the predicted maps? If so, it’s interesting that the Dice score tends to increase with higher thresholds.
4. What threshold was selected for the results shown in Table 1?
5. In 3.1. “Meanwhile, a pre-trained SANet [22] model (trained on small segmentation dataset) is applied to get a coarse prediction P for I.”, please explain how the pretrained model was pretrained or obtained. What data was used, was there any data contamination with the LDPolypVideo dataset, etc.
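For reference on comments 3 and 4, one common way to obtain a Dice-versus-threshold curve is to binarize the predicted probability map at each threshold and compare the result with the ground-truth mask. The sketch below illustrates that reading only; the function name `dice_at_thresholds`, the smoothing term `eps`, and the threshold grid are assumptions, not the authors' evaluation code.

```python
import numpy as np

def dice_at_thresholds(prob, gt, thresholds, eps=1e-8):
    """Return one Dice score per threshold.

    prob: HxW array of predicted polyp probabilities in [0, 1].
    gt:   HxW binary ground-truth mask.
    eps:  small constant (assumed) to avoid division by zero.
    """
    gt = gt.astype(bool)
    scores = []
    for t in thresholds:
        pred = prob >= t  # binarize the prediction at threshold t
        inter = np.logical_and(pred, gt).sum()
        scores.append((2.0 * inter + eps) / (pred.sum() + gt.sum() + eps))
    return np.array(scores)

# Toy usage: a 2x2 prediction evaluated at three thresholds.
prob = np.array([[0.9, 0.6], [0.4, 0.1]])
gt = np.array([[1, 1], [0, 0]])
scores = dice_at_thresholds(prob, gt, [0.3, 0.5, 0.8])
```

Under this reading, a Dice score that rises with the threshold would suggest that low-confidence pixels are mostly false positives.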
Minor:
1. “In particular, the widely adopted training set [8, 22] contains only 1,451 images” (please mention dataset names for clarity)
2. In the introduction, the authors fail to mention other segmentation methods that use coarse polyp boxes uniquely as ground truth [refs]
3. In the first paragraph of related work, it sounds like UNet architectures are a subtype of FCN networks. This can lead to misunderstandings, as Fully Convolutional Networks are a segmentation architecture separate from UNet.
Typos:
1. “a generalized polyp segmentation model is urgently needed” -> “a generalizable polyp segmentation model is urgently needed”
2. “But box annotations in LDPolypVideo exist two” -> “But box annotations in LDPolypVideo have two”
3. There might be more
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

8
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The study addresses a problem arising from new publicly available datasets, and is therefore new and timely. It focuses on a problem rooted in clinical translatability (rather than improving results on curated data), which is of great importance. The proposed modules are innovative, particularly the FFS methodology to generate pixel-level labels while handling uncertain pixels.
Number of papers in your stack

6
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

This paper presents a polyp segmentation method that leverages cheap bounding box annotations to alleviate the data shortage in polyp segmentation. The authors present fusion filter sampling, a mixture of annotated and pseudo labels, and an inter-image consistency loss to boost a generalized polyp segmentation model through extra bounding box annotations.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper is well-written and organized. The experimental results and analysis are sufficient.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The paper presents several strategies to improve polyp segmentation performance by making full use of pixel-wise annotations and extra bounding box annotations. However, the connections and relationships among the main components of the proposed method are not clear. Additionally, some important details about the experiments are missing.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

I believe that the obtained results can be reproduced.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

The connections and relationships among the main components of the proposed method (i.e., fusion filter sampling, mixture of annotated and pseudo labels, and inter-image consistency loss) should be elaborated. More details of the comparison algorithms should be added. For example, what type of annotations was used in network training? And how many pixel-wise annotations and bounding box annotations, respectively, were used in the experiments?
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper is well-written and organized, but some method details should be added.
Number of papers in your stack

1
What is the ranking of this paper in your review stack?

3
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

The authors propose to leverage bounding box annotations for training a polyp segmentation model. For fusion filter sampling, which aims to generate pseudo labels, the authors use a pretrained SANet (previously proposed, [22]) to generate a coarse segmentation map prediction and compare it with the bounding boxes. They then use a technique similar to CutMix, in which they mix patches with pseudo labels onto random images with true labels. An inter-image consistency loss between predictions from different views is also proposed. The authors show some improvement over previous methods. An open question is whether these improvements come from the use of the large LDPolypVideo dataset for the coarse prediction or from the implemented techniques, which behave more like data augmentation (e.g., CutMix and a loss function comparing different views). Also, is an improvement of a few percentage points clinically relevant?
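Under simplifying assumptions, comparing a coarse prediction with a box annotation to obtain a pseudo label could look like the sketch below. The agreement rule, the IoU-based sample filter, the `IGNORE` value of 255, and all names and default thresholds are illustrative assumptions; they are not taken from the paper.

```python
import numpy as np

IGNORE = 255  # label value skipped by the loss (assumed convention)

def box_to_pseudo_label(prob, box, thr=0.5, min_iou=0.2):
    """Fuse a coarse prediction with a box annotation (hypothetical rule).

    prob: HxW array of coarse polyp probabilities from a pretrained model.
    box:  (y0, x0, y1, x1) with exclusive lower-right corner.
    Returns an HxW uint8 label map, or None if the sample is filtered out.
    """
    y0, x0, y1, x1 = box
    box_mask = np.zeros(prob.shape, dtype=bool)
    box_mask[y0:y1, x0:x1] = True
    pred_mask = prob >= thr

    # Sample filter: drop images where prediction and box barely overlap.
    inter = np.logical_and(box_mask, pred_mask).sum()
    union = np.logical_or(box_mask, pred_mask).sum()
    if union == 0 or inter / union < min_iou:
        return None

    # Pixels where both sources agree get a label; the rest are ignored.
    label = np.full(prob.shape, IGNORE, dtype=np.uint8)
    label[box_mask & pred_mask] = 1    # inside box and predicted polyp
    label[~box_mask & ~pred_mask] = 0  # outside box and predicted background
    return label
```

A training loss would then skip every pixel equal to `IGNORE`, so only the regions where the box and the coarse prediction agree contribute to the gradient.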
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- A simple technique to leverage the dataset with bounding boxes
- The proposition is well ablated
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The paper lacks clarity, and its motivation is not clear. Please see the constructive comments for details.
- The paper uses previously proposed techniques in its model and propositions.
- Math symbols and equations need to be checked. For example, images and labels are vectors/matrices of dimension d and should therefore be written in bold or as capitals.
- The question remains: is the method generalisable to other unseen datasets?
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Authors have checked yes for the reproducibility of paper.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

The reviewer would raise the following points; addressing them could improve the paper:
1. Subjectivity in labelling is more of a problem than erroneous labels. Several well-annotated datasets are available, but assessing their subjectivity and finding agreement is the way forward. The authors should clearly distinguish what they mean by an accurate mask and an erroneous mask.
2. Overfitting of previous segmentation models: how do the authors know that these models overfit unless they perform generalisability tests? The same question applies to this work: does the model generalise to unseen datasets? For example, if you train on one dataset, could you try inference on another?
3. Data shortage in which sense? Many publicly available polyp segmentation datasets can be used to develop methods. How much data is sufficient? The authors could cite papers that address this and build their argument on them.
4. The authors comment on time consumption in the related work, but they do not report inference time. Also, is the network end-to-end trainable?
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The method shows some benefit, especially in using a large detection video dataset. However, the arguments for doing this are slim, and the generalisability assessment is lacking. I think the rationale for developing a segmentation model from detection labels does not hold up, because one could simply build a detection method, which is already of interest in clinical practice. Why leverage detection labels for segmentation?
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

2
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper proposes a novel polyp segmentation method based on combining pixel-level labels with coarse box annotations. Although the idea seems simple, leveraging bounding box annotations together with the existing model boosts performance and improves model generalization. The authors demonstrated the performance of the method by comparing it with several well-known methods. Below is a summary of the reviews:

Strengths: novel idea with well-thought-out techniques (R1, R3); well-written and organized (R2); sufficient experiments and validation (R1, R2, R3).

Weaknesses: clinical relevance (R1); lack of clarity and motivation (R2, R3).

Although the presented work has both strengths and weaknesses, all the reviewers agreed that the proposed idea is novel and useful for real cases where data is not sufficient. Therefore, I recommend acceptance.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

1

Author Feedback

N/A

BoxPolyp: Boost Generalized Polyp Segmentation using Extra Coarse Bounding Box Annotations