Authors

Xinkai Zhao, Zhenhua Wu, Shuangyi Tan, De-Jun Fan, Zhen Li, Xiang Wan, Guanbin Li

Abstract

Deep learning-based polyp segmentation approaches have achieved great success on image datasets. However, frame-by-frame annotation of polyp videos is labor-intensive, which limits the application of polyp segmentation algorithms to clinical videos. In this paper, we address the semi-supervised video polyp segmentation task, which requires only sparsely annotated frames to train a video polyp segmentation network. We propose a novel spatial-temporal attention network composed of the Temporal Local Context Attention (TLCA) module and the Proximity Frame Time-Space Attention (PFTSA) module. Specifically, the TLCA module refines the prediction for the current frame using the predictions for nearby frames in the video clip. The PFTSA module uses a simple yet powerful hybrid transformer architecture to capture long-range dependencies in time and space efficiently. Combined with consistency constraints, the network fuses representations of proximity frames at different scales to generate pseudo-masks for unlabeled images. We further propose a pseudo-mask-based training method. Additionally, we re-masked a subset of LDPolypVideo and used it as a semi-supervised polyp segmentation dataset for our experiments. Experimental results show that our proposed semi-supervised approach can outperform existing image-level semi-supervised and fully supervised methods with sparse annotation. Source code will be made available.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_44

SharedIt: https://rdcu.be/cVRwv

Link to the code repository

https://github.com/ShinkaiZ/SSTAN

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose possibly the first method for semi-supervised video-based polyp segmentation. To accomplish this, they annotate 60 videos of a video polyp detection dataset. Their technique uses two transformers to exploit both spatial and temporal information effectively. The proposed approach beats state of the art techniques.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Motivation is well founded. There is a need for methods which can segment polyps from video, and even creating datasets for such a task has been a shortcoming in the community.

    2. The use of transformers to exploit both spatial and temporal information (although not novel to computer vision in general) is novel for this application.

    3. The authors achieve state-of-the-art results, especially a fairly significant bump in terms of mIoU score. Further, the authors get a good bump over the other transformer-based PVT technique, which shows the importance of the extra data for the data-hungry transformers.
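
For reference, the mIoU score mentioned above can be illustrated with a minimal sketch. It assumes mIoU means the intersection-over-union of binary masks averaged over frames; the paper's exact definition may differ. The function names `iou` and `mean_iou` are hypothetical, as is the convention that two empty masks score 1.0.

```python
def iou(pred, target):
    """IoU between two binary masks given as nested lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_iou(preds, targets):
    """Average the per-frame IoU over a sequence of mask pairs."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    target = [[1, 0], [0, 0]]
    print(iou(pred, target))
```

Averaging per frame, as here, weights small and large polyps equally; pooling pixels over the whole test set before dividing is another common choice and gives different numbers.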
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. There is nothing particularly novel about this work in terms of technical innovations. Vision transformers have been used both spatially and temporally in the literature for natural images. However, there is good enough application novelty in the reviewer’s opinion to justify publication in this venue. There is a significant need for a semi-supervised video-based polyp segmentation method, and further for the data annotations.

    2. There is no discussion about how the training, validation, and testing images were split, no cross validation, no error bars, and no discussion of potential bias introduced. This can be mildly excused due to the cost of video data annotation and only having a limited number of videos fully annotated, but a discussion of how videos were chosen for the splits should at the very least be included.
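
One way to address the split concern is to partition at the video level, so that frames from one video never appear in more than one subset. The following is a minimal sketch under stated assumptions: the 60/20/20 ratios, the seed, and the function name `split_by_video` are hypothetical and are not taken from the paper, which does not describe its split.

```python
import random


def split_by_video(video_ids, ratios=(0.6, 0.2, 0.2), seed=0):
    """Assign whole videos to train/val/test so no video spans two subsets."""
    ids = sorted(set(video_ids))   # deduplicate and fix the order
    rng = random.Random(seed)      # seeded for a reproducible split
    rng.shuffle(ids)
    n_train = int(len(ids) * ratios[0])
    n_val = int(len(ids) * ratios[1])
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # remainder goes to test
    }


if __name__ == "__main__":
    # 60 annotated videos, as mentioned in Review #1; the IDs are made up.
    splits = split_by_video([f"video_{i:02d}" for i in range(60)])
    print({name: len(ids) for name, ids in splits.items()})
```

Reporting the seed and the resulting video lists would let others reproduce the split, and repeating it with several seeds would give the error bars the review asks for.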

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    1. It’s unclear from the reproducibility questions whether the authors plan to release the masks they produced.

    2. The authors do not include any complexity information about their approach, even though transformers are known to be parameter-heavy and slow.

    3. I do not see any discussion of hyperparameter sensitivity in the supplemental materials, contrary to what the authors claim.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. At one point the authors say they annotate every 11 frames. But in the supplemental materials the figure says every 5 frames. This should be clarified.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, this is a decent paper worthy of acceptance for its application novelty. There are no major issues with the paper, but also nothing particularly novel in terms of technical innovation.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    A novel approach is proposed for polyp segmentation in clinical videos. This method consists of two main modules that shape a semi-supervised polyp segmentation architecture.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper proposes a novel framework for a well-known clinical problem and presents a strong evaluation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There is no information regarding the computation time. It is difficult to evaluate the feasibility of implementing the proposed approach for real-time clinical application.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    A detailed explanation of the proposed approach is presented, and a publicly available dataset is used for the evaluation step.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. Figure 4: if possible, please mark the ROI in the original images so that readers can better understand which part should be segmented.
    2. Section 4, Conclusion: A strong evaluation of the proposed approach is presented; however, the main advantages of the method over the state of the art are not discussed. I recommend highlighting this point in the manuscript.
    3. Section 4, Conclusion: There is no significant evidence (such as computation time) in the manuscript to evaluate the feasibility of the method for real-time clinical application. I recommend adding this point, along with more explanation of the clinical need for such a system.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The manuscript presents a new approach for semi-supervised polyp segmentation using video data. It is very well written but needs some adjustments and improvements to highlight the advantages and the feasibility of clinical applications.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a semi-supervised polyp video segmentation method that introduces a Temporal Local Context Attention (TLCA) module and a Proximity Frame Time-Space Attention (PFTSA) module to improve video polyp segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper introduced the recent two main modules and proposed a Semi-Supervised Spatial Temporal Attention Network (SSTAN) for the polyp video and showed the higher performance in the segmentation problem.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The two main modules improve segmentation accuracy, but they are not originally proposed in this paper. Segmentation is an important task, but I feel that automatic classification of polyps as benign or malignant is more important these days. Some products are already being sold.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper shows source code is available and it is acceptable.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Qualitative results are shown for different models, and the segmentation performance of the proposed approach is good. It would be better to show some failure examples of this approach, along with the reasons for the failures.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Segmentation performance is good and basic improvement was done by adding the two recent modules.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper presents a semi-supervised polyp segmentation approach that exploits spatial and temporal information. While the building blocks are two transformer modules from the computer vision community, their application to polyp segmentation has been recognized as novel by the 3 reviewers. The experimental validation seems solid. The possible application is probably broader than the one at hand. The two main concerns raised by the reviewers are i) the clinical motivation and viability (classification and detection are more relevant tasks than segmentation; does it run in real time?), and ii) a lack of discussion of the computational complexity. Given that the evaluation relies on a public dataset, sharing the segmentation annotations would be a great contribution to the community.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3




Author Feedback

N/A


