
Authors

Yuncheng Jiang, Zixun Zhang, Ruimao Zhang, Guanbin Li, Shuguang Cui, Zhen Li

Abstract

Accurate polyp detection is essential for assisting clinical rectal cancer diagnoses. Colonoscopy videos contain richer information than still images, making them a valuable resource for deep learning methods. However, unlike common fixed-camera video, the camera-moving scene in colonoscopy videos can cause rapid video jitters, leading to unstable training for existing video detection models. In this paper, we propose the YONA (You Only Need one Adjacent Reference-frame) method, an efficient end-to-end training framework for video polyp detection. YONA fully exploits the information of one previous adjacent frame and conducts polyp detection on the current frame without multi-frame collaborations. Specifically, for the foreground, YONA adaptively aligns the current frame’s channel activation patterns with its adjacent reference frame according to their foreground similarity. For the background, YONA conducts background dynamic alignment guided by inter-frame difference to eliminate the invalid features produced by drastic spatial jitters. Moreover, YONA applies cross-frame contrastive learning during training, leveraging the ground truth bounding box to improve the model’s perception of polyp and background. Quantitative and qualitative experiments on three public challenging benchmarks demonstrate that our proposed YONA outperforms previous state-of-the-art competitors by a large margin in both accuracy and speed.
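To make the idea of guidance by inter-frame difference more concrete, the following is a minimal sketch and not the authors' implementation (no code repository is linked for this paper). It assumes features are plain nested lists indexed as `feature[channel][position]`, and it swaps in a simple multiplicative gate where the paper describes a dynamic alignment module; the function names `inter_frame_difference`, `difference_gate`, and `suppress_unstable_positions` are hypothetical.

```python
# Illustrative sketch only: NOT the YONA implementation.
# Assumption: feature[c][i] is the activation of channel c at flattened position i.

def inter_frame_difference(anchor, reference):
    """Mean absolute difference over channels at each spatial position."""
    channels = len(anchor)
    positions = len(anchor[0])
    return [
        sum(abs(anchor[c][i] - reference[c][i]) for c in range(channels)) / channels
        for i in range(positions)
    ]

def difference_gate(diff, eps=1e-8):
    """Map differences to weights in [0, 1]; the largest change gets a weight close to 0."""
    peak = max(diff)
    return [1.0 - d / (peak + eps) for d in diff]

def suppress_unstable_positions(anchor, reference):
    """Down-weight anchor features where the two frames disagree most."""
    gate = difference_gate(inter_frame_difference(anchor, reference))
    return [[value * g for value, g in zip(channel, gate)] for channel in anchor]

# Two channels, three positions; only the last position changes between frames.
anchor = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
reference = [[1.0, 2.0, 0.0], [1.0, 2.0, 1.0]]
gated = suppress_unstable_positions(anchor, reference)
```

In this toy input the first two positions are identical in both frames and pass through unchanged, while the third position, where the frames differ, is suppressed almost entirely. A hard gate like this is only a stand-in for the learned alignment the abstract describes.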

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43904-9_5

SharedIt: https://rdcu.be/dnwGI

Link to the code repository

N/A

Link to the dataset(s)

https://github.com/GewelsJI/VPS/tree/main#3-vps-dataset

https://github.com/dashishi/LDPolypVideo-Benchmark

https://giana.grand-challenge.org/PolypDetection/


Reviews

Review #2

  • Please describe the contribution of the paper

    This paper proposes a polyp detection model that considers only a reference frame and an anchor frame and outperforms the SOTA.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Ablation study confirming that all the components proposed are essential for the task
    • Comparison with other methods, outperforming them on two datasets
    • Qualitative results suggest that the proposed approach overcomes a few of the issues seen with other methods by only using two frames
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The authors follow the same settings as in CenterNet, but CenterNet was clearly trained on a different dataset. Ablation studies are required to find the lambdas.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Yes, the authors will make aspects of their code and details available on acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Even though Fig. 1b shows the effect of the number of frames on accuracy, these differences are minimal, i.e. from 0.68 for 1 frame decreasing perhaps to 0.675 for 21 frames. Therefore the conjecture that the authors use as an argument for taking into account only one reference frame is not that strong.
    • The manuscript assumes the reader has good knowledge of CenterNet. Therefore the manuscript might benefit from a brief introduction to how this method works before the authors introduce their proposed model.
    • Please specify what are the inputs and ground truth available for completeness.
    • Make sure that all introduced variables, the multiscale features, and the up-sampling are shown/illustrated in the pipeline in Fig. 2.
    • It is unclear how the alignment of channel activation patterns on intermediate features is achieved, please clarify.
    • What do the authors mean by “occluded or distorted foreground context” in Sec 2.1?
    • Even though FTA is compared against channel-wise and channel-aware attention via the ablation study, channel attention mechanism is part of FTA as per Sec 2.1
    • It is unclear what happens when detected boxes of reference frames are not validated (i.e. below 0.6)
    • According to Fig. 2, F* results from the multiplication of the dynamic field by the enhanced anchor feature F~, but that is not reflected in Eq. 5.
    • Lambdas are not defined in Fig. 2.

    List of minor problems

    • “Considering too many frames” rather than “collaborating too many frames”
    • In Sec 2.1: isn’t reference greater than 1 rather than 3?
    • Fig 2: isn’t the backbone an encoder only extracting features?
    • When defining the features, please define N, T, C, H, and W, but mainly the first two. These are also used inconsistently between Sec 2.1 and 2.3.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    clarity of paper, experiments, ablation study, qualitative results

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    This paper proposes a new polyp detection method for colonoscopy video. The authors introduce the assumption that two successive frames in a colonoscopy video contain the necessary spatial-temporal information into a bounding-box-based polyp detection framework. Experimental evaluations are presented using large-scale open-access datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • New assumption for utilising spatial-temporal features for polyp detection.
    • Experimental evaluation with large-scale open-access datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • Incorrect mathematical notation in definitions (not academic writing style).
    • Insufficient survey and comparative evaluations.
    • Insufficient evaluation scores compared with previous works that reported evaluation metrics on existing open-access datasets.
    • Unclear technical contributions.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors perform the experimental evaluation using existing open-access datasets. They intend to publish their code after acceptance. With the open-access datasets and shared code, the work can offer repeatability.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The survey is missing previous works on polyp detection with spatial-temporal features, which have been published in the proceedings of MICCAI conferences, in Medical Image Analysis (the following is just an example), and in other medical journals, including Ref. [10]: J. G.-B. Puyal, et al., Polyp detection on video colonoscopy using a hybrid 2D/3D CNN, Medical Image Analysis, 2022. Without a comparison with these previous works, the validity of the authors’ assumption about the adjacent reference frame is unclear. Note that the evaluation scores reported in this submission are lower than the scores in previous works for the open-access datasets.

    The definitions in Eqs. (1)-(6) are based on Python coding style rather than correct mathematical notation. Since a paper is not source code, appropriate mathematical notation should be used in a manuscript. Furthermore, the authors do not use the defined symbols and operations in the illustration of the proposed method (Fig. 2). Therefore, their presentation hinders readability and repeatability.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Based on the insufficient survey, unclear definitions and experimental evaluations, the genuine technical contributions are unclear for the polyp detection task. The assumption on spatial-temporal feature is the core idea of this submission. However, the validity of the assumption is not fully presented.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors propose a polyp detection model for colonoscopy videos. By using only an adjacent reference frame, their model outperforms other SOTA models. The authors also show the effect of the number of frames and an ablation study.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Good comparison: The proposed model is compared to 11 other SOTA models on three public datasets.
    • The paper is well written and easy to follow.
    • Thorough evaluation: The authors show qualitative results and an ablation study. In addition, the authors show the effect of the number of reference frames and the parameter lambda.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The train/test split is unclear. For example, the authors use the SUN Colonoscopy Video Database (train set: 19,544 frames, test set: 12,522 frames). How was the data divided into train and test sets?
    • Limited discussion from a clinical view: The effect of the model when used by doctors is barely discussed.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • Public datasets: The authors train and evaluate the models using three public datasets: SUN Colonoscopy Video Database, LDPolypVideo, and CVC-VideoClinicDB.
    • Good explanation of models: The proposed model is explained, and the authors will provide code if this paper is accepted.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    For future work, I would recommend:

    • The effect on doctors’ discovery rate: If doctors use the proposed model, is the polyp discovery rate improved?
    • The performance outside white light imaging: There are other types of imaging in colonoscopy, such as Narrow Band Imaging (NBI) and chromoendoscopic imaging. Does the proposed model also outperform other methods in those imaging types?
    • Other types of lesion: For instance, doctors also find cancer and adenoma. Is the proposed model effective for those other lesion types?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • Good comparison: The proposed model is compared to 11 other SOTA models on three public datasets.
    • The paper is well written and easy to follow.
    • Thorough evaluation: The authors show qualitative results and an ablation study. In addition, the authors show the effect of the number of reference frames and the parameter lambda.
  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes a polyp detection model for colonoscopy videos that outperforms 11 state-of-the-art models using only adjacent reference frames. The authors provide a thorough evaluation with qualitative results and ablation study, as well as showing the effect of the number of reference frames and the parameter lambda. Reviewers find the paper well-written and easy to follow, with good comparisons to other models using public datasets. While there is limited discussion from a clinical viewpoint, reviewers recommend the paper for the clinical translation session as it reflects real-world impact.

    Based on the reviewers’ feedback, an early accept is recommended. For the camera-ready version, include a discussion from the clinical viewpoint, further investigation into the effect on doctors’ discovery rate, and a discussion of how the method can be effective for different imaging types and lesion types.




Author Feedback

N/A


