
Authors

Yotam Intrator, Natalie Aizenberg, Amir Livne, Ehud Rivlin, Roman Goldenberg

Abstract

Computer-aided polyp detection (CADe) is becoming a standard, integral part of any modern colonoscopy system. A typical colonoscopy CADe detects a polyp in a single frame and does not track it through the video sequence. Yet many downstream tasks, including polyp characterization (CADx), quality metrics, and automatic reporting, require aggregating polyp data from multiple frames. In this work we propose a robust long-term polyp tracking method based on re-identification by visual appearance. Our solution uses an attention-based self-supervised ML model, specifically designed to leverage the temporal nature of video input. We quantitatively evaluate the method's performance and demonstrate its value for the CADx task.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43904-9_57

SharedIt: https://rdcu.be/dnwH3

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This submission proposes a re-identification method based on the image appearance of detected polyps for long-term polyp tracking in a colonoscopy video. The authors use an existing polyp detection model and crop regions of detected polyps in a video. From the cropped polyp images, they extract feature vectors and use them for polyp re-identification in the video.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • Focuses on polyp tracking in video for the analysis of real colonoscopy images.
    • Tackles classification of the same polyp among similar image appearances.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • Unclear technical novelty.
    • Unclear description of the methodology, with incorrect mathematical notation.
    • Insufficient survey of related work.
    • Insufficiently demonstrated advantages/clinical impact of this work.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This work has insufficient reproducibility. The authors do not use any public datasets and do not intend to share their code. Furthermore, the methodology section lacks detailed settings, and the presented mathematical notation is non-standard. As a result, one cannot readily implement the proposed method to reproduce the experiments.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Over the past decades, many prior works on re-identification have been reported; however, this submission fails to discuss them. Overall, the descriptions in the manuscript are wordy yet missing detailed explanations, which lowers the manuscript's readability. The experimental settings are also hard to understand. Together, these issues leave the technical contribution and clinical impact unclear, and therefore the real outcome of this submission is unclear.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    2

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Due to unclear technical novelty and clinical impact, unfortunately, I conclude that this submission is inadequate for the MICCAI presentation.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The study presents a self-supervised learning approach that encodes polyp tracklet views into a deep tracklet representation. The reported experiments aim to illustrate the advantages of the proposed tracklet representation for polyp re-identification, highlighting enhancements in offline tracking and polyp characterization in colonoscopy when compared to the use of standard tracking-by-detection methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea of the authors of learning a joint multi-view polyp representation by incorporating attention-based weighting of tracklet frames (this time through a transformer network and a CLS token) is interesting, and it is in principle a promising approach for acquiring a comprehensive representation of polyp visual properties that can be used for several downstream tasks.
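
    For intuition, this attention-based aggregation can be pictured with a minimal sketch: per-frame embeddings pass through a small transformer encoder, and the output at a learned CLS token serves as the tracklet embedding. The names, dimensions, and use of torch.nn.TransformerEncoder below are our assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class TrackletEncoder(nn.Module):
    """Aggregate per-frame polyp embeddings into one tracklet embedding
    using self-attention and a learned CLS token. All hyperparameters
    here are illustrative assumptions, not the paper's settings."""

    def __init__(self, dim: int = 256, heads: int = 4, layers: int = 2):
        super().__init__()
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))  # learned CLS token
        block = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=layers)

    def forward(self, frame_embs: torch.Tensor) -> torch.Tensor:
        # frame_embs: (batch, n_frames, dim) embeddings of one tracklet's crops
        cls = self.cls.expand(frame_embs.size(0), -1, -1)
        x = torch.cat([cls, frame_embs], dim=1)  # prepend CLS to the frame sequence
        x = self.encoder(x)                      # attention weighs the tracklet frames
        return x[:, 0]                           # CLS output = joint tracklet embedding
```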

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors assess the impact of the proposed approach on enhancing the performance of polyp characterization (CADx). However, optical biopsy of polyps during colonoscopy is clinically relevant when performed in real time. In contrast to methods like [23], the proposed approach does not appear to have the capability to operate in real time; at the very least, this study does not explicitly evaluate or benchmark its performance in such a setting.

    The results section appears weak, despite using quite a big custom dataset. In particular:

    1. The advantage of employing contrastive learning for learning a single-frame encoder remains unclear. I was expecting to see, at the very least, a comparison between the proposed method and an approach that utilizes a ResNet50V2 pre-trained on ImageNet as a single-frame encoder, combined with tracking and reID using the averaging method, for both ReID and CADx accuracy (a minimal sketch of such a baseline is given after this list).

    2. The reported results do not clearly demonstrate the effectiveness of the proposed methodology in efficiently capturing polyp tracklet visual information, which is the primary objective of the approach for CADx enhancement. The authors solely compare the results obtained using tracking-by-detection versus tracking-by-detection + reID, without comparing them to other methods such as [23] or the aforementioned approach utilizing ResNet50V2 + averaging. It is not surprising that the ByteTrack + reID method outperforms plain ByteTrack in Table 4, since the former is the only one considering polyp appearance, although the extent of its performance in this aspect remains unknown.

    3. Lastly, while there is a noticeable improvement in the Fragmentation Rate statistics for tracking and ReID when compared to tracking-by-detection alone (as indicated in Table 2), the enhancement provided by the joint embedding model over the frame-embedding averaging method seems relatively modest (as observed in Table 1 and Supplementary Fig. 3). In my opinion, it would have been informative to also include the tracking and ReID results obtained using the averaging method in Table 2.
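
    For reference, the baseline requested in point 1 could look roughly like the following sketch. It uses torchvision's ImageNet-pretrained resnet50 as a stand-in for the ResNet50V2 the reviewer names (ResNet50V2 is a Keras architecture), and the similarity threshold is an arbitrary assumption.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# ImageNet-pretrained backbone as a single-frame encoder (classifier head removed).
# torchvision's resnet50 stands in for the ResNet50V2 named by the reviewer.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()
backbone.eval()

@torch.no_grad()
def tracklet_embedding(crops: torch.Tensor) -> torch.Tensor:
    """crops: (n_frames, 3, 224, 224) normalized polyp crops from one tracklet.
    Returns an L2-normalized tracklet embedding by plain frame averaging."""
    frame_embs = backbone(crops)                 # (n_frames, 2048)
    return F.normalize(frame_embs.mean(dim=0), dim=0)

def same_polyp(emb_a: torch.Tensor, emb_b: torch.Tensor, thresh: float = 0.8) -> bool:
    # ReID decision via cosine similarity of averaged embeddings (threshold assumed)
    return torch.dot(emb_a, emb_b).item() > thresh
```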

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper provides basic information for the method re-implementation; however, due to the absence of code and the use of a custom dataset, its reproducibility is significantly limited.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    I believe that the proposed approach is more suitable for automated reporting than for CADx enhancement, even though the results for this task (a fragmented polyp ratio of 0.45) are not yet satisfactory. Therefore, I recommend removing the claims regarding CADx improvement and instead emphasizing the potential for automated reporting.

    As mentioned in the weaknesses section, there is room for expansion in the results section to provide a more comprehensive benchmarking analysis, highlighting the improvement achieved by the two novel components in comparison to competitive approaches. The advantages of employing contrastive learning for single-frame encoding over competitive approaches remain unclear, as does the specific enhancement in CADx performance resulting from the utilization of the tracklet representation in ByteTrack + reID, as opposed to just ByteTrack + frame encoding + averaging.

    I suggest renaming the ‘ReID’ row in Tables 3 and 4 to ‘Tracking+ReID,’ following the same convention used in Table 2.

    Could you please clarify if the 3290 colonoscopy videos utilized in the ReID for CADx experiments are derived from a subset of the 22,283 videos used for standalone ReID evaluation? To enhance the paper, it would be beneficial if the authors could provide a more detailed explanation regarding the acquisition process of these videos.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Despite proposing an interesting novel approach for polyp re-ID, I believe that this approach has limited clinical value at this stage, since CADx enhancement should be done in real time, as done in [23]. Additionally, I feel that the results section is limited in scope and would have benefited from expansion.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    I am still skeptical about the feasibility of applying this approach in real time, as in [2], and the paper does not report any experiment showing this. However, I am satisfied with the rest of the rebuttal, so I am happy to raise my score.



Review #4

  • Please describe the contribution of the paper

    The authors present a self-supervised learning algorithm for tracking polyps in colonoscopy. The tracking method uses an attention-based transformer architecture to leverage the temporal nature of video input and improves re-identification of tracklets, which ultimately reduces the number of fragmented tracks. The authors also demonstrate a CADx application of optical biopsy of polyps and show improved performance over a baseline CADx system built on prior art.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The idea of applying contrastive learning to reduce the manual labor required for annotating polyp tracks makes a lot of sense.
    • The presented Re-ID algorithm works with existing off-the-shelf trackers, which makes it practically more useful.
    • The method is evaluated using a relatively large-scale dataset collected from multiple sites.
    • The demonstrated positive impact of an improved tracking result on a CADx application adds to the potential real-world impact of the presented method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • No major weakness: some minor ablations are missing whose addition would make the paper more complete. I describe them in the comments section below.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Nothing major to comment.

    • Authors provide enough details about the training procedure which follows standard practice and provides references on certain decisions.
    • I assume the dataset will not be publicly released. However, the annotated test set would be a great contribution to the community.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    [Suggestions for more ablation results]

    • Single-Frame Representation: While the authors provide a comparison between different aggregation methods (min/max/mean), the efficacy of applying SimCLR to the self-supervised learning task itself is not tested. To clearly show that the proposed contrastive pairs actually make a meaningful contribution to the downstream Re-ID task, comparisons to simple baselines, such as pretrained ImageNet features or features learned using a limited number of supervised samples, would be helpful to have.
    • Multi-View Tracklet Representation: The ‘pseudo positive’ pairs make a lot of sense to me, but it would strengthen the paper to add some empirical results that compare the ‘pseudo pairs’ to some other baselines. Is discarding the middle segment really more useful? Just as an example, how about some variation of a triplet formulation using the three segments? I don’t have a problem with the authors’ algorithm, but an ablation to support its usefulness would improve the results section (a sketch of the pair construction is given below).
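
    To make the construction under discussion concrete, a minimal sketch of the ‘pseudo positive’ pairing might look as follows; the one-third split fraction and the function name are our assumptions, and the paper's exact splitting rule may differ.

```python
import numpy as np

def pseudo_positive_pair(tracklet_embs: np.ndarray, seg_frac: float = 1 / 3):
    """Split one tracklet's frame embeddings into head / middle / tail segments,
    discard the middle, and treat head and tail as a positive pair for
    contrastive training. The one-third split is an illustrative assumption."""
    n = len(tracklet_embs)
    k = max(1, int(n * seg_frac))
    head, tail = tracklet_embs[:k], tracklet_embs[-k:]
    # Discarding the middle enforces temporal separation between the two views;
    # a triplet variant could additionally use the middle segment as an anchor.
    return head, tail
```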

    [Some minor questions on details]

    • In Section 3.1, the authors filter out tracklets that have fewer than 15 high-confidence detections. Why 15, and how do you define high-confidence detections?
    • In the introduction, the authors describe their method as “grouping over an extended period of time”, which improves upon existing methods that only provide tracks for a limited spatio-temporal scope. How would you define an extended period of time in this case? 10 seconds? 100 seconds?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is executed well with a clear narrative and enough experimental results to support that the use of contrastive pretraining combined with an attention based Re-ID improves tracking performance of polyps in colonoscopy. The added CADx results strengthen the paper by giving it a practical and clinically meaningful use case which fits the nature of the conference.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper presents a self-supervised learning algorithm for tracking polyps in colonoscopy, using an attention-based transformer architecture. The method is evaluated using a relatively large-scale dataset collected from multiple sites. The paper demonstrates the positive impact of an improved tracking result on a CADx application, adding to the potential real-world impact of the presented method.

    The reviewers have highlighted the following flaws which must be addressed in the rebuttal:

    • Unclear technical novelty and clinical impact (Reviewer 1)
    • Unclear methodology description with insufficient survey (Reviewer 1)
    • Limited clinical impact (Reviewer 2) and a limited results section (Reviewer 2); justify these two comments.
    • Comment on the missing ablation studies as highlighted by Reviewer 4.
    • The paper lacks some details, such as how high-confidence detections are defined and why the cutoff is set to 15, as well as defining what is considered an “extended period of time.” (Reviewer 4).
    • Consider releasing the annotated test set to the community.




Author Feedback

We express our gratitude to the reviewers for their valuable feedback and address the main concerns:

Unclear technical novelty and clinical impact (R1) The novelty is twofold: (a) a novel adaptation of SimCLR for appearance-based object tracking in videos, using sampling tailored for videos; (b) a transformer-based multi-view object representation, trained on unlabeled data using a novel sampling method. The clinical impact is demonstrated by boosting CADx performance. We also highlight the importance of the method for automatic reporting and lost-polyp ReID.
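
As an illustration of what “sampling tailored for videos” could mean in a SimCLR setup, here is a hedged sketch: the two views of a positive pair are augmented crops drawn from temporally nearby frames of the same tracklet, rather than two augmentations of one image. The frame-gap limit and function names are assumptions, not the paper's actual procedure.

```python
import random

def sample_positive_pair(tracklet_frames, max_gap: int = 30):
    """SimCLR positive sampling adapted to video: instead of two augmentations
    of a single image, draw two temporally nearby frames of the same tracklet.
    max_gap (in frames) is an illustrative assumption."""
    i = random.randrange(len(tracklet_frames))
    j = min(len(tracklet_frames) - 1, i + random.randint(1, max_gap))
    return augment(tracklet_frames[i]), augment(tracklet_frames[j])

def augment(frame):
    # placeholder for the usual SimCLR augmentations (crop, color jitter, blur)
    return frame
```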

Insufficient reproducibility (R1+R4) To our knowledge, no public datasets are suitable for polyp ReID. The research was conducted in an industrial setting, which limits our ability to release the code or data but allows evaluation on an order of magnitude more data than in most academic publications. Comprehensive implementation details were provided. The size and diversity of the dataset make the method highly reproducible. We believe such industrial research is important for MICCAI, as it sets high reliability and robustness standards.

Unclear methodology description with insufficient survey (R1) Due to space constraints, the related-work coverage is limited. Yet we did explain the factors that are unique to colonoscopy and reviewed the most recent related work.

ImageNet-pretrained backbone (R2+R4) R2 and R4 suggested that an ImageNet-pretrained backbone should have been evaluated for single-frame similarity. We did in fact evaluate such a backbone as a sanity check and found that it performed poorly on our task. We did not include those results, as multiple prior studies showed that ImageNet features are less suitable for medical tasks, and for malignant tissues specifically (references will be added to the paper). The advantage of the proposed contrastive learning over naive ImageNet pretraining is in the utilization of large-scale unlabeled data from our specific domain.

Limited clinical impact and limited results section (R2) R2 expressed concerns about real-time ReID integration into CADx. This is a valid concern, but for scenarios where polyp classification is delayed until enough evidence is collected (as in [2]), we can certainly apply the proposed method for joining multiple tracklets. We noted that future work may address automatic reporting; even with the current fragmented polyp ratio, a report can be generated with minimal manual intervention. R2 mentions that we did not compare the fragmentation rate of the multi-view system to other ReID methods. We thank the reviewer for raising this point. In Table 1 we compared the suggested method to multiple ReID aggregation schemes used in prior art (we will add more references) and showed that it outperformed them. The rest of the evaluation is based on the best-performing method from Table 1. As explained above, ImageNet features led to poor performance. We emphasize that the CADx test set is distinct from the ReID training data (while the CADx training set is not).

Missing ablation studies on the multi-view model (R4) R4 suggests an ablation to demonstrate the value of discarding the “middle segment”. Indeed, we verified that reducing the middle segment hurts training. We omitted this from the paper as we assumed it was an obvious conclusion, but we will add it.

Renaming in Tables 3 and 4: Agreed.

Missing details (R4) High-confidence detections are defined as in [24], by an object-detection confidence threshold. The cut-off of 15 was selected based on analysis of the training set, to reduce noise without losing too many tracklets. Grouping time: we tested the method over a range of 2 seconds to 10 minutes (we will add details to the paper). We will also add details regarding the acquisition process of the data (e.g., manufacturer, locations).
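
Read literally, this tracklet filter could amount to something like the following sketch; the 0.5 confidence threshold and the detection record layout are assumptions ([24] defines the actual criterion).

```python
def keep_tracklet(detections, conf_thresh: float = 0.5, min_count: int = 15) -> bool:
    """Keep a tracklet only if it has at least `min_count` detections whose
    detector confidence exceeds `conf_thresh`. The threshold value and the
    `score` field are assumptions; [24] defines the actual criterion."""
    return sum(d["score"] > conf_thresh for d in detections) >= min_count
```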

Releasing the annotated test set to the community: We are discussing releasing the test data with the rights owners.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have satisfactorily responded to all major concerns. An accept is recommended. The authors must aim at incorporating reviewers’ feedback in the camera-ready version.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    First of all, this paper is novel in terms of solving a new yet important problem: polyp matching, or re-identification, in endoscopy. The reviewers largely agree on the technical novelty and its reproducibility. Although the reviewers mentioned some areas to be improved, overall this paper reaches the level of MICCAI acceptance.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Although the authors provided their rebuttal, the improvement in the scores is marginal, and the final score is still among the lower ones in my pool.


