
Authors

Yejee Shin, Taejoon Eo, Hyeongseop Rha, Dong Jun Oh, Geonhui Son, Jiwoong An, You Jin Kim, Dosik Hwang, Yun Jeong Lim

Abstract

The interpretation of video capsule endoscopy (VCE) usually takes more than an hour, which can be a tedious process for clinicians. To shorten the reading time of VCE, algorithms that automatically detect lesions in the small bowel are being actively developed; however, it is still necessary for clinicians to manually mark anatomic transition points in VCE. Therefore, anatomical temporal segmentation must first be performed automatically at the full-length VCE level for fully automated reading. This study aims to develop an automated organ recognition method in VCE based on a temporal segmentation network. For temporally locating and classifying organs including the stomach, small bowel, and colon in long untrimmed videos, we use the MS-TCN++ model containing temporal convolution layers. To improve temporal segmentation performance, a hybrid model of two state-of-the-art feature extraction models (i.e., TimeSformer and I3D) is used. Extensive experiments showed the effectiveness of the proposed method in capturing long-range dependencies and recognizing temporal segments of organs. For training and validation of the proposed model, a dataset of 200 patients (100 normal and 100 abnormal patients) was used. For the test set of 40 patients (20 normal and 20 abnormal patients), the proposed method showed an accuracy of 96.15, F1-score@{50,75,90} of {96.17, 93.61, 86.80}, and a segmental edit distance of 95.83 in the three-class classification of organs including the stomach, small bowel, and colon in the full-length VCE video.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16449-1_14

SharedIt: https://rdcu.be/cVRUU

Link to the code repository

https://github.com/MAILAB-Yonsei/CE-Organ-Recognition

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes an automated organ recognition method in VCE based on a temporal segmentation network. MS-TCN++ model is used for temporal locating and classifying organs including the stomach, small bowel, and colon in long untrimmed videos

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of this paper is that it proposes a method that makes it possible to longitudinally segment VCE images. The whole video can be segmented by the proposed method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The weakness of this paper is that it is a combination of existing methods: MS-TCN++, TimeSformer, and I3D. However, a novel combination of existing methods can still be evaluated.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    ok

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    This paper is well written and easy to read, and the data design is also well explained. However, methodological novelty is limited. The paper shows how existing methods are used to achieve high performance. It would be better to emphasize methodological novelty for a MICCAI paper.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I made my decision based on the fact that this paper is a combination of existing methods.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    5

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    I am satisfied with the authors’ rebuttal. This paper can be accepted.



Review #3

  • Please describe the contribution of the paper

    The authors propose to solve the important issue of automatically identifying anatomic transition points in video capsule endoscopy (VCE). To solve this, the authors propose to combine features from a TimeSformer and an I3D model in an MS-TCN++ model to automatically identify organs.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well motivated. Anatomical transition annotation for VCE is a much-needed technology to help narrow down which types of lesions and typical locations should be looked for.

    2. There are very few other works published for this specific problem, so there might be a publication justification from this point of view.

    3. The ablations are appreciated.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The authors take two off-the-shelf feature extractors and use them in an off-the-shelf temporal organ classification model. There is virtually no technical novelty in the proposed approach. That said, the application use case is an understudied problem, with the previous state of the art being a simple 2D CNN.

    2. There is nothing in the proposed method which enforces temporal consistency. In Figure 3 we can see several areas where the proposed approach bounces around from stomach to small bowel, back to stomach, to colon, back to small bowel, and back to colon. While the final proposed approach doesn’t show these in the illustrated example (only in the ablated parts) there is nothing to enforce temporal consistency to identify these transition points and ensure they’re localized to a single location (one case of this is shown in the supplemental materials of the 40 cases).

    3. There is no comparison with any prior works (e.g. [11]) or even some reasonable baselines. But there are no true technical contributions to this work, so any baseline would just be an off the shelf method, which their work already is.

    4. How the data was split into training, validation, and testing, and any attempts to ensure no bias was introduced in this splitting decision (e.g. purposely putting easy examples in test) was also not discussed.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This was not filled out at all, just no for everything.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    See weaknesses and justification for improvements.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall I am a bit torn and lean toward weak reject. If there were no issues with this paper at all (i.e. comparison with what limited previous works that exist (e.g. [11]), cleaned up explanation of how the data was split, discussion if the code and data will be released to the public, etc.) I might lean towards a very very weak accept simply because there are so few works published for this problem and the application novelty might be enough to carry this work for publication despite having no real technical novelty, but as the paper sits I cannot justify giving an acceptance rating.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    I have to stick with weak reject on this one. The authors addressed my concern over how the data was split and releasing their code. But they cannot release the data, so there isn’t a contribution there. The authors attempt to point out more novelty in their rebuttal but at the end of the day, they’re just pulling state-of-the-art off-the-shelf models and applying them to their problem, and not even in that novel of a way (TimeSformer is introduced for long-term video understanding, here it’s long-term segmenting of video into sections… so understanding but simpler in many ways). I have to ask myself, as a reviewer, will the community benefit from seeing this work? Is there anything surprising or interesting that can lead to further innovations? I just don’t think it’s there, and for the high standard of MICCAI, accepting just someone grabbing some off-the-shelf methods feels a bit unfair. Now, the fact that the problem is very understudied and therefore from a clinical application point of view there might be some value is why I have it as borderline at all; without this, it would be a solid reject.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a temporal segmentation network to recognise three different digestive organs throughout a capsule endoscopy video. The method is based on the concatenation of two feature extractors (TimeSformer, I3D) that are fed into a temporal model (MS-TCN). Given the low number and long duration of the 3 classes, a smoothing term is included to remove any noisy predictions.

    The method is trained and tested on a large dataset of 200 videos that includes both normal and abnormal cases. An ablation is performed comparing different feature extractor options.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The clinical motivation is very well set, and the application use case is very clear.

    The utilised dataset is very large, it would certainly have impact if released.

    The inclusion of segmental metrics in the validation is well appreciated. The exclusive focus on frame-based metrics has been a longstanding issue in surgical temporal segmentation literature, and the study of other temporal event metrics is needed in the field.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Video temporal segmentation is a widely studied topic in MICCAI, with a large number of recent methods published (TeCNO, TransSVNet, TMRNet, etc). Without any direct comparisons, it is hard to understand if the proposed method has any advantage over these (see detailed comments).

    When compared to these other methods, the one being proposed is very complex, with two spatiotemporal feature extractors and an MS-TCN with a very large number of stages. I would suspect this method is computationally more expensive by a significant margin, even though such details are not currently provided in the paper (happy to be proven wrong).

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Code not available. The method description is clear, as well as the training details and parameters.

    Data is not available (it would be very interesting if authors are planning to release, though).

    It is plausible that the general architecture can be reproduced and tested on a different dataset than the one presented in this paper (there are numerous public datasets).

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    On the comparison with other methods:

    There is a significant amount of literature on surgical workflow segmentation. While the application context is different, the problem formulation of supervised temporal segmentation is exactly the same. The most recent have publicly available code and can off-the-shelf be applied to this problem. Two examples:

    • Gao, Xiaojie, et al. “Trans-svnet: accurate phase recognition from surgical videos via hybrid embedding aggregation transformer.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2021.
    • Czempiel, Tobias, et al. “Tecno: Surgical phase recognition with multi-stage temporal convolutional networks.” International conference on medical image computing and computer-assisted intervention. Springer, Cham, 2020.

    No comparison with any of these widely available alternatives is a weak point in the paper.

    There are a few other details that the authors could clarify:

    • The data videos are classified into “normal” and “abnormal”. Can the authors provide more details about what type of abnormal videos are included? From figure 3, I understand that “vascular”, and “bleeding” are two sub-classes within abnormal, but this is not sufficient to fully understand what type of data is included.

    • On the same note, it would be interesting to understand if the proposed method has different accuracy depending on whether the case is normal or abnormal. At present only global accuracy metrics are provided. Can the authors expand on this?

    • “We use segmental edit distance (i.e., edit) and the segmental F1 score at intersection over union (IoU) ratio thresholds of 50%, 75%, and 90% (F1@{50, 75, 90}).” - Please provide references here that fully describe how to compute segmental edit and segmental F1-Score. These metrics are not widely utilised in surgical action recognition, and a reference would be useful to the reader.
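    For readers unfamiliar with the segmental metrics quoted in the last point, the following is a minimal sketch of how the segmental F1 score at an IoU threshold and the segmental edit score are commonly computed in the temporal action segmentation literature. It is not taken from the paper or its code repository: the function names are hypothetical, the label sequences are made-up toy data, and tie-breaking details may differ from the authors' evaluation code.

```python
def get_segments(labels):
    """Collapse a per-frame label sequence into (label, start, end) runs (end exclusive)."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((labels[start], start, i))
            start = i
    return segments


def levenshtein(a, b):
    """Edit distance between two sequences of segment labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def segmental_edit_score(pred, gt):
    """Normalized edit score in [0, 100]; higher is better."""
    p = [label for label, _, _ in get_segments(pred)]
    g = [label for label, _, _ in get_segments(gt)]
    longest = max(len(p), len(g))
    if longest == 0:
        return 100.0
    return (1.0 - levenshtein(p, g) / longest) * 100.0


def segmental_f1(pred, gt, iou_threshold):
    """Segmental F1 in [0, 100]: a predicted segment is a true positive when its
    best same-label ground-truth segment has IoU >= iou_threshold and that
    ground-truth segment has not already been matched."""
    p_segs, g_segs = get_segments(pred), get_segments(gt)
    matched = [False] * len(g_segs)
    tp = fp = 0
    for label, p_start, p_end in p_segs:
        best_iou, best_idx = 0.0, -1
        for idx, (g_label, g_start, g_end) in enumerate(g_segs):
            if g_label != label:
                continue
            inter = max(0, min(p_end, g_end) - max(p_start, g_start))
            union = max(p_end, g_end) - min(p_start, g_start)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx >= 0 and best_iou >= iou_threshold and not matched[best_idx]:
            tp += 1
            matched[best_idx] = True
        else:
            fp += 1
    fn = len(g_segs) - sum(matched)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100.0 * 2 * precision * recall / (precision + recall)


# Toy example: 0 = stomach, 1 = small bowel, 2 = colon (hypothetical encoding).
gt = [0] * 10 + [1] * 20 + [2] * 10
pred_shifted = [0] * 8 + [1] * 22 + [2] * 10            # boundary off by two frames
pred_flicker = [0] * 10 + [1] * 5 + [0] + [1] * 14 + [2] * 10  # one spurious frame

for name, pred in [("shifted", pred_shifted), ("flicker", pred_flicker)]:
    scores = [segmental_f1(pred, gt, t) for t in (0.5, 0.75, 0.9)]
    print(name, scores, segmental_edit_score(pred, gt))
```

    The toy data illustrates why these metrics complement frame accuracy: a single misclassified frame barely changes frame accuracy, but it splits one segment into three and therefore lowers both the edit score and the segmental F1.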

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is a very borderline rating, very weak reject. I recognise the paper has very interesting elements. However, it is at the moment unclear if the proposed algorithm is superior to the widely available literature on the topic, given that no comparisons are made, and no other justification is provided for why such methods do not need to be considered.

    Open to review my assessment after the rebuttal discussion, with more information available.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors answered satisfactorily and convincingly to most of the issues raised during rebuttal, which I believe result in better highlighting the contributions of this paper.

    Therefore, I increase over my original score.



Review #4

  • Please describe the contribution of the paper

    The manuscript illustrates a deep learning method for the temporal segmentation of anatomical districts of the gastrointestinal tract from uncut sequences of frames from an endoscopic capsule to ease following analysis. The proposed method consists of the combination of various architectures present in state of the art: a temporal segmentation network (i.e., MS-TCN++) to which is given as input the combination of features extracted from two backbones, one based on temporal vision transformers (i.e., TimeSformer) and the other based on 3D convolutions (I3D). The positive impact of the new features was demonstrated by an ablation study on a dataset composed of 40 patients for a total of 3866474 frames.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Clearly explained methodology and overall well written
    • Dataset is acquired only with one manufacturer camera
    • Clinical motivation is properly referenced by the literature
    • Extensive dataset
    • Ablation study
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Limited innovation
    • Dataset won’t be public
    • Lack of comparison with respect to Vanilla MS-TCN++
    • Some details in the experimental protocol are not very clear (i.e. difference between normal/abnormal case)
    • Results discussion can be improved
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Dataset won’t be released

    Framework not reported I think it won’t be to reproduce authors’ results if the dataset will be made available

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Overall the work is very interesting and well-argued. Some features in the images can mislead recognition by the network, and it would be useful to show how temporal information can mitigate this effect. An extension of Figure 3 that goes into greater depth with more targeted visual examples (also as supplementary material) could be a significant and constructive contribution to all researchers working with this type of imagery. Why was a cross-validation scheme not considered? There might be a bias in the results introduced by patient selection.

    Also, in section 3, the first sentence reads, ‘We evaluated our proposed method for organ recognition through 20 normal and 20 abnormal cases’ - this classification was not reported anywhere and should, in fairness, also be reported with respect to training and validation sets.

    In the ablation study, there are no baseline results with MS-TCN++ as presented in state of the art. Thus, it is impossible to understand the modules’ contribution by comparing them with state of the art.

    The performance metrics used are the same as those presented in MS-TCN++, but none of these (e.g., Edit) have been defined in the text.

    I would add more clinical references in the final discussion part.

    Minor: Page 2, Sec 2.1 “Two gastroenterologists read the whole images from each WCE case…” -> should be VCE instead?

    “fig.” and “table” should be capitalized.

    After the Conclusion section, the title “Acknowledgments” is present even if the section is empty.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is not particularly innovative, but the study is set up correctly and is certainly of interest to the research community. However, the lack of public dataset and some unclear details give me some hesitation in accepting the paper

  • Number of papers in your stack

    3

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors answered most of my doubts, and the study is of good quality. I still have some concerns about the innovation, but I think the contribution has merit to be considered for MICCAI.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors proposed a digestive organ recognition approach in video capsule endoscopy which is a fusion of existing approaches. The problem itself is interesting and the dataset seems very relevant for the community. However, there is no indication whether this dataset will be released with the paper. Moreover, the technical novelty remains limited: although the authors indicated this to be a methodological contribution, the method is just a fusion of existing approaches. In addition, comparison with the relevant spatio-temporal approaches is missing, which makes it difficult to judge the standing of this work with respect to the baseline. I invite the authors to submit a rebuttal addressing the major concerns raised by the reviewers.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6




Author Feedback

We thank the reviewers for their very considerate and constructive comments. We have summarized six major comments with corresponding responses in this rebuttal.

  1. Methodological novelty (R1,R3) Our methodological novelty is to suggest an effective “spatio-temporal feature extractor”. Existing feature extractors based on CNNs tend to extract local information because of the convolution framework. In other words, they tend to be biased toward the short-term temporal range. Understanding global information is necessary to recognize organ transitions across a very long-term range of video. Therefore, we utilize TimeSformer, which can capture global long-range dependencies by directly comparing feature activations at all space-time locations without an inductive bias. To the best of our knowledge, this is the first time a 3D-based transformer (i.e., TimeSformer) has been used for video temporal segmentation. The individual effects of I3D and TimeSformer are as follows. As shown in Fig. 3, when we trained the model based solely on TimeSformer, errors within a long range are small, but the error at the organ transition is somewhat high due to the lack of an inductive bias toward locality. Conversely, when we trained with I3D, which is the baseline feature extractor, the error at the transition is small, but errors within a long-term temporal range are high because global information is not captured directly. By combining the two feature extractors, the proposed method became effective in reducing the error near the transition while simultaneously reducing the error within the long-term interval. In addition to organ classification, the proposed method for long untrimmed videos can be utilized in various studies such as disease detection and surgical video recognition.
  2. Comparison with other methods (R2,R3,R4) 2.1 Sorry for the confusion about the prior work (vanilla MS-TCN++). There is a baseline result of MS-TCN++ in this paper, which is the combination of I3D and MS-TCN++ (the result is shown in the first row of Table 1). We will revise “I3D” to “I3D (baseline)” in the Results section of the paper. 2.2 We think it is not fair to reimplement methods that were developed for other purposes (e.g., surgical phase recognition). However, as Reviewer #2 suggested, we conducted an additional experiment with another network, TeCNO. The accuracy of that method is 90.37, the precision is 0.914, the recall is 0.892, and the Edit score is 51.46. The accuracy of our method is 5.78 higher than that of TeCNO, and the Edit score in particular is 44.37 higher.
  3. Dataset issue (R2,R3,R4) First of all, there is a public capsule endoscopy dataset, but we cannot use it because it lacks annotations for organ transitions. To conduct this study, we constructed a new dataset collected and annotated by our institution. Secondly, the dataset is not being released for reasons relating to the hospitals that performed the data collection and annotation. Code and the pretrained model will be released on our GitHub page.
  4. Results for normal versus abnormal cases (R2) For abnormal cases only, accuracy is 92.33, precision is 0.939, and recall is 0.952. F1@{50, 75, 90} are 93.69, 88.29, and 81.08. Edit score is 94.12. For normal cases only, accuracy is 98.85, precision is 0.992, and recall is 0.993. F1@{50, 75, 90} are 98.39, 98.39, and 91.94. Edit score is 97.50. We will add these results to the supplemental materials.
  5. Attempts to ensure no bias in the datasets (R3,R4) Training, validation, and test sets were selected randomly for each institution and then combined to prevent data bias.
  6. More details about what types of abnormal videos are included (R2) Based on the opinions of eminent clinicians, we selected the four most frequently found lesion types: inflammatory, vascular, bleeding, and polypoidal. The 100 abnormal cases used in this study consist of 80 cases of inflammatory, 46 cases of vascular, 35 cases of bleeding, and 67 cases of polypoidal lesions. This will be added to the supplemental materials.
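
The feature combination described in response 1 can be illustrated with a minimal sketch. It assumes, as Review #2 describes, that the two extractors' outputs are concatenated per time step before being passed to the temporal model. The function name, the tiny feature dimensions, and the values are hypothetical and are not taken from the authors' code.

```python
def fuse_clip_features(features_a, features_b):
    """Concatenate two per-time-step feature sequences along the feature axis.

    features_a, features_b: lists of feature vectors, one vector per time step.
    Returns a list with the same number of time steps, where each vector is the
    concatenation of the corresponding vectors from both inputs.
    """
    if len(features_a) != len(features_b):
        raise ValueError("both extractors must yield one feature vector per time step")
    return [list(a) + list(b) for a, b in zip(features_a, features_b)]


# Hypothetical toy features for two time steps (real extractors would output
# vectors with hundreds or thousands of dimensions).
timesformer_feats = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
i3d_feats = [[1.0, 2.0], [3.0, 4.0]]

fused = fuse_clip_features(timesformer_feats, i3d_feats)
print(len(fused), len(fused[0]))
```

The temporal model then sees, at every time step, both the globally informed and the locally biased representation, which is the intuition the rebuttal gives for lower errors both near transitions and within long intervals.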




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have satisfactorily addressed most major concerns raised by the reviewers. Releasing clinical data is not always possible due to ethical concerns, so I did not base my decision on this. The problem itself is of high clinical relevance and the authors have highlighted their contributions more clearly in the rebuttal. The authors should include all these corrections in the camera-ready version.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal addresses most of the technical issues raised by the reviewers and clarifies that code can be released. Although the method is a combination of existing methods, I think the novelty of using them together on a large dataset for a new problem has value for the MICCAI community. Regarding data: I understand that releasing data is sometimes a complex process and may not be in the authors’ control, and I believe that this should not block the ability to share a good experimental result and method.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    9



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The main contribution of this work is an automated organ recognition framework for video capsule endoscopy to shorten the reading time of clinicians by classifying images into the following categories: stomach, small bowel and colon.

    Key strengths:

    • Method evaluated on an extensive dataset comprised of 200 videos (100 of them abnormal).
    • Motivation and clinical use case scenario are clear.

    Key weaknesses:

    • Comparison with baseline methods was missing in the initial submission.
    • Technical novelty may be limited, so it relies on the application which is novel and interesting.

    Review comments & Scores: As the technical novelty of this work may be considered as limited, the main concern was the comparison with state-of-the-art methods. After rebuttal, scores increased.

    Rebuttal: Authors have provided a clear rebuttal on the different concerns raised by the reviewers. They also included a comparison with TeCNO as suggested by R2 showing an improvement. Authors also provided accuracy for normal and abnormal which is appreciated, and I would like to encourage the authors to include this information in the final manuscript. I believe this work should not be penalised for not releasing the datasets.

    Evaluation & Justification: Although the technical contribution is mainly on the application (i.e., use of 3D based transformer for video temporal segmentation) I agree with the reviewers that this is an interesting clinical application and a good contribution to the field.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6


