
Authors

Jianan Li, Yueming Jin, Yueyao Chen, Hon-Chi Yip, Markus Scheppach, Philip Wai-Yan Chiu, Yeung Yam, Helen Mei-Ling Meng, Qi Dou

Abstract

High-level cognitive assistance, such as predicting dissection trajectories in Endoscopic Submucosal Dissection (ESD), can potentially support and facilitate surgical skills training, yet it has rarely been explored in existing studies. Imitation learning has shown its efficacy in learning skills from expert demonstrations, but it faces challenges in predicting uncertain future movements and generalizing to various surgical scenes. In this paper, we introduce imitation learning to the newly formulated task of learning to suggest dissection trajectories from expert video demonstrations. We propose a novel method, implicit diffusion policy imitation learning (iDiff-IL), to address this problem. Specifically, our approach models expert behavior implicitly using a joint state-action distribution. This captures the inherent stochasticity of future dissection trajectories and therefore allows robust visual representations for various endoscopic views. By leveraging the diffusion model in policy learning, our implicit policy can be trained and sampled efficiently for accurate predictions and good generalizability. To achieve conditional sampling from the implicit policy, we devise a forward-process guided action inference strategy that corrects the state mismatch. We collected a private ESD video dataset of 1032 short clips to validate our method. Experimental results demonstrate that our solution outperforms SOTA imitation learning methods on our formulated task. To the best of our knowledge, this is the first work applying imitation learning to surgical skill learning with respect to dissection trajectory prediction.
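The abstract describes modelling expert behavior as a joint state-action distribution learned with a diffusion model. A minimal sketch of such a training objective follows, assuming a standard DDPM noise-prediction loss with a linear beta schedule. The names `make_alpha_bar`, `q_sample`, `diffusion_loss` and the placeholder denoiser `eps_model` are hypothetical; the paper's actual network, noise schedule and feature dimensions are not given here, so this is an illustration rather than the authors' implementation.

```python
import numpy as np


def make_alpha_bar(num_steps: int = 100, beta_start: float = 1e-4,
                   beta_end: float = 0.02) -> np.ndarray:
    """Cumulative products of (1 - beta_t) for a linear beta schedule.

    Assumption: a common DDPM default, not the schedule used in the paper.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(x0: np.ndarray, t: int, noise: np.ndarray,
             alpha_bar: np.ndarray) -> np.ndarray:
    """Closed-form forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    ab = alpha_bar[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * noise


def diffusion_loss(eps_model, state: np.ndarray, action: np.ndarray,
                   alpha_bar: np.ndarray, rng: np.random.Generator) -> float:
    """Noise-prediction loss on the concatenated (state, action) vector.

    eps_model receives only the noisy joint vector and the step index, with no
    separate conditioning input: the joint distribution is modelled
    unconditionally, which is what makes the resulting policy implicit.
    """
    x0 = np.concatenate([state, action], axis=-1)
    t = int(rng.integers(len(alpha_bar)))          # random diffusion step
    noise = rng.standard_normal(x0.shape)          # target noise
    x_t = q_sample(x0, t, noise, alpha_bar)        # noised joint sample
    return float(np.mean((eps_model(x_t, t) - noise) ** 2))
```

In this sketch `state` would stand for features of the recent endoscopic frames and `action` for the flattened future 2D trajectory points; both are illustrative. For example, `diffusion_loss(lambda x, t: np.zeros_like(x), np.zeros(16), np.zeros(16), make_alpha_bar(), np.random.default_rng(0))` evaluates the loss with a trivial denoiser that always predicts zero noise.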

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43996-4_47

SharedIt: https://rdcu.be/dnwPs

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #3

  • Please describe the contribution of the paper

    ESD is considered a high-skill endoscopic procedure in which the learning curve is a major obstacle. Patients could benefit from early tumor resection if junior practitioners were able to perform ESD. An imitation-learning strategy of this kind can help guide novices toward the optimal dissection trajectory.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Clear and concise paper; the statement of need is adequate, and the solution is worth exploring further.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The video dataset came from an experienced surgeon. To avoid bias, further investigation should be carried out on video datasets from junior and intermediate practitioners.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    reproducible

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Congratulations on your work; it could potentially be translated to a clinical scenario that benefits patients.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    8

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    excellent example of clinical translation

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper addresses surgical instrument trajectory prediction in Endoscopic Submucosal Dissection (ESD). It uses implicit behavior cloning and a diffusion model to learn surgical skills from expert demonstrations. The work is evaluated on the authors' newly collected dataset and achieves superior results compared to previous works.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper handles the novel trajectory prediction task in the real-world endoscopic surgery scenario, by predicting 2D trajectory points to provide high-level assistance.
    2. The task is well defined as joint distribution prediction. Compared with training and evaluating on the simulation scenario (JIGSAWS) or on robotic surgery with robot kinematic information, this work proposes a 2D solution that is applicable to other kinds of laparoscopic surgery.
    3. The proposed method outperforms the prior works by a large margin, and the evaluation is done with in-context and out-of-context scenarios.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Dataset statistics are not reported. Since the training and test sets were collected from scratch, I would like to see some statistics, such as the x, y distribution, to confirm the diversity of the dataset and to ensure the model does not learn trivial trajectories. Also, will you consider making the dataset public in the future?
    2. Could you explain in more detail the difference between MID [7] and this work?
    3. The paper also mentions that iBC performs even worse than BC. Do you think implicit policy learning is unnecessary for the current dataset? Apart from the EBMs, what other reasons do you think make iBC worse?
    4. Could you explain in more detail why the diffusion process is better suited to the surgical scenario?
    5. What made you choose 3 s and 128x128 in the setup?
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    According to the paper, if the dataset and code are made public, this paper is reproducible. If not, more details about the dataset are needed in the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The paper is well written, but a few things need to be considered. Trajectory prediction is a hot topic in the computer vision field and has been widely explored. However, it is more challenging to transfer it to endoscopy because of the loss of depth perception, the variation of surgical scenes, and so on. Therefore, I think a surgical-specific investigation that adapts conventional trajectory prediction methods to the surgical field is necessary. This paper has a good task definition but lacks motivation for the selected method.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    As this is a novel task and this paper provides a strong baseline, I make this recommendation. I might change my recommendation depending on whether the dataset is made public for future study.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    The paper discusses the challenge of predicting dissection trajectories during Endoscopic Submucosal Dissection (ESD) surgery, which is crucial for ensuring surgical safety. The authors propose a novel approach called Implicit Diffusion Policy Imitation Learning (iDiff-IL) that leverages implicit modeling to express expert dissection skills and handles the large variation of surgical scenes. To address the inefficient training and unstable performance associated with EBM-based implicit policies, they formulate the implicit policy using an unconditional diffusion model. They also devise a conditional action inference strategy guided by forward diffusion to enhance prediction accuracy. The method is evaluated on a surgical video dataset of ESD procedures and achieves superior performance compared to state-of-the-art trajectory prediction methods.
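The conditional action inference guided by forward diffusion, as summarized above, can be illustrated with an inpainting-style DDPM sampler. This swaps in a RePaint-like state-replacement technique purely as an assumption about how such guidance might work; it is not the authors' implementation. The joint [state, action] vector is denoised step by step, and after each reverse step the state slice is overwritten with the observed state diffused forward to the matching noise level, so only the action slice is effectively generated. `guided_sample` and the placeholder denoiser `eps_model` are hypothetical names.

```python
import numpy as np


def guided_sample(eps_model, state_obs: np.ndarray, action_dim: int,
                  betas: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Sample an action from an unconditional joint diffusion model.

    Assumption: inpainting-style replacement of the state slice with the
    forward-diffused observation; the paper's exact strategy may differ.
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    num_steps = len(betas)
    s_dim = state_obs.shape[-1]

    # Start from Gaussian noise; put the state slice at the top noise level.
    x = rng.standard_normal(s_dim + action_dim)
    x[:s_dim] = (np.sqrt(alpha_bar[-1]) * state_obs
                 + np.sqrt(1.0 - alpha_bar[-1]) * rng.standard_normal(s_dim))

    for t in range(num_steps - 1, -1, -1):
        eps = eps_model(x, t)
        # DDPM posterior mean computed from the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
            # Forward-diffuse the observed state to level t-1 and overwrite
            # the state slice so it stays consistent with the observation.
            ab_prev = alpha_bar[t - 1]
            x[:s_dim] = (np.sqrt(ab_prev) * state_obs
                         + np.sqrt(1.0 - ab_prev) * rng.standard_normal(s_dim))
        else:
            x = mean
            x[:s_dim] = state_obs  # final step: state equals the observation

    return x[s_dim:].copy()  # predicted (flattened) trajectory
```

Because the state slice is clamped to the observation at every noise level, the sketch never needs a separately trained conditional network. Whether this matches the state-mismatch correction in the paper is not stated in the reviews.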

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • This paper tackles the important clinical task of predicting Endoscopic Submucosal Dissection trajectories, which can significantly improve surgical safety and training.
    • The use of an unconditional diffusion model as an implicit policy in the training process is a novel and promising approach, leading to superior performance compared to state-of-the-art methods.
    • The authors’ intuitive choice to guide the inference with images enhances the accuracy of trajectory predictions.
    • The figures and presentation of the paper are clear and effective in helping readers understand the methodology and results.
    • The evaluation includes comparisons with state-of-the-art CV methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The temporal context used as input states is limited to only 1.5 seconds, which may not capture all relevant information for trajectory prediction, and the model’s ability to generalize to longer sequences is unclear.
    • The dataset used in the experiments contains 1032 video clips, but it is not publicly available, which limits the reproducibility and comparison with other methods.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • The dataset is private.
    • No code availability is mentioned in the paper.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • As the current model uses a limited temporal context of only 1.5 seconds as input states, it would be interesting to perform ablation experiments with different amounts of temporality, such as using only the current frame as the state or increasing the input states to 3 or 6 seconds.
    • It would be beneficial for the research community if the authors make their dataset publicly available to facilitate future research in this area.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Based on the strengths of the paper, which include tackling an interesting and relevant clinical task of Endoscopic Submucosal Dissection trajectory prediction, proposing a novel approach using a diffusion model as an implicit policy in the learning process, and providing clear and helpful figures, I recommend accepting the paper for publication.

    While there are some weaknesses, such as the limited temporal context used as input states and the lack of a publicly available dataset, the authors address these issues by suggesting future work, which includes exploring ablations with different amounts of temporal context and considering the release of the dataset for public use.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes a novel imitation-learning-based approach for predicting dissection trajectories from expert video demonstrations. The method has been validated on a newly collected Endoscopic Submucosal Dissection video dataset, demonstrating superior performance to both explicit and implicit SOTA approaches for trajectory prediction. The topic is relevant and of clinical interest, the approach is novel, the experiments are thorough, and the paper is well presented.

    Reviewer feedback, including additional details regarding the dataset, clarification of the model's ability to generalize to longer sequences, and additional discussion of the experimental results and SOTA comparison, should be incorporated in the final submission.




Author Feedback

N/A


