
Authors

Loc Trinh, Tim Chu, Zijun Cui, Anand Malpani, Cherine Yang, Istabraq Dalieh, Alvin Hui, Oscar Gomez, Yan Liu, Andrew Hung

Abstract

Suturing technical skill scores are strong predictors of patient functional recovery following robot-assisted radical prostatectomy, but manual assessment of these skills is a time- and resource-intensive process. By automating suturing skill scoring through computer vision (CV) methods, we can significantly reduce the burden on healthcare professionals and enhance the quality and quantity of educational feedback. Although automated skill assessment in simulated virtual reality (VR) environments has been promising, applying CV methods to live (“real”) surgical videos has been challenging due to: 1) the lack of kinematic data from the da Vinci surgical system, a key source of information for determining the movement and trajectory of robotic manipulators and suturing needles, and 2) the lack of training data due to the labor-intensive task of segmenting and scoring individual stitches from live videos. To address these challenges, we developed a self-supervised pre-training paradigm whereby sim-to-real generalizable representations are learned without requiring any live kinematics. Our model is based on a masked autoencoder, termed LiveMAE. We augment live stitches with VR images during pre-training and require LiveMAE to reconstruct images from both domains while also predicting the corresponding kinematics. This process learns a visual-to-kinematic mapping that seeks to locate the positions and orientations of surgical manipulators and needles, deriving “kinematics” from live videos without requiring supervision. With an additional skill-specific finetuning step, LiveMAE surpasses supervised learning approaches across 6 technical skill assessments, ranging from 0.56-0.84 AUC (0.70-0.91 AUPRC), with improvements of 35.8% in AUC for wrist rotation and 8.7% for needle driving skills. Our contributions provide the foundation to deliver personalized feedback to surgeons training in VR and performing live prostatectomy procedures.
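
The pre-training objective described in the abstract pairs masked image reconstruction with kinematics prediction. Below is a minimal sketch of one pre-training step, assuming a standard MAE-style ViT encoder that returns masked latents plus restore indices; the module and field names (image_decoder, kinematic_decoder, batch["has_kin"]) are illustrative assumptions, not the authors' released code.

    import torch
    import torch.nn.functional as F

    def patchify(imgs, p=16):
        # (B, C, H, W) -> (B, num_patches, p*p*C), matching a per-patch pixel decoder output
        B, C, H, W = imgs.shape
        x = imgs.reshape(B, C, H // p, p, W // p, p)
        return x.permute(0, 2, 4, 3, 5, 1).reshape(B, (H // p) * (W // p), p * p * C)

    def pretraining_step(encoder, image_decoder, kinematic_decoder, batch):
        # batch["frames"]:     (B, C, H, W) mix of VR and live stitch frames
        # batch["kinematics"]: (B, K) instrument/needle poses, zero-filled for live frames
        # batch["has_kin"]:    (B,) bool, True only where simulator kinematics exist
        latent, mask, ids_restore = encoder(batch["frames"])      # masked ViT encoding
        pred_patches = image_decoder(latent, ids_restore)         # per-patch pixel predictions

        # MAE-style reconstruction loss over masked patches, applied to both domains
        target = patchify(batch["frames"])
        per_patch = ((pred_patches - target) ** 2).mean(dim=-1)
        loss_recon = (per_patch * mask).sum() / mask.sum().clamp(min=1)

        # Kinematics regression loss, supervised only where VR kinematics are available
        kin_pred = kinematic_decoder(latent)                      # (B, K)
        if batch["has_kin"].any():
            loss_kin = F.mse_loss(kin_pred[batch["has_kin"]],
                                  batch["kinematics"][batch["has_kin"]])
        else:
            loss_kin = kin_pred.sum() * 0.0

        return loss_recon + loss_kin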

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43996-4_68

SharedIt: https://rdcu.be/dnwQf

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #3

  • Please describe the contribution of the paper

    1) Automating suturing skill scoring through computer vision methods can reduce the burden on healthcare professionals and enhance educational feedback.

    2) A self-supervised pre-training paradigm called LiveMAE was developed to learn sim-to-real generalizable representations without requiring live kinematic annotations, which surpassed supervised learning approaches across 6 technical skill assessments.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strength of this paper is that it proposes a self-supervised pre-training paradigm called LiveMAE, which learns sim-to-real generalizable representations without requiring any live kinematic annotations. This approach can automate suturing skill scoring through computer vision methods, which can significantly reduce the burden on healthcare professionals and enhance the quality and quantity of educational feedback.

    1) Masked autoencoding is a self-supervised pre-training method for Vision Transformers on images that can learn useful visual representations for downstream tasks. 2) LiveMAE is a proposed method for deriving “kinematics” from live surgical videos to aid in downstream prediction for suturing skill assessment. 3) LiveMAE has three main components: a kinematic decoder, a shared encoder, and an expanded training set. After pre-training, LiveMAE is finetuned for skill assessment by using the pathway from the encoder to the kinematic decoder as a visual-to-kinematic mapping and applying it to live data for classification.
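
    The fine-tuning pathway summarized in point 3) can be pictured as in the sketch below. This is a hedged sketch assuming the pre-trained encoder can be applied without masking and that a simple linear head produces the per-stitch skill score; the module names and shapes (SkillClassifier, kin_dim) are illustrative, not the paper's exact API.

    import torch
    import torch.nn as nn

    class SkillClassifier(nn.Module):
        """Reuses the pre-trained encoder -> kinematic decoder pathway on live clips."""
        def __init__(self, encoder, kinematic_decoder, kin_dim, num_frames, hidden=128):
            super().__init__()
            self.encoder = encoder                      # pre-trained, optionally frozen
            self.kinematic_decoder = kinematic_decoder  # maps latents to derived "kinematics"
            self.head = nn.Sequential(
                nn.Linear(kin_dim * num_frames, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),                   # logit for the binary skill score
            )

        def forward(self, frames):                      # frames: (B, T, C, H, W) live stitch clip
            B, T = frames.shape[:2]
            latent = self.encoder(frames.flatten(0, 1))           # per-frame embedding, no masking
            kin = self.kinematic_decoder(latent)                  # (B*T, kin_dim)
            return self.head(kin.reshape(B, T * kin.shape[-1]))   # (B, 1) skill logit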

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The proposed approach is evaluated on a limited dataset, and further validation on larger and more diverse datasets is necessary to assess its generalizability. Additionally, the proposed approach requires access to both VR and live surgical data, which may not be readily available in all settings.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    To improve reproducibility, additional elaboration on the training and testing procedures is required.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The authors performed rigorous evaluations to demonstrate LiveMAE’s efficacy on surgical data collected and labeled across several institutions and surgeons. However, to enhance generalizability, it would be beneficial to include more diverse data in the evaluation process.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My decision was made based on the above comments.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    This paper proposes a self-supervised pre-training paradigm to bridge the gap between simulated surgical data and clinical surgical videos. The proposed framework, LiveMAE, enables a visual-to-kinematic mapping, extracting poses from live videos without supervision.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A sim-to-real approach to generate kinematics from live surgical videos. A new Transformer kinematic decoder that converts image features into positions and rotations for 10 instruments of interest. An MAE-style pre-training of image features over both VR and live surgical images that can jointly capture the visual and semantic content across both domains. The data collection is reasonable and complete. The training samples seem to be sufficient with the help of self-supervised pre-training.
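
    To make the decoder's role concrete, here is a minimal sketch of what such a Transformer kinematic decoder could look like: learnable queries, one per instrument or object of interest, cross-attend to the encoder's image tokens and are each projected to a 3-D position and a rotation (e.g., a quaternion). This is an illustrative reconstruction based on the description above, not the authors' implementation; all names and dimensions are assumptions.

    import torch
    import torch.nn as nn

    class KinematicDecoder(nn.Module):
        def __init__(self, token_dim=768, num_instruments=10, num_layers=2, num_heads=8):
            super().__init__()
            # One learnable query per instrument/object whose pose is regressed
            self.queries = nn.Parameter(torch.randn(num_instruments, token_dim))
            layer = nn.TransformerDecoderLayer(d_model=token_dim, nhead=num_heads,
                                               batch_first=True)
            self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
            self.to_position = nn.Linear(token_dim, 3)   # x, y, z
            self.to_rotation = nn.Linear(token_dim, 4)   # quaternion

        def forward(self, image_tokens):                 # (B, N, token_dim) from the encoder
            B = image_tokens.shape[0]
            q = self.queries.unsqueeze(0).expand(B, -1, -1)
            feats = self.decoder(q, image_tokens)        # cross-attend queries to image tokens
            return self.to_position(feats), self.to_rotation(feats)  # (B, 10, 3), (B, 10, 4)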

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Comparison with related work is missing (Sec. 3): sim-to-real surgical analysis is not a topic newly proposed by this manuscript. Approaches using convolutional and recurrent models should also be included in the comparison to comprehensively demonstrate the superior performance of the proposed method. Visual artifacts (Fig. 3) appear in the image reconstruction results, and the lack of explanation makes the finding less convincing; a better investigation or explanation is needed. Details on ConvLSTM, ConvTransformer, their different parameter counts, and other unnamed specifics are missing; providing them would make the observations more convincing.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The dataset is not accessible, but they promise to release the code and pre-trained model.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The paper would be more compelling if the authors explained Fig. 3 with deeper insight into the reconstruction results (Sec. 3.2), e.g., the sparkle artifacts and the blocky patterns in the images. The sponge_1, tube_1, and sheet_1 panels in Fig. 3 are not clear at first glance. Conveying the figure’s message in a more intuitive way would make the paper more readable.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The overall strategy, which uses the decoded kinematics to facilitate skill assessment, is attractive.
    The dataset has sufficient scale and statistics to support the experiments, and the experimental design provides the fundamental evidence for proving the idea.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper proposes a self-supervised pre-training paradigm for suturing skill assessment to address the lack of real kinematics data and annotations. Sim-to-real generalizable representations are learnt over different suturing skills.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The issues addressed in the paper are quite practical for the suturing task in real surgery. Simulation data in a VR environment is readily available and abundant, while real surgical data is relatively limited. It is valuable to explore the paradigm of self-supervised learning beyond supervised learning in the field of surgical skill assessment.
    2. The proposed pre-training method is interesting and novel. It learns a task-agnostic representation over VR and real videos jointly, which is further decoded for kinematics reconstruction. Large and effective pre-trained models have been widely explored in MIC applications such as medical image segmentation, but are still limited in surgical skill assessment.
    3. The evaluation of the proposed method is sufficient. The benefit of kinematics data as well as the reconstruction performance is validated in addition to the assessment performance.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The proposed method largely relies on VR data, while setting up an accurate suturing scenario in a VR environment can be challenging. The experiments in this paper use the Surgical Science Flex VR simulator, which may be restricted to certain tasks and manipulators.
    2. The EASE skills considered in the paper mostly reflect the motion aspect. However, other aspects, such as the final outcome, are also important. For the assessment of these result-oriented aspects, the role of kinematics data may need to be re-evaluated, as it does not contain the semantic and contextual information that is the strength of videos.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors state that they will release the code upon paper acceptance. In addition, the data collection process and experiment details are explicitly described. To facilitate reproducing the work, it would be valuable to make the VR data and part of the real data available (via link or email request).

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. Since the skill-assessment finetuning step still requires annotations of real surgical videos, it would be valuable to evaluate how many videos are needed to reach competitive performance.
    2. The pre-training method is claimed to use data across different suturing skills. The reconstruction performance could additionally be compared in the experiments to justify the effect of this choice on the learnt representation.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The issues addressed in this paper are practical, and solving them can have clinical impact on real surgical skill assessment. Moreover, the proposed method is novel and can inspire future research on self-supervised learning for surgical skill assessment.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes a self-supervised framework for learning sim-to-real generalizable representations, without requiring live kinematic annotations, for the task of suturing skill assessment in robotic surgery. The approach is validated on a multi-institution dataset, demonstrating superior performance compared to supervised baseline approaches. The reviewers highlight the overall approach, the size of the dataset, and the experimental design as the strengths of the paper. The study is well motivated and the paper is well presented.

    Feedback from the reviewers regarding further discussion of the results (e.g., Fig. 3), additional methodological details (ConvLSTM, ConvTransformer, the different parameter counts), and comments regarding the practicality of the VR setup should be incorporated in the final submission.




Author Feedback

N/A


