Authors

Martin Huber, Sébastien Ourselin, Christos Bergeles, Tom Vercauteren

Abstract

In this work, we investigate laparoscopic camera motion automation through imitation learning from retrospective videos of laparoscopic interventions. A novel method is introduced that learns to augment a surgeon’s behavior in image space through object motion invariant image registration via homographies. Contrary to existing approaches, no geometric assumptions are made and no depth information is necessary, enabling immediate translation to a robotic setup. Deviating from the dominant approach in the literature, which consists of following a surgical tool, we do not handcraft the objective and no priors are imposed on the surgical scene, allowing the method to discover unbiased policies. In this new research field, significant improvements are demonstrated over two baselines on the Cholec80 and HeiChole datasets, showcasing an improvement of 47% over camera motion continuation. The method is further shown to indeed predict camera motion correctly on the public motion classification labels of the AutoLaparo dataset. All code is made accessible on GitHub.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43996-4_21

SharedIt: https://rdcu.be/dnwOV

Link to the code repository

https://github.com/RViMLab/homography_imitation_learning

Link to the dataset(s)

http://camma.u-strasbg.fr/datasets

https://www.synapse.org/#!Synapse:syn18824884/wiki/591922

https://autolaparo.github.io/

Reviews

Review #1

Please describe the contribution of the paper

This work proposes an endoscopic camera motion prediction method, in which the predictor is trained by imitating an estimator. The proposed method outperformed the baselines.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Self-supervision for camera motion prediction learning is achieved by harvesting image-motion correspondences using an off-the-shelf camera motion estimator.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
The main drawback of this work lies in its contribution. My concerns are as follows:
1. The preview horizon length M is set to 1, so this is a one-step prediction. Besides, as shown in Fig. 2, around 75% of motions in the dataset are static, which is very easy to predict. Is one-step prediction meaningful for practical application? How feasible is multi-step prediction? Will the static motion data cause bias in the results?
2. The proposed method is only compared with a baseline, i.e. Taylor expansion. Recent camera motion prediction methods for general computer vision tasks should also be considered in the comparison experiment.
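For readers unfamiliar with the baseline discussed above, a Taylor-expansion predictor can be sketched as a finite-difference extrapolation of past motions. This is a minimal illustration, not the authors' implementation: the function name `taylor_predict`, the `(T, 4, 2)` array layout, and the reading of order 0 as "camera motion continuation" are all assumptions made here.

```python
import numpy as np


def taylor_predict(history, order=1):
    """Extrapolate the next motion from past motions (hypothetical helper).

    history: array of shape (T, 4, 2), past corner displacements, oldest first.
    order=0 repeats the last motion (assumed to match "motion continuation").
    order=1 adds the last finite difference (first-order Taylor step).
    """
    if order not in (0, 1):
        raise ValueError("this sketch only covers order 0 and order 1")
    history = np.asarray(history, dtype=float)
    if history.shape[0] < order + 1:
        raise ValueError("need at least order + 1 past motions")
    prediction = history[-1].copy()
    if order == 1:
        # First-order term: the change between the two most recent motions.
        prediction += history[-1] - history[-2]
    return prediction


# Usage: a motion that grows linearly over three frames.
past = np.stack([np.full((4, 2), k, dtype=float) for k in range(3)])
next_zero_order = taylor_predict(past, order=0)
next_first_order = taylor_predict(past, order=1)
```

With mostly static sequences, both orders predict near-zero motion, which is one way to see why the reviewer asks whether static data biases the comparison.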
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The experiments are conducted on public datasets but the code is not open sourced yet.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
1. The symbols N and M already appear in Sec. 2.1 but are only defined in Sec. 2.3. I suggest defining N and M where they are first mentioned.
2. The definition of “anchor index” is not clear. Delta-uv_n is a matrix, while sigma is a scalar, so how can they be compared?
3. I suggest depicting the model structures of the camera motion estimator and predictor in a brief subfigure in Fig. 1.
4. It is difficult to understand Fig. 4. Please explain the visualization method.
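As background to the Delta-uv notation raised in the comments above: in the commonly used four-point parametrization, a homography is described by the displacements of four reference points. The sketch below recovers the 3x3 matrix from such displacements with the direct linear transform, and shows one conceivable scalar reduction of a displacement matrix. It is an illustration under stated assumptions only: the helper names, the choice of image corners as reference points, and the mean-norm reduction are hypothetical and are not confirmed as the paper's definitions.

```python
import numpy as np


def four_point_to_homography(corners, duv):
    """Solve for H such that H maps each corner to corner + duv (DLT)."""
    corners = np.asarray(corners, dtype=float)
    targets = corners + np.asarray(duv, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(corners, targets):
        # Each correspondence contributes two linear constraints on H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The null space of the 8x9 system is the last right singular vector.
    _, _, vt = np.linalg.svd(np.array(rows))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]


def motion_magnitude(duv):
    """Mean Euclidean norm of the four displacements (assumed reduction)."""
    return float(np.linalg.norm(np.asarray(duv, dtype=float), axis=1).mean())


# Usage: a pure translation of a 100 x 100 image by (5, -3) pixels.
corners = [[0, 0], [100, 0], [100, 100], [0, 100]]
duv = [[5, -3]] * 4
H = four_point_to_homography(corners, duv)
magnitude = motion_magnitude(duv)
```

A scalar such as `magnitude` could be compared against a threshold like sigma, but whether the paper does this is exactly what the reviewer asks the authors to clarify.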
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
1. One-step motion prediction is simple and usually makes little sense.
2. The deep learning based motion predictors are not compared.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper
This paper presents a novel method for predicting laparoscopic camera motion from retrospective videos. The method applies a deep neural network for estimating image-action pairs on laparoscopic datasets, which are then used as labels to train the predictor for camera motion. An importance sampling method is introduced to prepare the training dataset. The authors evaluate the proposed method on three public datasets and compare it to a simple regression baseline.

The main contributions of this paper are as follows:
1. Development of a novel method for predicting laparoscopic camera motion using a deep neural network and importance sampling.
2. Evaluation of the proposed method on three public datasets, demonstrating its effectiveness compared to a simple regression baseline.
3. A comprehensive study of camera motion distribution, highlighting the importance of the proposed sampling method.
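To illustrate why a sampling scheme matters for a skewed motion distribution, the sketch below reweights samples by inverse bin frequency so that rare, large motions are drawn as often as common, static ones. It is a generic rebalancing example, not the authors' importance sampling: the function name, the bin edge, and the toy 75/25 split (borrowed from the figure quoted in Review #1) are assumptions.

```python
import numpy as np


def inverse_frequency_weights(magnitudes, bin_edges):
    """Sampling probabilities that equalize the mass of each magnitude bin."""
    magnitudes = np.asarray(magnitudes, dtype=float)
    bins = np.digitize(magnitudes, bin_edges)  # bin index per sample
    counts = np.bincount(bins, minlength=len(bin_edges) + 1)
    weights = 1.0 / counts[bins]  # rare bins get larger per-sample weight
    return weights / weights.sum()


# Usage: 75 static samples (magnitude 0) and 25 moving samples (magnitude 10).
magnitudes = np.array([0.0] * 75 + [10.0] * 25)
probabilities = inverse_frequency_weights(magnitudes, bin_edges=[1.0])
rng = np.random.default_rng(0)
batch_indices = rng.choice(len(magnitudes), size=32, p=probabilities)
```

Under these weights the static and moving groups each carry half of the sampling probability, regardless of how many samples each contains.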
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Novelty: The proposed method introduces a new approach to train a camera motion predictor, which offers potential advancements in automating the camera on surgical robots.
2. Well-written: The paper is clear, concise, and effectively communicates the motivation, methodology, and results. This makes it easier for readers to follow the proposed method and understand its implications.
3. Camera motion distribution study: The examination of camera motion distribution helps readers understand the significance of the proposed sampling method, illustrating the need for addressing distribution-related challenges in the dataset.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. Lack of explainability: The paper does not provide sufficient insights into the inner workings of the method (e.g. does the neural network learn anything to predict the motion, or does it just extrapolate from previous motion?). This may limit the generalizability of the method and its potential application to other surgical procedures or camera systems.
2. Limited comparison: The paper only compares the proposed method to a simple regression baseline. Comparisons to other reinforcement learning algorithms or state-of-the-art techniques would have provided a more comprehensive evaluation, helping to contextualize the results.
3. Potential scalability issues: While the network is trained and tested on individual datasets, it is unclear how it would perform on larger datasets or in unseen surgical scenarios. This may impact its applicability in diverse clinical settings.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors have committed to release their code on GitHub. This will allow other researchers to access, review, and build upon the proposed method.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
1. It is unclear whether the camera motion estimator is pretrained or fine-tuned on the laparoscopic dataset. If applicable, the authors should provide more details to clarify the models applied for camera motion estimation.
2. For future work, it would be valuable to investigate if the learned camera motion predictor can generalize to unseen surgical environments, as this would strengthen the method’s real-world applicability. Exploring the method’s performance in different surgical procedures, or with different camera systems, could provide insights into its versatility and robustness.
3. The authors should be cautious when using the term “imitation learning” since the proposed method predicts camera motion with a horizon of 1 and does not learn a motion policy.
4. It would be helpful for the authors to provide an analysis of failure cases, if any, along with insights into the evaluation, to give readers a better understanding of the limitations of the method and the challenges that may arise during its implementation. This could also offer guidance for future improvements and adaptations of the method.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

While there are some weaknesses and areas for improvement in the paper, such as the limited comparison and analysis, these do not outweigh the strengths and potential contributions of the paper. The authors are encouraged to address the concerns raised in the review to further strengthen the paper. Overall, the novelty, robust evaluation, and practical implications of the proposed method make this paper a valuable contribution to the field and warrant its acceptance.
Reviewer confidence

Somewhat confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

The paper proposes an endoscopic camera motion prediction model that is learned from retrospective videos of laparoscopic interventions without manual annotation. The pipeline consists of two main modules: a camera motion estimation stage and a camera prediction stage. Three approaches for online camera motion estimation are evaluated: [11]+ResNet-34 backbone, SURF&RANSAC and LoFTR&RANSAC, and the camera motion is predicted using a ResNet model appropriately tuned for the different datasets. The experimental results are encouraging.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The positive aspects of the paper are: (1) the paper is in general easy to follow; (2) the proposed pipeline is practical, does not require manual effort, and may be relevant to the community interested in imitation learning for laparoscopic interventions; (3) the definition of the motion labels (left, right, up, down, etc.) described in Section 3.1 is compelling; (4) the authors will make the code publicly available upon acceptance; (5) the experimental results in camera motion prediction are encouraging.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The negative aspects of the paper are: (1) the theoretical contributions are minor: the parametrization for deep homography estimation comes from [6], camera motion estimation is achieved using [11], and camera motion prediction uses a very similar formulation and model as in [11]; (2) since there are no significant theoretical contributions, I would expect some experiments and performance analysis in a (close to) real application scenario.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

I think that the pipeline described in the paper should be reproducible.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

(1) It would be important for the reader to have an introduction or explanation of why deep homography estimation makes sense in a laparoscopic environment, since in reality there is no plane contained in the scene; (2) it would be valuable if the authors better explained the notion of “strong motions” used in Section 3.2; (3) it would be valuable for the reader if the authors provided more insights about the suitability of using Taylor expansion as a baseline for benchmarking camera motion prediction.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Camera motion prediction in a medical setting is complex and the authors propose an unsupervised pipeline for achieving this objective in a laparoscopic environment. The experimental results are promising, but there are two negative aspects that led me to the current rating: (1) the theoretical contributions are minor, and (2) the pipeline is not tested in a (close to) real application scenario.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The paper proposes a deep learning framework for endoscopic camera motion prediction using retrospective videos of laparoscopic interventions without manual annotation. The reviewers agree that this is an interesting work, and the paper is well written. The novelty of the proposed method is adequate. The performance evaluation should be strengthened by including state-of-the-art camera motion prediction methods in the comparison study and by validating on datasets from real application scenarios. Also, the clarifications suggested by the reviewers regarding the methodology should be addressed.

Author Feedback

N/A


Deep Homography Prediction for Endoscopic Camera Motion Imitation Learning