
Authors

Ziyi Wang, Bo Lu, Yonghao Long, Fangxun Zhong, Tak-Hong Cheung, Qi Dou, Yunhui Liu

Abstract

Computer-assisted minimally invasive surgery has great potential to benefit modern operating theatres. The video data streamed from the endoscope provides rich information to support context-awareness for next-generation intelligent surgical systems. To achieve accurate perception and automatic manipulation during the procedure, learning-based techniques are a promising approach, having enabled advanced image analysis and scene understanding in recent years. However, learning such models relies heavily on large-scale, high-quality, multi-task labelled data. This is currently a bottleneck for the topic, as publicly available datasets are still extremely limited in the field of CAI. In this paper, we present and release the first integrated dataset (named AutoLaparo) with multiple image-based perception tasks to facilitate learning-based automation in hysterectomy surgery. Our AutoLaparo dataset is developed based on full-length videos of entire hysterectomy procedures. Specifically, three different yet highly correlated tasks are formulated in the dataset: surgical workflow recognition, laparoscope motion prediction, and instrument and key anatomy segmentation. In addition, we provide experimental results with state-of-the-art models as reference benchmarks for further model development and evaluation on this dataset. The dataset is available at https://autolaparo.github.io.

Link to paper

DOI: 10.1007/978-3-031-16449-1_46 (https://link.springer.com/chapter/10.1007/978-3-031-16449-1_46)

SharedIt: https://rdcu.be/cVRXl

Link to the code repository

N/A

Link to the dataset(s)

https://autolaparo.github.io


Reviews

Review #1

  • Please describe the contribution of the paper

This paper presents and describes a new dataset for laparoscopic hysterectomy. The dataset includes 1388 minutes of surgical activity. It is intended for research in multiple areas, including workflow analysis, laparoscope motion prediction, and instrument and anatomy segmentation. The authors report results from applying several machine learning methods to their dataset for each of the areas mentioned above.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

There is a pressing need for more publicly available datasets for applications such as automation in surgery. The main strength of this paper is that it aims to fill this gap by providing a dataset from real surgeries.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

In my view, the main weakness of this paper is that the authors made many assumptions regarding their dataset that need to be properly justified.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

There are some improvements that need to be made on the reproducibility aspect, such as:

    1. Sharing the dataset, or at the very least a sample of it, especially since the dataset is the main contribution of this paper.
    2. A clear explanation of some of the assumptions made with respect to the data set is needed, especially the parts of the data set on laparoscope motion prediction and anatomy segmentation.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. I strongly recommend renaming the dataset. The current name “AutoLap” is the trademark of an autonomous camera system for laparoscopic surgery, see: https://www.youtube.com/watch?v=_fXPgRgTEAY&ab_channel=MST-MedicalSurgeryTechnologies
    2. The term “Image-guided” implies that you provide data from imaging modalities such as ultrasound. That is why I suggest the authors use the term “vision-based” instead throughout the paper.
    3. On the title: Is the use of this dataset restricted to just automation? If not, I would change the term “automation” to “activity”. This should widen the possible use cases for researchers in areas other than automation as well.
    4. It is really difficult to reason how this dataset is useful without actually seeing the dataset itself (or at least a sample of it). Would it be at all possible to find some way to share it so that we can also review it?
    5. In the introduction: the authors wrote: “To enhance the surgical scene understanding towards image-guided automation, the most promising solution is to rely on learning-based methods”. I would say “one promising solution” instead of “the most promising”. The latter is a very strong statement that needs substantial and clear evidence.
    6. I find the word “task” very confusing in the context of this paper, with respect to the use of this term in automation for surgery papers. In these papers, the term tasks or subtasks refer to tasks such as suturing, knot tying and so on. To avoid this confusion, I recommend that the authors use another term.
    7. On the dataset collection: Were all the videos coming from one surgeon’s practice or from multiple surgeons? If the latter, did the authors account for the individual preferences of surgeons (such as their preferences on moving the laparoscope) by any means in their dataset and models?
    8. On the dataset collection: Did the imaging platform used provide a 3D view of the scene? If so, which video channel was eventually used to record the video (the left or right channel)? Please add this to the description of the dataset.
    9. On the dataset collection: Were there cases of disagreement between the annotations of the gynecologist and the specialist? If so, how did the authors handle these cases?
    10. The part of the dataset on laparoscope motion can be significantly improved to avoid confusing the reader, mainly because several design decisions in this part are not justified, such as:
        a. Using only part of the dataset, rather than the entire dataset, for motion prediction.
        b. Discretizing the motion of the laparoscope into 6 motion types instead of treating the motion as a continuous variable.
        c. The choice of motion types, which seems to miss cases such as moving the camera diagonally. Currently, I guess this can only happen by discretizing the diagonal motion into two successive motion types (e.g., up and then left); a sketch illustrating this limitation follows this list.
        d. Setting T to 5 seconds.
    11. Why not consider the entire dataset for the anatomy segmentation part as well?
    12. What is the value of presenting the results in Table 4, especially since the accuracy is not very high?
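
    To make the discretization concern in comment 10 concrete, here is a minimal, purely hypothetical Python sketch (not from the paper or dataset; the class names, quantization rule, and threshold are assumed for illustration):

    ```python
    # Hypothetical sketch: quantizing a continuous in-plane camera displacement
    # into a 7-category label space (static + 6 basic motions) discards the
    # off-axis component, so a diagonal move needs two consecutive labels.
    MOTION_CLASSES = ["static", "up", "down", "left", "right", "zoom_in", "zoom_out"]

    def quantize_planar_motion(dx: float, dy: float, eps: float = 1e-3) -> str:
        """Map a continuous in-plane displacement to the closest basic class."""
        if abs(dx) < eps and abs(dy) < eps:
            return "static"
        # Only the dominant axis survives quantization; the other is discarded.
        if abs(dx) >= abs(dy):
            return "right" if dx > 0 else "left"
        return "up" if dy > 0 else "down"

    # A diagonal move (dx=1, dy=1) collapses onto a single axis...
    print(quantize_planar_motion(1.0, 1.0))  # -> "right"
    # ...so representing it faithfully takes two consecutive basic motions:
    print([quantize_planar_motion(1.0, 0.0), quantize_planar_motion(0.0, 1.0)])  # -> ['right', 'up']
    ```

    Treating the displacement as a continuous regression target would avoid this information loss, at the cost of a harder annotation and evaluation protocol.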
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The need for more publicly available datasets justifies my leaning towards accepting the paper. The several questions I outlined above justify why it is a “weak” accept.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #4

  • Please describe the contribution of the paper
    1. This work contributes an integrated multi-task dataset, AutoLap, to facilitate learning-based automation in hysterectomy surgery.

    2. A series of experiments are carried out on AutoLap with SOTA models.

    3. The dataset will be released to the public after the paper is published.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This dataset is collected from real surgeries in hospital.

    2. The single dataset works for three tasks: workflow recognition, laparoscope motion prediction, and scene segmentation.

    3. SOTA models are evaluated on the dataset to present the benchmark references.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The major weakness is that the value of the “multi-task” dataset is not fully presented. There are already plenty of surgical vision datasets, and the key feature of AutoLap is being multi-task. So, what is the relationship between the three tasks? How can the correlation across tasks be leveraged? How feasible is this? Without a deep analysis and discussion of “multi-task”, the value and novelty of yet another new dataset are limited.

    2. The AutoLap dataset should be positioned within the surgical research area. A table comparing the features of AutoLap and existing datasets is suggested; see https://arxiv.org/pdf/2011.02284.pdf.

    3. The sample number for the segmentation task is sufficient. But are the sample numbers for the other two tasks sufficient for training DNNs?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The dataset will be released upon paper publication.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The authors should revise the paper according to the above weaknesses.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This work provides a novel dataset featuring “multi-task”. However, I expect more content exposing the value of “multi-task”. At the very least, the authors should explain why the three tasks are coupled and how they practically benefit surgical automation as a whole.

  • Number of papers in your stack

    3

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

The dataset is large-scale and high-quality, but its “multi-task” feature is underused in the current version. In the current literature, a novel dataset with multiple tasks that are coupled in learning/inference is much more valuable than a new dataset that is merely applicable to multiple individual tasks. The authors’ response in the rebuttal is not convincing. For example, “tool usage information can enhance the phase recognition”, but how does it enhance it? And “the phase and segmentation results provide rich information of surgical scene that can help predict the laparoscope motion”, but what is that rich information?



Review #5

  • Please describe the contribution of the paper

This paper presents a dataset for facilitating ML-based approach development for laparoscopic video understanding. The dataset contains sub-datasets designed for three different tasks: workflow recognition, laparoscope motion prediction, and instrument/anatomy segmentation. It is stated that this dataset is provided to encourage advances towards surgical automation. Example uses of the dataset were demonstrated by benchmarking multiple state-of-the-art DL approaches. The authors state that the dataset will be shared along with the publication of the work.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well-written; its structure is clear and easy for readers to follow. The topic of this paper is closely relevant to MICCAI interests, in particular its focus on fostering the CAI research field.

    2. This paper aims at presenting a newly annotated laparoscopic video dataset focusing on three important sub-tasks towards surgical automation. The details of the dataset are provided and it does seem straightforward to use this dataset upon its release.

    3. Benchmarking examples are demonstrated. Several state-of-the-art approaches have been tested on the dataset, and this further proves the usability of the dataset.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

“Towards surgical automation” has been stated as the motivation for providing this dataset. However, the videos were collected from non-robotic laparoscopic surgeries. Given that robotics is an essential component of automation, it is not clear how evaluating ML approaches on this dataset transfers to robotic video datasets. Since robotic surgical tools appear different in videos from conventional laparoscopic tools, and the generalizability of ML approaches is still doubtful, the value of this dataset for surgical automation remains questionable.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors stated that the dataset will be released upon the publication of the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

This reviewer is generally pleased to see a paper that contributes a comprehensive dataset to the CAI field. As the paper is well-structured and provides details/examples of how to use the proposed dataset for benchmarking, this reviewer thinks the paper is acceptable and has only a couple of comments, listed below:

    1. Please mention the differences between laparoscopic videos and robotic laparoscopic videos (tool appearances, etc.); this would help readers understand how learning on such a dataset could transfer “towards surgical automation”.

    2. It would be good if the authors could also provide insights into how the size of this particular dataset correlates with the success of evaluating ML approaches. Is the size of each sub-dataset sufficient for judging model performance?

    3. It is nice that the authors have provided benchmarking results for several existing frameworks. As the dataset is provided with clinical relevance, it would be better if the authors could discuss what the clinically acceptable accuracies are for these sub-tasks. One common question for most existing datasets is: after researchers evaluate their approach on the dataset, how would they know whether the approach is good enough for the task? The readers would appreciate more discussion on this.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper presents a newly annotated dataset to the CAI field. The paper is well-written, and the description and details of the dataset are adequate. Several state-of-the-art approaches have been tested on the dataset as example uses. Overall, this paper is acceptable.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper presents a new large-scale integrated dataset with multiple image-based perception tasks to facilitate surgical visual perception and learning-based automation in hysterectomy surgery. The dataset (AutoLap) is designed around three tasks: surgical workflow recognition, laparoscope motion prediction, and instrument and key anatomy segmentation. The paper is well-written, experimental results with SOTA models are included as reference benchmarks, and the authors intend to release the dataset, models, and code. The main criticisms of the work concern the main contributions of the paper (including the multi-task scope and the motivation towards surgical automation), positioning and comparison with respect to existing datasets in the surgical research field, questions around the size and sample numbers of the dataset for some of the tasks, and a lack of justification for certain assumptions regarding the data.

    The following points should be addressed in the rebuttal:

    • Clarification on the main contributions of the paper regarding the value of the multi-task dataset, the correlation between the three tasks and how they can be leveraged
    • Justification for details of the dataset including whether sample numbers for each of the tasks (particularly the surgical workflow recognition and laparoscope motion prediction tasks) are sufficient.
    • Further clarification regarding the motivations of this dataset towards surgical automation, given that data is from non-robotic surgeries.
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5




Author Feedback

We thank the AC for the favorable comments on the merits of our proposed dataset, which are also highlighted by all three reviewers. Moreover, the reviewers seem quite positive about this “new large-scale integrated dataset” that “fills the gap in surgical automation application by providing a dataset from real surgeries”, “proves its usability”, and “fosters the CAI research field”.

1) Contribution. In this paper, we provide a new dataset of laparoscopic hysterectomy videos oriented towards multi-task learning and surgical automation. To the best of our knowledge, fewer than 10 public in-vivo laparoscopic datasets with full-length surgical videos exist; some are listed below:

  • m2cai16-workflow: 41 videos, 25 hours with phase label;
  • Cholec80: 80 videos with phase and instrument type;
  • EndoVis-workflow: 33 videos with phase label;
  • EndoVis-WorkflowAndSkill: 33 videos, 22 hours with phase, action, instrument and skill label;
  • Heisurf: add segmentation label based on EndoVis-WorkflowAndSkill;
  • The proposed dataset: 21 videos, 23+ hours, 2000K+ images, with phase, laparoscope motion, and instrument and key anatomy segmentation labels, making it a large-scale, high-quality dataset comparable with those above. [Ref] Maier-Hein et al., Surgical Data Science – from Concepts toward Clinical Translation, 2021.

2) Reply to R1:

  • For data collection, we carefully select clips with typical motions based on the same criteria across different videos, regardless of surgeons’ individual performance. Six basic motion types are labeled and can be combined to express complex motions. A 5-second video contains adequate information for prediction, and performance can be further improved with our multiple annotations.
  • For data annotation, results are proofread to ensure accuracy and consistency. In cases of disagreement, the senior surgeon’s results are adopted. Segmentation of 1800 images with nearly 6000 annotations is sufficient for this task.
  • We will rename the dataset and modify the description based on comments 1-8.

3) Reply to R4:

  • For the “multi-task” issue, three different tasks are formulated in the dataset, and they are highly correlated through a three-tier annotation process at video-, clip-, and frame-level (as shown in Fig. 1). Specifically, tool usage information can enhance phase recognition and vice versa, so these two labels are mutually beneficial for Task 1 and Task 3. Besides, the phase and segmentation results provide rich information about the surgical scene that can help predict the laparoscope motion in Task 2 (a hypothetical sketch of such coupling follows this list).
  • For the size of the dataset: compared with other datasets for phase recognition, such as m2cai16-workflow with 300K+ images per category and Cataract-101 with 100K+ images per category, the proposed dataset with 300K+ images per category is comparable and thus applicable. For laparoscope motion prediction, it contains 300 clips with 75K images across 7 categories. Note that besides raw image data, frame-wise phase labels and pixel-wise segmentation annotations are also provided as supplements, sufficient to enhance the learning process.
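
To illustrate one possible way such task coupling could be realized, below is a minimal PyTorch sketch. It is a hypothetical illustration, not the architecture used in our benchmarks; all class counts, layer sizes, and names are assumptions.

```python
# Hedged sketch: a shared visual encoder feeds a phase head (Task 1) and a
# segmentation head (Task 3), and their outputs condition the laparoscope
# motion prediction head (Task 2). Sizes and names are illustrative only.
import torch
import torch.nn as nn

class MultiTaskLaparoNet(nn.Module):
    def __init__(self, num_phases=7, num_seg_classes=10, num_motions=7):
        super().__init__()
        self.encoder = nn.Sequential(  # stand-in for a real CNN backbone
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.phase_head = nn.Linear(64, num_phases)        # Task 1
        self.seg_head = nn.Conv2d(64, num_seg_classes, 1)  # Task 3 (low-res)
        # Task 2 is conditioned on the predictions of the other two tasks.
        self.motion_head = nn.Linear(64 + num_phases + num_seg_classes,
                                     num_motions)

    def forward(self, x):
        feat_map = self.encoder(x)                 # (B, 64, H/4, W/4)
        feat = self.pool(feat_map).flatten(1)      # (B, 64)
        phase_logits = self.phase_head(feat)
        seg_logits = self.seg_head(feat_map)
        seg_summary = seg_logits.mean(dim=(2, 3))  # global class evidence
        motion_logits = self.motion_head(
            torch.cat([feat, phase_logits, seg_summary], dim=1))
        return phase_logits, seg_logits, motion_logits

# Usage: phase, seg, motion = MultiTaskLaparoNet()(torch.randn(2, 3, 224, 224))
```

The intuition behind this design is that phase and segmentation predictions summarize “what is happening” and “what is visible”, which is exactly the scene information a laparoscope motion predictor could exploit.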

4) Reply to R5:

  • For the “towards surgical automation” issue, the dataset is not proposed for robotic surgery, but for platforms that use a robot arm to hold and control the laparoscope, as these are more flexible and easier to develop. Data and labels of laparoscope motion are provided so that a model can be trained to learn the motion mode from real videos and then transfer this knowledge to robot motion control.
  • For clinically acceptable performance: as advised by the clinician co-author, a model success rate above 80% is desirable, and the higher the better for real surgery applications.

Overall, all major concerns can be clarified in the final version. As the reviewers indicated, we believe that our proposed dataset, collected from real surgeries and integrating multiple tasks, can bring important value to the CAI research field and to surgical automation applications.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    A publicly available dataset from real procedures targeted at surgical visual perception tasks and automation is of interest to the CAI community, and the paper is well-written with thorough experimental results with SOTA models. The rebuttal and discussions address the concerns of the reviewers around the value of the “multi-task” dataset. Further clarification regarding the value towards surgical automation should be included in the final version.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper certainly has a lot of merits and there is a lot of value in the dataset. The authors have addressed the reviewers’ concerns, who generally have positive feedback on the paper. I recommend accepting the paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I think the authors have done a good job in addressing the concerns of the reviewers. Furthermore, I agree that the release of this dataset to the scientific community is a contribution in and of itself. Given this I lean towards accepting the paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR


