Paper Info

Authors

Ege Özsoy, Evin Pınar Örnek, Ulrich Eck, Tobias Czempiel, Federico Tombari, Nassir Navab

Abstract

Surgical procedures are conducted in highly complex operating rooms (OR), comprising different actors, devices, and interactions. To date, only medically trained human experts are capable of understanding all the links and interactions in such a demanding environment. This paper aims to bring the community one step closer to automated, holistic and semantic understanding and modeling of OR domain. Towards this goal, for the first time, we propose using semantic scene graphs (SSG) to describe and summarize the surgical scene. The nodes of the scene graphs represent different actors and objects in the room, such as medical staff, patients, and medical equipment, whereas edges are the relationships between them. To validate the possibilities of the proposed representation, we create the first publicly available 4D surgical SSG dataset, 4D-OR, containing ten simulated total knee replacement surgeries recorded with six RGB-D sensors in a realistic OR simulation center. 4D-OR includes 6734 frames and is richly annotated with SSGs, human and object poses, and clinical roles. We propose an end-to-end neural network-based SSG generation pipeline, with a rate of success of 0.75 macro F1, indeed being able to infer semantic reasoning in the OR. We further demonstrate the representation power of our scene graphs by using it for the problem of clinical role prediction, where we achieve 0.85 macro F1. The code and dataset will be made available upon acceptance.
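The graph structure described in the abstract (nodes for actors and objects, edges for the relationships between them) can be illustrated with a minimal sketch. This is not the authors' implementation: the class names `Relation` and `SceneGraph` are hypothetical, and the predicate `lying_on` and the node names other than the reviewers' "human1 assist human2" example are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One directed edge of the scene graph: subject --predicate--> obj."""
    subject: str
    predicate: str
    obj: str


class SceneGraph:
    """Hypothetical container: nodes are entity names, edges are Relation triplets."""

    def __init__(self):
        self.nodes = set()
        self.edges = set()

    def add_relation(self, subject, predicate, obj):
        # Adding an edge implicitly registers both endpoints as nodes.
        self.nodes.update((subject, obj))
        self.edges.add(Relation(subject, predicate, obj))

    def relations_of(self, node):
        # All edges in which the node appears as subject or object.
        return [e for e in self.edges if node in (e.subject, e.obj)]


graph = SceneGraph()
graph.add_relation("human1", "assist", "human2")             # role-agnostic example from Review #1
graph.add_relation("patient", "lying_on", "operating_table")  # assumed predicate and node names
```

Keeping the human nodes role-agnostic (`human1`, `human2`) mirrors the design discussed in the reviews, where clinical roles are predicted from the graph afterwards rather than stored in it.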

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16449-1_45

SharedIt: https://rdcu.be/cVRXk

Link to the code repository

https://github.com/egeozsoy/4D-OR

Link to the dataset(s)

https://github.com/egeozsoy/4D-OR

Reviews

Review #1

Please describe the contribution of the paper

This paper proposes to represent the OR via semantic scene graphs to achieve a holistic understanding. A new 4D-OR dataset is constructed and will be made public. Several state-of-the-art computer vision methods are pipelined and tested on the new dataset.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. A new 4D-OR dataset is built and will be released, which will benefit the community.
2. The idea of using semantic scene graph for holistic OR understanding is significant, which has the potential to make an impact.
3. The paper achieves scene graph generation by assembling several state-of-the-art computer vision methods into a reasonable and effective pipeline.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. As for the clinical role experiment, I wonder why the authors did not embed the role into the scene graph where the relations will be like “Assistant surgeon assist head surgeon” instead of “human1 assist human2”. What is the benefit of a role-agnostic scene graph?
2. The human poses and object boxes in the new dataset are automatically generated by the Kinect SDK. What is the accuracy of these generated annotations? How does this source of error affect the evaluation in this paper?
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Although the authors promise to release the code, the reproducibility of this paper is at high risk because of the large pipeline of existing methods and the many ad-hoc components. I would recommend that the authors package all pipeline components together, for example in a Docker container.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

Please see weaknesses.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper is a good proof-of-concept of scene graph in the surgical field. Besides, a new dataset will be made public. Therefore, I recommend an “accept”.
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

Not Answered
[Post rebuttal] Please justify your decision

Not Answered

Review #2

Please describe the contribution of the paper

This paper presents a neural network-based approach to generate semantic scene graphs of the activities happening in the OR during a surgical procedure. The principal goal of the proposed approach is to accurately predict the role of every human present in the OR. A quantitative evaluation is performed on a dataset composed of simulated total knee replacement surgeries recorded with six RGB-D cameras, with annotations of human and object poses, SSG labels and clinical roles. The authors propose to make this dataset public, which can be a big contribution to the field.

A complex methodology is presented to process the images and point clouds, and also to automatically obtain human and object pose information using off-the-shelf approaches, and then generate an SSG to be able to predict the role of everyone in the scene. The approach achieves good performances especially for patient and head surgeon role prediction.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper is clear and well-written. The methodology is clearly explained, and its explanation makes for most of the content of the paper. The proposed method appears to be reproducible thanks to all the details provided and that is a good thing.

This paper can contribute to future context-aware systems with an automatic and holistic understanding of the activities happening in the OR. A rich dataset is introduced, which can be used by the community for developing and evaluating similar approaches and go one step closer towards a smart OR. Indeed, the presented dataset is complex and is challenging to obtain. Synchronizing and calibrating six RGB-D cameras, with all the post-processing needed, is not an easy task. The dataset also provides several kinds of annotations. The quantitative results provided will represent a baseline for future papers.

I also find this paper interesting because MICCAI has already seen similar works but rather dealing with images coming from minimally invasive surgeries (such as laparoscopic videos or robot-assisted surgeries). Yet this paper deals with recordings from a conventional orthopedic procedure from ceiling cameras. This is promising because today around 90% of total knee replacement surgeries are still performed using conventional instrumentation (no navigation, cameras, or robot present). Hence, this paper shows that conventional approaches can also benefit from smart context-aware systems in the near future.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The paper lacks details about how realistic the surgeries included in the dataset actually are. This is important for the reader to get an understanding of how challenging the clinical role prediction task is from the data. Also, since the surgeries are simulated, it is not clear if the most important parts of the knee replacement procedure are actually included in the data, namely everything that happens after the knee incision (placement of cutting guides, bone resections, knee joint assessment, trial implants, cement preparation). Furthermore, how different are the 10 recorded procedures? Knee replacement surgery can be performed through different clinical approaches depending on the desired alignment (mechanical or kinematic), the type of implant, etc. Are all these differences represented in the dataset? Or is it the same surgical approach repeated ten times? Are the actors in the videos different from one video to the other? Has the OR changed somehow from one video to the other? Moreover, the size of the test set seems small for an evaluation of the performance of the proposed approach, especially since we do not know how different the surgeries are from one another. Do you believe your approach would generalize well to variations in the surgical workflow?

The evaluation metrics and results are difficult to interpret and understand. More explanation should be provided. What is the purpose of the ablation studies that are presented?

The discussion section is short and as a reader you feel like more can be discussed about this work. Especially about how the approach would translate to a real clinical setting. No recommendations for this are provided or about how would this approach be implemented in a clinical setting.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Details about implementation, all the off-the-shelf approaches, hyperparameters, data split are provided. However, the paper lacks explanations about how clinically realistic the simulated surgeries are.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
Please find below a set of comments/questions that I believe can help improve the paper if addressed:
- Did you consider having additional data as input besides images and point clouds? I am thinking of sound or signals from electronic equipment (e.g. oscillating saw or drill). It is hard to imagine future ORs having six RGB-D cameras installed.
- Regarding the RGB-D cameras: how was the placement decided? How is the calibration between the cameras done, that is, how are the extrinsic parameters obtained that allow all point clouds to be fused properly? More details about this could be useful to the reader.
- Could you give more information about all pre-processing steps for the point clouds? The fused point cloud seems noisy in the figures. Did you consider working directly with the depth images instead? Could you discuss how much ambient illumination can be an issue for your approach? Were the scialytic lamps always off during the recordings?
- As mentioned before, I think the paper could benefit from more details about data. Knee replacement surgeries can have different workflows and surgical gestures depending on many things such as surgical planning (mechanical or kinematic alignment), type of implant, surgical technique. Were all simulated surgeries the same? Is the ancillary used (such as intramedullary rod, cutting guides, pins…) also included in the simulation? It seems to me that the specific parts of a knee replacement surgery happen when the knee is exposed, and your dataset rather considers the activities happening around the OR but not directly the surgical gestures.
- How would this method be applied in a clinical setting? How do you envision its integration to routine clinical practice?
- Could you provide the list of relationships and activities considered? I do not see things like bone resection, implant placement, cement …. Also providing the list of objects considered (only large medical equipment?) can be interesting.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

4
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper brings interesting contributions, but it reads more like a proof of concept. Hence I recommend rejection and resubmission to a future MICCAI. The dataset used for evaluation does not seem clinically realistic (at least from the scarce details provided in the paper). The evaluation and discussion sections seem shallow and also lack information on how this approach could translate to clinical practice.
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

3
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

5
[Post rebuttal] Please justify your decision

The rebuttal responds to my questions about how clinically realistic the generated dataset is, and about translation to clinical practice. We expect this information to also be added to a revised version of the paper.

Review #3

Please describe the contribution of the paper

The paper tackles the problem of understanding surgical scenes. It proposes semantic scene graphs to construct a holistic knowledge of the operating room. The paper generates a multi-view dataset and relies on state-of-the-art models to detect 3D human and object poses. These are then used to build a scene graph and predict roles.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Focused on the problem of holistic scene understanding in the OR, which can benefit many applications
- Generated a novel dataset with annotation
- Making the dataset publicly available
- Achieving high performance on detecting roles and relationship
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Limited technical contributions
- It would be interesting to get the performance by perturbing pose estimation prediction.
- It would be interesting to see the camera view layout.
- The paper does not include any information on the calibration. It would be interesting to provide some information on how the cameras are calibrated.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors committed to releasing the code upon acceptance.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
- Can the authors provide any information on depth interference among the views?
- One can imagine that the accuracy of the model decreases on non-simulated cases. I think it would be interesting to see the effect of perturbed 3D poses on the performance of the model.
- Considering that the model has achieved high performance, do the authors plan to make the dataset more challenging by adding either extra data or tasks?
- The model achieved almost perfect prediction for the patient and head surgeon. Can the authors comment on whether this could be because their 3D locations are very similar across the data? Have the authors simulated both left and right knee replacements to obtain different locations for the head surgeon?
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I think the paper would be stronger if the effect of data perturbation were analyzed further and more information were provided on the scenarios captured for this dataset.
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

2
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

5
[Post rebuttal] Please justify your decision

I think the paper would be more interesting if the authors included a discussion of the effect of perturbation and also commented on the model bias toward the 3D locations of different roles.

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The paper presents an interesting approach for activity representation in the OR and generates a relevant dataset. R2 raised concerns about how clinically realistic the generated dataset is. R3 suggested evaluating the effect of data perturbation on the performance of the proposed model. The explanation of the generated dataset should be enhanced by incorporating more details and addressing the points of reviewers R2 and R3. More details should be included in the evaluation and discussion sections. The authors should also comment on how the proposed approach could translate to clinical practice.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

4

Author Feedback

We thank all reviewers for their insightful and valuable comments. All reviewers acknowledge that our work introduces a significant (R1) concept via surgical semantic scene graph (SSG) based modeling, contributes (R2) toward smart and holistic OR understanding, and provides a novel and publicly available dataset (R1, R2, R3) to encourage and enable the community to do further research on surgical SSGs.

Regarding the dataset details (R2, R3), all recordings in 4D-OR follow the major workflow steps of the total knee replacement surgery, such as bone resections, trial implants, and cement preparation, defined with the guidance of clinical experts. The dataset is captured in a simulation center of a hospital, using real surgical equipment with different actors and surgical variations, such as duration, order, and repetition of individual phases. Following standard machine learning literature, we propose a 6-2-2 train/val/test split of the 10 recordings in our dataset. We have provided these details in the supplementary material but will include further insights also in the paper.
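The 6-2-2 split above is made at the level of whole recordings. A minimal sketch of such a split follows, under stated assumptions: the rebuttal does not say which recordings go into which subset, so the seeded shuffle, the function name `split_recordings`, and the recording ids 1 to 10 are all hypothetical.

```python
import random


def split_recordings(recording_ids, n_train=6, n_val=2, n_test=2, seed=0):
    """Shuffle recording ids (assumed assignment rule) and cut them into train/val/test."""
    ids = list(recording_ids)
    if n_train + n_val + n_test != len(ids):
        raise ValueError("split sizes must sum to the number of recordings")
    random.Random(seed).shuffle(ids)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


# Hypothetical ids for the ten recordings.
train, val, test = split_recordings(range(1, 11))
```

Splitting by recording rather than by frame keeps all frames of one surgery in a single subset, so near-identical consecutive frames cannot leak from training into evaluation.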

On clinical realism (R2), our dataset is the first fully semantically and geometrically annotated 3D and temporal OR dataset. We believe that no simulated dataset can cover the full variability and complexity of a clinical knee replacement surgery. However, as mentioned above, we closely followed the clinical workflow and introduced different sets of variability to create the first dataset representing a subset of the real-world challenges. In this conference paper, one of our main objectives in generating and offering this dataset is to support the SDS community working on the difficult task of enabling human-like, smart, and holistic understanding and modeling of surgeries. Clinical translation (R2) involves, in addition to data privacy (paper page 8, Sec. 4), aspects such as infrastructure, technical equipment, and data availability. A system capable of understanding the surgery in a holistic way, as we propose, builds the foundation for an array of new products and safety features, including optimized solutions for the surgery room of the future.

We have also explored the behavior of our method under data perturbations (R3), by randomly moving the 3D human joints or object bounding boxes in the x, y, and z directions within a predefined range. For slight perturbations (range: ±1 cm, F1: 0.76), the F1 results stay essentially unchanged, whereas for moderate (range: ±2.5 cm, F1: 0.73) and high perturbations (range: ±10 cm, F1: 0.55) we see a deterioration in the performance. If R3 and MR find it valuable, we will discuss these observations within the final version.
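The perturbation procedure described above can be sketched as follows. This is an illustration, not the authors' code: the rebuttal only says points are moved randomly "within a predefined range", so the uniform distribution, the metre units, the function name `perturb_points`, and the three example joints are assumptions.

```python
import random


def perturb_points(points, max_offset, rng):
    """Shift each x, y, z coordinate by independent uniform noise in [-max_offset, max_offset]."""
    return [
        tuple(coord + rng.uniform(-max_offset, max_offset) for coord in point)
        for point in points
    ]


rng = random.Random(0)
# Hypothetical 3D joints in metres (not taken from the dataset).
joints = [(0.0, 0.0, 1.7), (0.2, 0.0, 1.4), (-0.2, 0.0, 1.4)]
# Ranges from the rebuttal, converted to metres: ±1 cm, ±2.5 cm, ±10 cm.
levels_m = {"slight": 0.01, "moderate": 0.025, "high": 0.10}
perturbed = {name: perturb_points(joints, offset, rng) for name, offset in levels_m.items()}
```

The same function applies to object bounding boxes if each box is represented by its corner points; the perturbed inputs would then be passed through the scene graph pipeline and scored with macro F1 at each level.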

Furthermore, a role-aware SSG (R1) requires first predicting the roles of the humans in the scene. Relying only on visual features for role prediction is challenging due to the similar appearances of the staff. SSGs can capture and represent the semantically and temporally rich interactions to generate an accurate clinical role prediction. This can be fed back into the graph for complete semantic understanding, which we show in our supplementary video. Finally, we use the standard metrics for the SSG prediction evaluation such as Precision, Recall, and F1 (paper page 6, Sec. 3). We placed six Azure Kinect cameras in the OR for maximum visibility of the scene (Fig. 2). In previous studies, the Kinect system accuracy was reported to have an average error of less than 0.5 mm [1]. The extrinsic camera calibration is done using a fiducial marker and the iterative closest point (ICP) algorithm. We will release full pre-processing and pipeline codes upon acceptance.
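For reference, the Precision, Recall, and F1 metrics named above, and the macro-averaged F1 reported in the abstract, can be computed as in the sketch below. The relation labels (`assisting`, `cutting`, `none`) and the toy predictions are hypothetical, and the convention of scoring undefined precision or recall as 0 is an assumption rather than something stated in the paper.

```python
def macro_f1(y_true, y_pred):
    """Return (macro F1, per-class (precision, recall, F1)) over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("need at least one label")
    per_class = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Assumed convention: an undefined ratio counts as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = (precision, recall, f1)
    # Macro averaging: unweighted mean of the per-class F1 scores.
    macro = sum(f1 for _, _, f1 in per_class.values()) / len(per_class)
    return macro, per_class


# Hypothetical relation labels for four predicted edges.
y_true = ["assisting", "assisting", "cutting", "none"]
y_pred = ["assisting", "none", "cutting", "none"]
score, per_class = macro_f1(y_true, y_pred)
```

Because the macro average weights every class equally, rare relations influence the score as much as frequent ones, which matters when most object pairs in a scene have no relation.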

Overall, as recognized by all reviewers, we believe our work will add a significant value to the surgical data science community by introducing a novel holistic OR modeling approach through surgical SSGs and providing a new dataset, 4D-OR.

[1] Tölgyessy, M. et al. Skeleton Tracking Accuracy and Precision Evaluation of Kinect V1, Kinect V2, and the Azure Kinect. Appl. Sci. 2021, 11, 5756.

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors responded adequately to the reviewers’ comments and should revise the paper accordingly. I recommend acceptance of the paper as it presents an interesting approach for activity representation.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

2

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The paper presents a new 4D operation room scene understanding dataset, and a new method for modelling/analyzing surgical procedures through semantic scene graphs. The paper is well-motivated and well-written, the approach has novelty, and the new dataset is of interest to the CAI community. The rebuttal addresses the reviewer’s concerns regarding clinical translation and clinical realism of the dataset, and this information should be included in the final submission as well.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

2

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

Major concerns have been addressed. The authors should incorporate the feedback from reviewers and justification provided in the rebuttal in the camera ready.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

7

4D-OR: Semantic Scene Graphs for OR Domain Modeling