Authors
Benjamin D. Killeen, Han Zhang, Jan Mangulabnan, Mehran Armand, Russell H. Taylor, Greg Osgood, Mathias Unberath
Abstract
Surgical phase recognition (SPR) is a crucial element in the digital transformation of the modern operating theater. While SPR based on video sources is well-established, incorporation of interventional X-ray sequences has not yet been explored. This paper presents Pelphix, a first approach to SPR for X-ray-guided percutaneous pelvic fracture fixation, which models the procedure at four levels of granularity – corridor, activity, view, and frame value – simulating the pelvic fracture fixation workflow as a Markov process to provide fully annotated training data.
Using added supervision from detection of bony corridors, tools, and anatomy, we learn image representations that are fed into a transformer model to regress surgical phases at the four granularity levels. Our approach demonstrates the feasibility of X-ray-based SPR, achieving an average accuracy of 99.2% on simulated sequences and 71.7% in cadaver across all granularity levels, with up to 84% accuracy for the target corridor in real data. This work constitutes the first step toward SPR for the X-ray domain, establishing an approach to categorizing phases in X-ray-guided surgery, simulating realistic image sequences to enable machine learning model development, and demonstrating that this approach is feasible for the analysis of real procedures. As X-ray-based SPR continues to mature, it will benefit procedures in orthopedic surgery, angiography, and interventional radiology by equipping intelligent surgical systems with situational awareness in the operating room.
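To make the simulation idea described in the abstract concrete, here is a minimal sketch of sampling a surgical workflow as a Markov chain so that every simulated frame carries its phase label by construction. The states, transition probabilities, and function names are placeholders for illustration only, not the authors' actual simulation code.

```python
# Minimal sketch (not the authors' implementation): sample a workflow as a
# Markov chain so each simulated frame comes with a phase label "for free".
# States and transition probabilities are invented for illustration.
import random

TRANSITIONS = {
    "position_wire": {"position_wire": 0.6, "insert_wire": 0.4},
    "insert_wire":   {"insert_wire": 0.7, "insert_screw": 0.3},
    "insert_screw":  {"insert_screw": 0.8, "done": 0.2},
}

def simulate_sequence(start="position_wire", max_steps=50, seed=0):
    """Sample a label sequence; each step would drive one simulated X-ray frame."""
    rng = random.Random(seed)
    state, labels = start, []
    for _ in range(max_steps):
        if state == "done":
            break
        labels.append(state)
        nxt = TRANSITIONS[state]
        state = rng.choices(list(nxt), weights=list(nxt.values()))[0]
    return labels

if __name__ == "__main__":
    print(simulate_sequence())
```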
Link to paper
DOI: https://doi.org/10.1007/978-3-031-43996-4_13
SharedIt: https://rdcu.be/dnwON
Link to the code repository
https://github.com/benjamindkilleen/pelphix
Link to the dataset(s)
https://github.com/benjamindkilleen/pelphix
Reviews
Review #3
- Please describe the contribution of the paper
The authors propose to use features extracted from a U-Net encoder to train a transformer model that recognizes surgical phases at four granularity levels. To evaluate this X-ray-based Surgical Phase Recognition method, the authors conduct experiments not only on simulated sequence data but also on real data.
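For readers unfamiliar with this setup, the sketch below shows the general pattern the reviewer describes: per-frame features from a CNN (e.g., U-Net) encoder are stacked into a sequence, passed through a transformer, and classified by one head per granularity level (corridor, activity, view, frame value). Layer sizes, class counts, and names are assumptions, not the paper's exact architecture.

```python
# Illustrative-only sketch: per-frame encoder features -> transformer ->
# one classification head per granularity level. Dimensions are placeholders.
import torch
import torch.nn as nn

class PhaseTransformer(nn.Module):
    def __init__(self, feat_dim=512, n_corridor=5, n_activity=4, n_view=8, n_frame=3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=4)
        # One head per granularity level (corridor, activity, view, frame value).
        self.heads = nn.ModuleDict({
            "corridor": nn.Linear(feat_dim, n_corridor),
            "activity": nn.Linear(feat_dim, n_activity),
            "view": nn.Linear(feat_dim, n_view),
            "frame": nn.Linear(feat_dim, n_frame),
        })

    def forward(self, feats):  # feats: (batch, time, feat_dim) from the CNN encoder
        h = self.temporal(feats)
        return {name: head(h) for name, head in self.heads.items()}

# Usage: logits = PhaseTransformer()(torch.randn(2, 32, 512))
```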
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
(1) Surgical Phase Recognition in the X-ray domain seems to be novel; this might be an interesting research direction.
(2) The proposed method appears to solve surgical workflow recognition jointly with the segmentation and landmark detection tasks. The method also enables sim-to-real transfer. The system seems to be novel.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
(1) The paper tries to cover many aspects, but its main topic is Surgical Phase Recognition. Many of the Surgical Phase Recognition results are not in the main text but are placed in the supplementary material. Could the text be rearranged to better highlight the main topic of this work?
(2) May I ask why the dataset is divided into 327 for training and only 10 for validation? There seems to be too little data for validation.
- Please rate the clarity and organization of this paper
Satisfactory
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The authors state they will make their code and data available. I do not see any problem with the reproducibility of the work.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
Not enough ablation studies are conducted on the Surgical Phase Recognition task itself. (1) For future work, consider also comparing the Transformer with TCN and LSTM. (2) How much improvement does learning Surgical Phase Recognition jointly with the segmentation and landmark detection tasks yield compared to learning Surgical Phase Recognition alone?
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
5
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The authors select Surgical Phase Recognition in the X-ray domain as their research topic; this new research direction seems to be novel. I hope the authors can better highlight the Surgical Phase Recognition topic in their revision.
- Reviewer confidence
Confident but not absolutely certain
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #2
- Please describe the contribution of the paper
This paper provides a surgical phase recognition framework based on intraoperative C-arm X-rays. At the heart of this algorithm, a transformer-based network is used to identify clinically relevant phase characteristics. Segmentation and landmark priors are used in parallel in hopes of assisting the phase recognition task.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper is of very high quality in terms of content presentation and structure. The writing quality is very good and the flow of information is well articulated. The introduction provides sufficient background on the application at hand and a suitable overview of existing methods (which are primarily based on RGB input feeds). The gap in the literature is well highlighted. The proposed CNN architecture is up to the task if sufficient data is provided. A data synthesis pipeline (using DRR techniques) is used to mitigate data availability issues. Promising performance is achieved on synthetic X-rays, and the potential of the approach is shown when faced with real X-rays.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
The proposed transformer network and the assistive models (heatmap detection for landmarks and segmentation for anatomical regions) have been trained entirely on synthetic images. There is an evident drop in performance when real X-rays are used. While this is acknowledged by the authors, no domain adaptation process is presented in this study. While the authors offer explanations for the drop in performance, no visualizations are provided in the main paper to back up these claims. I strongly encourage the authors to include such visualizations to concretely demonstrate the clinical conditions under which the network succeeds or fails when dealing with real X-rays. The authors should also explicitly justify the necessity of the segmentation and heatmap branches in their algorithm by demonstrating SPR accuracy with and without these assistive branches.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The authors have stated that their code and data will be made available, but no timeframe is provided.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
To summarize my comments above, I recommend the authors to:
- Include an ablation study showing the benefit of segmentation and landmark detection branches of their network.
- Come up with a visualization process to highlight the reason for the performance gap between real and synthetic input images (showing exactly where the network fails).
- Similar to Figure 4, results of the SPR method should also be shown when DRR images are used as input.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
5
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The technical contribution of the paper is evident, although the authors would need to make some improvements to the clarity of the manuscript.
- Reviewer confidence
Very confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Review #1
- Please describe the contribution of the paper
This paper presents surgical workflow recognition using X-ray images as input. Such a method could be used for surgical assistance or training systems. The methods were evaluated on simulated images and cadaver images.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Interesting new approach to surgical workflow recognition
- Innovative use of recent deep learning-based methods
- Evaluation metrics relevant to clinical applications
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The methods are difficult to follow due to higher complexity than what can be clearly described in a few pages.
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The authors have made a good effort to describe their methods in detail. But without source code, such a complex methodology is impossible to reproduce. I hope I didn’t miss anything because the authors answered all reproducibility questions with a “yes”. However, I could not find any references to either source code or data.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
- I highly recommend that the authors release source code, or at least test data samples, so others can benefit more from this publication.
- A discussion of limitations is missing from the paper. Please share some of the difficulties you are still facing in this project. What causes higher errors in certain test datasets? How generalizable do you think the methods are?
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
6
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
It is an interesting paper, loaded with modern methodology that enables results that were probably not possible to achieve earlier. The methods have multiple potential clinical applications.
- Reviewer confidence
Confident but not absolutely certain
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
N/A
- [Post rebuttal] Please justify your decision
N/A
Primary Meta-Review
- Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
The paper presents a novel framework (Pelphix) for incorporating interventional X-ray sequences into surgical phase recognition. Focused on X-ray-guided percutaneous pelvic fixation, the approach models the procedure at four levels of granularity and simulates the workflow as a Markov process to provide fully annotated training data for a transformer model. The reviewers highlight the novelty of the proposed approach, and clinically relevant metrics and validation experiments as the strengths of the paper. The topic is of interest to the community, the clinical application is well-motivated, and the paper is well written and presented.
Feedback from the reviewers regarding further discussion on limitations and sources of error (including performance limitations when dealing with real X-rays), clarification on the methodology including justification for use of segmentation and landmark detection (and associated improvements), and comment on the training/validation split should be incorporated in the final submission.
Author Feedback
Thank you to the reviewers (R1, R2, R3, Meta-R) for their attention to, feedback for, and recommendation of our work. All three reviewers highlighted the novelty of the paper, which incorporates interventional X-ray images into surgical phase recognition (SPR). We summarize the constructive criticisms below:
- There is a notable sim-to-real gap when validating our method on cadaveric sequences (R2, R3, Meta-R). It is important to note that in the case of SPR, failed sim-to-real transfer may arise from a failure to generalize either image features or temporal features. Given that training sequences are simulated in a Markov fashion, we strongly suspect that the latter is to blame, and we will discuss future work that may further explore this question. Moreover, in our final submission, we will include exemplary cases with real images from our cadaver study to highlight the successes, limitations, and failure modes of our method.
- As R2, R3, Meta-R observe, the benefits of landmark and segmentation branches are not demonstrated empirically. An ablation study will explore the effect of removing these branches from the network, as time permits, in our final submission.
- Compared to real image datasets, where training and validation data may be easily split, we generate multiple image sequences from a single CT scan. To avoid training and testing on sequences generated from the same CT, we reserve a fixed number of CT scans for generating validation data, in this case 10 (see the sketch after this list). Subsequent processing of the training data resulted in the imbalance noted by R3 and Meta-R, which we will clarify in our final submission.
- R1 and R2 note that releasing source code for image sequence simulation is necessary given the complexity of our method. To clarify, we will release source code and simulated images upon publication at MICCAI, subject to the data use agreement from NMDID. This will enable readers to re-train based on our exact sequences or to re-generate sequences using our code and evaluate on the cadaveric images collected.
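Below is a minimal sketch of the CT-level split described in the list above: whole CT scans are held out so that sequences generated from the same scan never appear in both training and validation. The IDs, counts, and function name are hypothetical, not the authors' code.

```python
# Hedged illustration (not the authors' code) of a CT-level split: reserve
# entire CT scans for validation so no CT contributes sequences to both sets.
import random

def split_by_ct(ct_ids, n_val=10, seed=0):
    """Return (train_ids, val_ids), reserving whole CT scans for validation."""
    ids = sorted(ct_ids)
    random.Random(seed).shuffle(ids)
    return ids[n_val:], ids[:n_val]

if __name__ == "__main__":
    all_cts = [f"ct_{i:03d}" for i in range(100)]   # placeholder CT identifiers
    train_cts, val_cts = split_by_ct(all_cts, n_val=10)
    assert not set(train_cts) & set(val_cts)         # no CT spans both sets
```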
Additional points that we will address in our final submission as space permits include:
- R3 notes that many results relevant to SPR are relegated to the supplementary material. We will return these to the main paper.
- Although neither domain-adversarial training nor CycleGAN-based domain adaptation techniques were used (R2), we employed domain randomization following [1]. We thank R2 and Meta-R for noting that this point, which was made only in the introduction ("Following recent work that enables sim-to-real transfer in the X-ray domain…"), merits further clarification in the method section.
- Future work may discuss further exploration of model architectures, including TCN and LSTM (R3).
[1] Gao, Cong, et al. “Synthetic data accelerates the development of generalizable learning-based algorithms for X-ray image analysis.” Nat. Mach. Intell., vol. 5, no. 3, Mar. 2023, pp. 294-308, doi:10.1038/s42256-023-00629-1.