Authors

Lisa Kausch, Sarina Thomas, Holger Kunze, Jan Siad El Barbari, Klaus H. Maier-Hein

Abstract

Surgical treatment of complicated knee fractures is guided by real-time imaging using a mobile C-arm. Immediate and continuous control is achieved via 2D anatomy-specific standard views, each of which corresponds to a specific C-arm pose relative to the patient positioning. This pose is currently determined manually by trial and error, at the cost of time and radiation dose. The characteristics of the standard views of the knee suggest that the shape information of individual bones could guide an automatic positioning procedure, reducing time and the amount of unnecessary radiation during C-arm positioning. To fully automate the C-arm positioning task during knee surgeries, we propose a complete framework that enables (1) automatic laterality and standard view classification and (2) automatic shape-based pose regression toward the desired standard view based on a single initial X-ray. A suitable shape representation is proposed to incorporate semantic information into the pose regression pipeline. The pipeline is designed to handle two distinct standard views with one architecture. Experiments were conducted to assess the performance of the proposed system on 3528 synthetic and 1386 real X-rays for the a.-p. and lateral standard views. The view/laterality classifier achieved an accuracy of 100\%/98\% on the simulated and 99\%/98\% on the real X-rays. The pose regression performance was $d\theta_{a.-p}=5.8\pm3.3\degree,\,d\theta_{lateral}=3.7\pm2.0\degree$ on the simulated data and $d\theta_{a.-p}=7.4\pm5.0\degree,\,d\theta_{lateral}=8.4\pm5.4\degree$ on the real data, outperforming intensity-based pose regression.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43990-2_45

SharedIt: https://rdcu.be/dnwL0

Link to the code repository

N/A

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

Methods are proposed for predicting the offset between the current C-arm pose and the optimal pose for two standard views (a.-p. or lateral) of the knee from a single input image. The methods are trained on simulated data from CTs and evaluated on the simulated data as well as on real (cadaver) C-arm images, thus demonstrating the generalization to real data. The importance of several design choices is demonstrated in an ablation study.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The motivation convincingly illustrates the clinical significance of the problem.

A novel two-step approach for standard plane pose offset prediction is presented. The pose prediction itself is based on PoseNet, but the input used for PoseNet (obtained by the proposed pipeline) is novel.

The evaluation demonstrates that the proposed approach, when trained on simulated data from CT, generalizes to real C-arm data.

The importance of the proposed pipeline steps is compared in an ablation study.

The proposed approach includes an interesting data augmentation step with some non-standard augmentations like “transparent edges”, “random region dropout” and “border overlays”.

The seven minutes long supplemental video demonstrates the potential of the proposed approach on all four simulated and all six real cases.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

I found the description of the proposed methods rather hard to follow (see item 9 on comments to the authors for details).

The purpose of some sub-tasks is not explained in such a way that their usefulness can be easily understood by the reader (e.g., the purpose of view and laterality classification).

The training of the networks is not described.

Parameters of the applied methods and their values are not reported.

A statistical significance analysis has not been performed.

All data seem to be from non-fractured knees. If so, it is unclear how the method will perform on the actual use case (complicated knee fractures).

It is not clearly defined why the method is called “shape-based”.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The reproducibility is limited because the description of the proposed methods is hard to follow and not complete (e.g., parameters are missing, training not described).
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

I found the scientific ideas presented in your paper interesting and believe they have potential. In addition to my comments under item 6 on the main weaknesses, I would like to ask you to consider the following additional issues.

Abstract: When reading the abstract, it was not immediately clear to me why the “automatic laterality and standard view classification” could be useful.

Related to this: “The pipeline is designed to handle two distinct standard views simultaneously.” The word “simultaneously” suggests (at least to me) that the two distinct standard views are handled at the same time, and thus that BOTH have to be provided as input to the pipeline (which is not the case, as far as I understand, because the method works on a “single initial X-ray”).

Fig.1, block “Seg-based”: “0|concat” – what does the “0” mean? Fig 1 right side: “initial x-ray”. The shown two input images show grey triangular areas at the four corners of the images. These come from the artificial rotation of the original images and would not be present in a real input image. If such images including those grey triangular areas were used for training, the method could have learned to predict the rotation from shape of the grey triangular areas. Fortunately, from the supplemental video it can be assumed that this was not the case because the training images were created from DRRs.

Page 3: “Since intraoperative X-rays with reference pose annotations do not exist” – This sounds like it is impossible to create such reference annotations – is this the case? Why? Or do you mean “… were not available”? Page 3: “(2) Laterality alignment” – What does this mean and how is it performed?

Page 4: “[…] not distinguishable in the shape-based representation” – What is meant by “shape-based representation”?

Page 4: “however, this is relevant for optimal lateral view recognition.” – What is “view recognition” (which “views” need to be recognized?) and why is it important? Page 4: “Annotating the condyles as line features” – Is it immediately clear that this would make sense? Why would line features be a good representation of condyles?

Subsection titles of 2.2 and 2.3: Please consider removing the colons.

Page 5: “The architecture is based on a 2D U-Net [12] with two view-specific segmentation heads” – This reads like it is a single 2D U-net that predicts the segmentations of both views (“two view-specific segmentation heads”, i.e., the AP AND lateral segmentation) – is this correct? Thus, from a SINGLE input view (because it is “a 2D U-net”, thus either AP OR lateral input), segmentations of BOTH views are predicted?

Page 5: “The extracted shape features” What is meant by “shape features”? Simply the 4-/5-channel segmentation results or some feature map of the U-net?

Section 2.3 “Validation data”: Please consider renaming this to “real X-ray test data” or something similar, because there is also “validation” data of the simulated data (5 CTs, see last sentence of section 2.1).

Supplemental video: I assume “corr. proj.” refers to the in-plane correction step – if so, I suggest adding “in-plane” for clarity. The video frames introducing a case do not contain whether the case is a simulated or a real case (3:42 “case 4 AP” for a simulated case and 6:16 “case 4 AP” for a real case) which would be helpful to know when browsing through the video. It would be even better to have this information on each of the result frames.

Title: I suggest adding a \newline after “for” such that “automatic” appears in the same line as “standard views”.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

A clinically relevant topic is addressed, novel ideas are presented and several variants are evaluated in an ablation study w.r.t. three clearly stated research questions on simulated and real (cadaver) image data: Thus, the methodological content is strong, but, unfortunately, I found the paper (in its current form) rather difficult to follow, which also affects the reproducibility.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

The presented approach aims at automating the positioning of a C-arm for image acquisition during knee surgery. This removes the need to find the best C-arm configuration by “trial and error” and thus reduces both radiation exposure and surgical time. The main contribution is the proposed 2-step methodology that takes as input a single X-ray image and outputs an estimated optimal C-arm pose. The first step is an intensity-based view classification and in-plane rotation regression. This is followed by a second step: a segmentation-based pose regression that outputs the necessary update to the C-arm pose parameters. The second major contribution is the evaluation of each step on both synthetic and pre-clinical data (images from a cadaveric experiment). This evaluation shows that the method outperforms state-of-the-art shape-based and intensity-based pose regression approaches. Evaluating on pre-clinical data provides useful insights into how well the method would translate to clinical application.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper is clear and well-written. The design choices are sound, and the methodology is well justified. The evaluation protocol is well presented, and the results provide interesting insights into the possible translation of the approach to clinical practice. Besides the contributions already listed above, a major strength is the fact that the approach is trained only on synthetic data with ground-truth annotations generated automatically. Additional data augmentation steps make it possible to generate a complete training dataset that generalizes well to real X-ray images (as shown in the evaluation on cadaver images). Therefore, no patient-specific training data or additional hardware (RGBD cameras, tracking devices as seen in other similar works) is required. This supports the potential translation to a clinical application, since the method’s interference with the clinical workflow would be minimal. The evaluation section highlights the separate contribution of each step of the methodology to the final performance of the full pipeline. Indeed, the ablation studies show, for instance, that the semantic information is useful for the pose regression step.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The approach seems specifically tailored to the knee anatomy. The extension to other joint surgeries could be briefly mentioned as possible future work. Would substantial changes to the method be required?

Also, radiation dose is mentioned as the main motivation of this system in the introduction. Yet the impact of the approach on the dose is neither mentioned later in the paper nor evaluated. It would have been nice to report how much the radiation dose would be reduced with this approach. The same goes for the surgical time reduction.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Good. Enough details about the methodology, implementation, hyperparameters, split, synthetic data generation and data augmentation are provided.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
- I would recommend commenting on the failure cases. A few “bad” images are shown in Figure 6, yet no comment whatsoever is made in the text. Any ideas on why these bad performances appear and what clinical impact they would have?
- It could be interesting to comment on the accuracy of the pose regression (around 8 degrees for the real data evaluation). Is this accuracy good enough for a clinical application? I believe that a few extra degrees would yield a significantly different X-ray image than the optimal one. Any ideas on how to reduce this error?
- As mentioned before, what is the actual impact on dose reduction with your approach? Is it significant enough to justify the effort of having an automated C-arm positioning approach?
- It is not clear if Figure 6 shows only results on synthetic data or if the images come from pre-clinical data.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

7
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I recommend accepting this paper. The methodological contributions are, in my opinion, of interest to the MICCAI community.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #5

Please describe the contribution of the paper

This paper proposes a novel pipeline to estimate knee pose in a C-arm setup for automatically defining standard views of the knee. The pipeline follows a hierarchical two-step strategy, simultaneously classifying laterality and predicting in-plane rotation in the first step. Next, it follows previous work in predicting a transform to the correct standard view of the knee for C-arm positioning.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The paper proposes a pipeline to estimate knee pose in a C-arm setup for automatically defining standard views of the knee;
2. The proposed 2-step method extends a previous 1-step approach and demonstrates the value (improved robustness and accuracy) of adding the additional step;
3. The pipeline was validated on both simulated images (DRRs) and real X-ray images, demonstrating its generalization from DRR to X-ray.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The hierarchical pipeline for pose estimation is well done, but I still look forward to an end-to-end framework with comparable performance.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The proposed pipeline is easy to follow, and the paper answers in advance some questions that a reviewer would likely ask about the pipeline design.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

A nice paper with technical innovation and comprehensive experiments. It addresses an engineering problem and achieves the expected performance.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
1. Clear description of the proposed method and a comparison against variants without individual components, similar to an ablation study;
2. Comprehensive study including simulated images and real X-rays, demonstrating the generalization of the pipeline from DRR to X-ray;
3. The paper anticipates and answers some questions that a reviewer would likely ask.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
The paper proposes a novel, two-step, shape-based pose estimation method for standard views of the knee using a C-arm setup, aiming to reduce radiation exposure and surgical time.

The reviewers are all positive about this paper. After reading the comments raised by the reviewers, I recommend conditional acceptance of the work. The authors are recommended to address the reviewers’ main points summarized below in the paper’s final version.
- Improve the clarity of method descriptions, sub-tasks purpose, and network training processes.
- Consider performing statistical significance analysis and providing parameters of applied methods.
- Discuss the actual impact on dose reduction and surgical time with the approach.

Author Feedback

We thank the reviewers for the acknowledgment of our work and for the constructive feedback.

Following the reviewers' comments, we revised our method description in Sec. 2.1 and 2.2. Specifically, we explain the connections between the different sub-tasks and their purpose in more detail. Based on the laterality classification, left knees are horizontally flipped to mirror the right anatomy, which simplifies the pose estimation task and prevents ambiguities during pose estimation. Depending on the result of the view classification, the segmentation map extracted by the corresponding segmentation head is used as input for the pose regression network, which outputs the necessary C-arm pose update. To ensure an equal number of input channels to the PoseNet for both standard views, a zero channel is appended to the a.-p. multi-label segmentation head output. In Sec. 3, we extended the description of the training processes. The models were implemented in PyTorch 1.6.0 and trained on an 11 GB GeForce RTX 2080 Ti using the Adam optimizer with a base learning rate of $\eta=10^{-4}$ and a batch size of 8. They were pre-trained independently and then jointly fine-tuned until convergence.
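The laterality flip and the zero-channel padding described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the channel-first `(C, H, W)` layout, and the channel counts (4 for a.-p., 5 for lateral) are hypothetical choices made only for this example.

```python
import numpy as np

def align_laterality(image, is_left):
    """Flip left knees horizontally so they mirror the right anatomy."""
    # image: (H, W) array; the last axis is the horizontal direction.
    return image[:, ::-1].copy() if is_left else image

def pad_to_channels(seg, target_channels):
    """Append all-zero channels so both views share one channel count."""
    # seg: (C, H, W) multi-label segmentation map (assumed layout).
    missing = target_channels - seg.shape[0]
    if missing < 0:
        raise ValueError("segmentation has more channels than the target")
    if missing == 0:
        return seg
    zeros = np.zeros((missing,) + seg.shape[1:], dtype=seg.dtype)
    return np.concatenate([seg, zeros], axis=0)

# Toy data standing in for an X-ray and a segmentation head output.
xray = np.arange(12, dtype=np.float32).reshape(3, 4)
flipped = align_laterality(xray, is_left=True)

ap_seg = np.ones((4, 3, 4), dtype=np.float32)  # hypothetical 4-channel a.-p. output
pose_input = pad_to_channels(ap_seg, target_channels=5)
print(flipped[0], pose_input.shape)
```

Padding with a constant zero channel keeps a single pose regression network usable for both views without changing its input layer; the appended channel carries no information for the a.-p. case.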

In the experiments section (Sec. 3), we clarified that we tested for statistical significance using the paired t-test.
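For readers unfamiliar with the test, a paired t-test compares two methods evaluated on the same images by testing whether the mean of the per-image differences is zero. The sketch below computes only the test statistic with the Python standard library; the error values are made up for illustration and are not taken from the paper, and the function name is hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1

# Hypothetical per-image angular errors in degrees (NOT from the paper).
errors_intensity = [9.0, 11.0, 8.0, 12.0, 10.0]
errors_shape = [7.0, 8.0, 7.0, 9.0, 9.0]

t, dof = paired_t_statistic(errors_intensity, errors_shape)
print(f"t = {t:.3f}, dof = {dof}")
```

The p-value follows from Student's t distribution with the returned degrees of freedom; in practice a library routine such as `scipy.stats.ttest_rel` returns both the statistic and the p-value.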

Further, we extended our discussion to reflect on the actual impact of the proposed approach on dose reduction and surgical time. As previously shown for the spine anatomy [doi: 10.1101/2022.02.12.22270884], manual C-arm positioning resulted on average in a positioning time of 75.9 s and 7.1 X-rays, with significantly more X-ray shots and dose for less experienced surgeons. Mean inter-rater central beam variation was 7.6°. The proposed automatic C-arm positioning reduces the number of necessary acquisitions to 2 (1 initial, 1 final), thereby reducing the dose and time. We also extended the discussion of failure cases. Failure cases can be related to inaccurate intermediate segmentations, which may result from patella baja, a condition not represented in the training dataset. The segmentation features can serve as a sanity check and indicate the reliability of the pose regression result. Further experiments with a larger training set covering more anatomical variation, e.g., patella baja, different flexion angles, and fractures, can potentially address the observed failure cases.
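As a rough worked example of the acquisition counts quoted above: going from 7.1 X-rays on average to 2 saves about 5 acquisitions per positioning. The snippet below is only back-of-the-envelope arithmetic. It assumes that the spine figure carries over to the knee and that every acquisition contributes the same dose, neither of which is established here; the function name is hypothetical.

```python
def relative_reduction(baseline, proposed):
    """Fraction by which `proposed` undercuts `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - proposed) / baseline

manual_shots = 7.1    # mean X-rays per manual positioning (spine study cited above)
automatic_shots = 2   # 1 initial + 1 final acquisition

saved = manual_shots - automatic_shots
fraction = relative_reduction(manual_shots, automatic_shots)
print(f"{saved:.1f} fewer acquisitions, about {fraction:.0%} fewer shots")
```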

Following the suggestions of the reviewers, we changed some notations to ease comprehension.

We hope we have addressed the open questions, and we look forward to meeting you at MICCAI.


Shape-based pose estimation for automatic standard views of the knee