
Authors

Yihao Zhou, Chonglin Wu, Xinyi Wang, Yongping Zheng

Abstract

Abstract. X-ray radiography with measurement of the Cobb angle is the gold standard for scoliosis diagnosis. However, cumulative exposure to ionizing radiation risks the health of patients. As a radiation-free alternative, imaging of scoliosis using 3D ultrasound scanning has recently been developed for the assessment of spinal deformity. Although these coronal ultrasound images of the spine can provide angle measurement comparable to X-rays, not all spinal bone features are visible. Diffusion probabilistic models (DPMs) have recently emerged as high-fidelity image generation models in medical imaging. To enhance the visualization of bony structures in coronal ultrasound images, we proposed UX-Diffusion, the first diffusion-based model for translating ultrasound coronal images to X-ray-like images of the human spine. To mitigate the underestimation in angle measurement, we first explored using ultrasound curve angle (UCA) to approximate the distribution of X-ray under Cobb angle condition in the reverse process. We then presented an angle embedding transformer module, establishing the angular variability conditions in the sampling stage. The quantitative results on the ultrasound and X-ray pair dataset achieved the state-of-the-art performance of high-quality X-ray generation and showed superior results in comparison with other reported methods. This study demonstrated that the proposed UX-Diffusion method has the potential to convert the coronal ultrasound image of spine into the X-ray image for better visualization.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43996-4_1

SharedIt: https://rdcu.be/dnwOB

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes improvement on methods that generate an X-ray style image for a volume projection ultrasound image of the spine.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Generating X-ray style images from coronal volume projection ultrasound images is a relatively new concept. It has potential as an ionizing radiation free method that may substitute X-ray in some clinical applications.
    • This paper compares state-of-the-art generative methods on a relatively large and unique clinical dataset.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • I don’t see how the evaluation metrics (SSIM and PSNR) correlate with clinical value of the generated images.
    • The results show poor accuracy that does not immediately enable this method to substitute X-ray imaging.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Code and data are not provided. However, authors don’t propose complex new methods that would be too challenging to reproduce. The problem is that readers will not have volume projection spine images, so even if they reproduce the methods, it is difficult to verify the results. I think a test dataset should be released in the public in this case.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    I suggest that you work more on evaluation and benchmarking of your results. Measurements using this method are clearly not suitable for replacing Cobb angle measurement, because a significant portion of measurements on the generated X-ray images are over 5 degrees different from the measurements on the original X-ray images. A correlation that allows such high error values will not be directly acceptable in clinical practice. Maybe tracked ultrasound images should not be projected on the coronal plane early in the data processing pipeline. Consider segmentation of the original ultrasound images, or even a 3D reconstructed volume to generate a CT-style volume. If you do geometric projection as the last step of the data processing pipeline, then less error would propagate to the results. A description of the study cohort would be important for a study with a clinical evaluation component. Excluding patients above a BMI of 25 is not the best choice, as a significant share of the adolescent population is above this BMI. It would be better to include all patients and analyze the reliability of results as a function of BMI.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is interesting and clearly written, but the clinical applicability of the results is questionable with this much error. The paper provides only incremental algorithmic value over previous publications. There is no significant contribution to open source or open data either.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors proposed a novel diffusion model to generate an X-ray image of the human spine from its corresponding ultrasound coronal image, a task for which they claim there is no prior work. Specifically, this work uses a denoising diffusion probabilistic model and a novel angle embedding attention module. The attention module is incorporated to acquire the Cobb angle from the input image. The clinical significance of this work is that it allows radiation-free assessment instead of requiring the patient to undergo X-ray scanning.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novel Application. The application of denoising diffusion probabilistic model to generate X-ray image from ultrasound image of human spine is novel.
    • Technical Contribution. Authors formulated a novel conditional diffusion model that learns the data distribution of a set of Cobb angles. Moreover, it incorporates an attention module that learns to establish a relationship between the Cobb angle in the X-ray image and the curve angle in the ultrasound image. Lastly, it proposes a new loss function for this problem.
    • Clinical Feasibility. This work will allow radiation-free scoliosis assessment.
    • Clear Experimental Setup. Authors provide necessary information on the acquired dataset, along with generally detailed implementation details. It also compares the proposed model with other GAN-based and diffusion-based methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Lack of Quantitative Measures. The performance comparison is not convincing or clear. For example, “spine curvature orientation” is only assessed qualitatively.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper contains a good amount of implementation detail. Moreover, the authors will provide code, which may improve the reproducibility of the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    MAJOR

    1. “We manually aligned and cropped the paired images to the same reference space.” How was this step done? How good is the alignment? How does this pre-processing affect the performance?
    2. Please consider providing equations for SSIM and PSNR.
    3. Based on Table 1, it seems that there isn’t a significant performance gain between the proposed model and previous works. Please elaborate.
    4. It would be great to provide ablation study on the effect of incorporating the attention module.
    5. In 3.4, who does the measurement of Cobb angle? Experts/Surgeons? Or it was computationally derived from the X-ray image?
    6. If the objective is to measure Cobb angle, would it be possible to predict Cobb angle from Ultrasound images?
    7. Is the difference between the Cobb angles of the synthetic and real X-ray images clinically acceptable?
    8. How many experts/radiologists did the annotation? How is their consistency?
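
For reference alongside item 2 above, the textbook definitions of the two metrics are given below. This is a sketch under stated assumptions: the paper's exact settings are not given in this review thread, so the dynamic range L, the constants k_1 and k_2 (commonly 0.01 and 0.03), and the local window over which the SSIM statistics are computed are assumptions here, not the authors' confirmed configuration. Here x and y are two images of size H x W, mu and sigma^2 denote local means and variances, and sigma_xy the local covariance.

```latex
% Standard definitions; L, k_1, k_2 and the SSIM window are assumed, not taken from the paper.
\mathrm{MSE}(x, y) = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} \left( x_{ij} - y_{ij} \right)^2,
\qquad
\mathrm{PSNR}(x, y) = 10 \log_{10} \frac{L^2}{\mathrm{MSE}(x, y)}

\mathrm{SSIM}(x, y) = \frac{(2 \mu_x \mu_y + C_1)(2 \sigma_{xy} + C_2)}
                           {(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},
\qquad C_1 = (k_1 L)^2, \quad C_2 = (k_2 L)^2
```

PSNR is expressed in decibels and grows as the mean squared error shrinks; SSIM is bounded above by 1, which is reached only when the two images are identical.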

    MINOR

    1. Need better explanation/clarification
      • In 3.1, the images are resized to 256x512. Is this to maintain the aspect ratio?
      • In 3.3, “Our proposed model has a higher quality in the generation of vertebral contour edges compared to the baselines”. What does higher quality mean?
      • In 3.3, it is unclear how significant a difference of 2.5-6.7 points in SSIM is.
    2. Potential typos.
      • In 3.3, “ground-true images”.
    3. Suggestions.
      • For Figure 1, it would be better to highlight that the ultrasound images do not show the curve/angle of the spine as well as X-ray image.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novel application of a sufficiently novel diffusion model, though the quantitative measurements could be more comprehensive.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    Authors have provided clearer justification of the significance of the study (except the claim about the learning curve), the dataset details and the choice of performance metric. I’m aligned with the authors’ claims on the significance of the study and the quality of the annotated data. However, I’m not fully convinced of the significance of the model’s performance: (1) The PSNR and SSIM of the proposed method are very close to those of MedSegDiff. (2) The Cobb angle deviation in the previous work is not mentioned. (3) The acceptable error of the Cobb angle in the clinical setting is not mentioned.



Review #4

  • Please describe the contribution of the paper

    The authors propose a customization of DDPM to generate xray looking images of the spine from stitched coronal images produced from 3D ultrasound volumes. The paper is well written and compares the proposed method to state of the art GAN and diffusion models. The justification is built upon the need for clinicians to see a more familiar spine image than ultrasound (they are used to xray) and to reduce radiation doses on patients.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is clear and well written
    • The angle processing module is a novel and very interesting addition to the diffusion model.
    • Qualitative results as shown in Fig 4 are impressive
    • Quantitative results show that the method outperforms others
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Justification is a bit weak: 1) The statement that clinicians want images more similar to xray because they are used to that needs to be supported with some evidence. In my experience, clinicians are more concerned about whether the new imaging would fit well in their workflow and whether they can make the measurements reliably. If that is the case they would normally be keen to adopt. Also, in terms of reducing the radiation dose, while I agree that less dose is always better: how often are these patients irradiated, and is it really a clinical concern? Again some evidence on this would be useful.

    • The quantitative results need a bit of context, given that the chosen metrics (SSIM and PSNR) are not linked directly with the justification above. I think the minimum needed is to add the values for the authentic x ray. It would also help to discuss how the competing models were optimized for fairness (particularly the Pix2pix results look much worse than I would normally expect)

    • The qualitative results are very important, since justification for the work (other than dose) is of qualitative nature. I would suggest that the paper needs to include some questionnaire asking clinicians what they think about the proposed method to shed some light on whether this would indeed help adoption.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method is described clearly although I am not entirely sure the paper is fully reproducible. Some details of the network are missing (initialization, whether dimensioning of layers has changed to fit the current image size, any preprocessing or data normalization, etc). Also the dataset is proprietary so experiments are not reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • The main purpose is to improve visualization which is a qualitative property, hence the experiments should include a qualitative evaluation of the enhanced images.

    • A justification for the proposed work is that “clinicians hesitate to adopt this image modality since spinal images formed by VPI method are new to users, and the bone features look different from those in X-ray images”. It would help see some evidence on this, for example a survey on clinicians, or better some published literature that reveals that this is indeed an issue.

    • The method takes UCA and Cobb angle as input so it means there is no quantitative purpose to the method but only qualitative (since quantification must be done in advance). I wonder if the quantification might change, i.e. clinicians might annotate generated images differently since they can see them better?

    • A contribution of the authors is that the method correlates well with authentic Cobb angle. However the Cobb angle was used as input, so correlation should be “perfect”. Can authors please comment on how comparing the Cobb angle which was an input is relevant? Are they trying to measure to what extent the image generation process deviates from the input prescribed angle?

    • Fig 4 shows impressive results but it seems like the ultrasound and ground truth do not overlay well - it might be a visual effect but maybe some overlay lines would help

    • In table 1, for reference, can you add the SSIM and PSNR of the authentic XRay images? Also, there are training details for the proposed model, but not for the competing methods; can authors confirm that the same care in hyperparameter tuning and architecture optimization was taken to train the competing methods?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I am not convinced that this research is relevant for the intended application, because 1) there are competing methods (included) that seem to work nearly as well, and 2) I am not convinced that clinicians’ main reason for not adopting US is that it does not look like xray. I am happy to be rebutted on this though and otherwise the technique seems very interesting and novel.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    I had three concerns, which in general remain unanswered.

    1. Authors have not addressed my main concerns on motivation. They referred to a paper which does not support the motivational statement.
    2. About my concern on having reference values for the proposed quality metrics, authors rebut by saying that better quality (captured by these metrics) would yield better measurements, but I still would like to know if this is really true, and what these metrics are for the technique of reference (xray). I don’t think this is an extra experiment or something unreasonable to ask, but rather applying the metric to the technique of reference.
    3. About the qualitative study, authors leave it for future work.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Overall, all three reviewers agree that the paper presents a novel application, with comparison to state of the art approaches demonstrating top performing results with impressive synthetic images produced by the model. However, several comments pointed out major weaknesses, including the lack of justification, the need for improved context, and shortcomings in the quantitative results. These points would need to be addressed in the rebuttal phase.




Author Feedback

1) Significance of study We agree with the reviewers that ultrasound (US) cannot replace X-ray for scoliosis assessment because US does not penetrate bone well. We would like to clarify that the motivation for developing a US method for scoliosis assessment is to reduce the use of X-ray, such as during follow-up monitoring. X-ray and US can complement each other, with X-ray providing a clearer view of the spine and US allowing more frequent assessment anywhere. Some scoliosis patients may progress very fast, but X-ray usually requires an interval of 6-12 months[r1]. With US, these patients can be monitored frequently so that progression and treatment outcome can be understood in a timely manner. The validity and reliability of ultrasound measurement for assessing scoliosis have been demonstrated, with a high correlation with the X-ray Cobb angle and high intra- and inter-operator repeatability[r2-r3]. However, it was reported that the learning curve for physicians to measure angles in US images is long because the anatomy in a US image of the spine is not intuitive[r4]. That is the motivation of this study: to convert US images of the spine into X-ray images so that clinicians can measure angles as they do on X-ray images. This conversion mainly tackles visualization, while keeping the deformity information consistent with X-ray.

2) Quantitative and qualitative results From the clinical perspective, the purpose of calculating SSIM and PSNR between the generated and original images is to determine whether the generated images agree with the real X-ray images. Higher SSIM and PSNR indicate better visualization of the generated spinal structures, which helps the operator locate the spinal features for Cobb angle measurement and interpretation. It can be observed from the visual comparison that for images with low SSIM (Pix2pix), the contours of the vertebrae are blurred, making Cobb angle measurement difficult. For comparison with the real Cobb angle, we follow [r5-r6] to study the correlation and agreement between the Cobb angles obtained using the real X-ray and the generated X-ray. R2 = 0.8531 showed a very good linear correlation between the generated and original images. We included 30 patients with a total of 50 angles (thoracic and lumbar), and the proportion of angles with absolute error less than 5 degrees is 44/50 = 0.88. The remaining 12% of angles have an error larger than 5 degrees, so there is room for improvement. One potential source of error may be inconsistent alignment in data preprocessing, which needs to be improved in future studies. We agree with the reviewers that in future studies we will include subjects across all BMI ranges, report the reliability of results, and add a questionnaire for clinicians to strengthen the qualitative analysis of the proposed method.
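
The agreement figures quoted in point 2 (the share of angles within 5 degrees, and R2) can be illustrated with a short sketch. This is not the authors' evaluation code: the function name `agreement_summary`, its parameters, and the toy angle values are hypothetical, and R2 is assumed here to be the squared Pearson correlation of a simple linear fit, which the rebuttal does not state explicitly.

```python
from statistics import mean


def agreement_summary(real_deg, generated_deg, tolerance_deg=5.0):
    """Summarise agreement between paired angle measurements, in degrees.

    Hypothetical helper for illustration only. `real_deg` and
    `generated_deg` are equal-length sequences of paired angles.
    """
    if len(real_deg) != len(generated_deg) or len(real_deg) == 0:
        raise ValueError("need two non-empty sequences of equal length")

    abs_err = [abs(r - g) for r, g in zip(real_deg, generated_deg)]
    # Strict "<" mirrors the rebuttal's wording "less than 5 degrees".
    within = sum(1 for e in abs_err if e < tolerance_deg)

    # Assumption: R^2 is the squared Pearson correlation coefficient.
    mr, mg = mean(real_deg), mean(generated_deg)
    sxy = sum((r - mr) * (g - mg) for r, g in zip(real_deg, generated_deg))
    sxx = sum((r - mr) ** 2 for r in real_deg)
    syy = sum((g - mg) ** 2 for g in generated_deg)
    r_squared = (sxy * sxy) / (sxx * syy) if sxx > 0 and syy > 0 else float("nan")

    return {
        "n": len(abs_err),
        "mean_abs_error": mean(abs_err),
        "within_tolerance": within,
        "fraction_within": within / len(abs_err),
        "r_squared": r_squared,
    }


if __name__ == "__main__":
    # Toy values, not data from the paper.
    real = [10.0, 20.0, 30.0, 40.0]
    generated = [12.0, 19.0, 37.0, 41.0]
    print(agreement_summary(real, generated))
```

With the counts reported in the rebuttal, the same ratio is 44 / 50 = 0.88. A high R2 and a high within-tolerance fraction answer different questions: the first measures linear association, the second measures absolute agreement, which is the quantity the reviewers ask about when they raise clinical acceptability.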

3) Competing methods The papers of all compared models describe the specific model structures and training pipelines. We reproduced the experiments strictly according to the settings in those papers.

4) Dataset details The coronal X-ray measurement was carried out by an expert with 15 years of experience reading scoliosis patients’ radiographs, while the ultrasound images were assessed by two raters with at least five years of experience evaluating scoliosis using ultrasound. One of the raters had been investigating the UCA measurement for 2 years and the other rater had been learning this new measurement method for 3 months. All raters performed the measurements independently, without discussing the selection of the end vertebrae levels. In addition, all the raters were blinded to the patients’ details and each other’s results.

[r1] Australian Journal of General Practice 2020, 49:832-837.
[r2] The Spine Journal 2018, 18:979-985.
[r3] Scoliosis and Spinal Disorders 2016, 11:1-15.
[r4] Journal of Orthopaedic Translation 2021, 29:51-59.
[r5] Ultrasonics 2022, 126:106819.
[r6] EClinicalMedicine 2022, 43:101252.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors’ rebuttal has greatly helped to clarify some points with regard to the justification of the method and the significance of the study for clinical practice. However, there is still some uncertainty with regard to the model’s performance on image quality, overall model performance with qualitative results and comparison to previous work in the literature. It is also unclear what level is clinically acceptable for spine evaluations. Still, the reviews tend to be more on the positive side, and I believe the remaining aspects could be addressed in the final version or in a follow-up study. I would lean towards acceptance.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Despite the interest in the paper, there are still concerns that the reviewers felt were not adequately addressed by the authors, specifically in terms of clinical motivation and quality metrics. Given this I don’t believe the paper in its current form is acceptable for MICCAI.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    While the idea of synthesizing X-ray images from US is interesting, this work lacks sufficient motivation (“follow-up monitoring” needs other experiments to support) and technical validity to be accepted.


