
Authors

Zeyi Hou, Ruixin Yan, Qizheng Wang, Ning Lang, Xiuzhuang Zhou

Abstract

Automating the analysis of chest radiographs based on deep learning algorithms has the potential to improve various steps of the radiology workflow. Such algorithms require large, labeled and domain-specific datasets, which are difficult to obtain due to privacy concerns and laborious annotations. Recent advances in generating X-rays from radiology reports provide a possible remedy for this problem. However, due to the complexity of medical images, existing methods synthesize low-fidelity X-rays and cannot guarantee image diversity. In this paper, we propose a diversity-preserving report-to-X-ray generation method with one-stage architecture, named DivXGAN. Specifically, we design a domain-specific hierarchical text encoder to extract medical concepts inherent in reports. This information is incorporated into a one-stage generator, along with the latent vectors, to generate diverse yet relevant X-ray images. Extensive experiments on two widely used datasets, namely Open-i and MIMIC-CXR, demonstrate the high fidelity and diversity of our synthesized chest radiographs. Furthermore, we demonstrate the efficacy of the generated X-rays in facilitating supervised downstream applications via a multi-label classification task.
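To make the conditioning scheme concrete, the sketch below shows how a report embedding and a random latent vector can be combined in a single one-stage generator, in the spirit of the abstract. All module names, dimensions, and block choices are illustrative assumptions and do not reflect the authors' actual DivXGAN implementation.

# Illustrative sketch only: a text-conditioned one-stage generator as described
# conceptually in the abstract; not the authors' code.
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Toy text-conditioned one-stage generator (illustrative only)."""

    def __init__(self, text_dim=256, noise_dim=100, base_channels=64):
        super().__init__()
        self.base_channels = base_channels
        self.project = nn.Linear(text_dim + noise_dim, base_channels * 8 * 4 * 4)
        # A single stack of upsampling blocks ("one-stage"), instead of the
        # chained low-to-high-resolution generators of multi-stage GANs.
        self.blocks = nn.Sequential(
            self._up_block(base_channels * 8, base_channels * 4),    # 4 -> 8
            self._up_block(base_channels * 4, base_channels * 2),    # 8 -> 16
            self._up_block(base_channels * 2, base_channels),        # 16 -> 32
            self._up_block(base_channels, base_channels // 2),       # 32 -> 64
            self._up_block(base_channels // 2, base_channels // 4),  # 64 -> 128
            nn.Conv2d(base_channels // 4, 1, kernel_size=3, padding=1),
            nn.Tanh(),  # single-channel chest radiograph in [-1, 1]
        )

    @staticmethod
    def _up_block(in_ch, out_ch):
        return nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, report_embedding, z):
        # Pairing the same report embedding with different latent vectors z
        # yields different but report-consistent images (sample diversity).
        h = torch.cat([report_embedding, z], dim=1)
        h = self.project(h).view(-1, self.base_channels * 8, 4, 4)
        return self.blocks(h)

# Usage: one report embedding, several latent codes -> diverse samples.
generator = ConditionalGenerator()
report_emb = torch.randn(1, 256).expand(4, -1)  # same report, repeated 4 times
z = torch.randn(4, 100)                         # four different latent codes
images = generator(report_emb, z)               # shape: (4, 1, 128, 128)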

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43904-9_47

SharedIt: https://rdcu.be/dnwHs

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    The article proposes a method called DivXGAN, which uses deep learning algorithms to generate high-fidelity and diverse chest radiographs from radiology reports. The method incorporates a domain-specific hierarchical text encoder to extract medical concepts from reports and incorporates them into a one-stage generator along with latent noise vectors to generate diverse yet relevant X-ray images. The proposed method is compared with state-of-the-art alternatives and shows high fidelity and diversity in synthesized chest radiographs. The generated X-rays are shown to facilitate supervised downstream applications via a multi-label classification task.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The text is well written and easy to follow.

    • Introduction of a domain-specific hierarchical text encoder to extract medical concepts inherent in reports and incorporate them into a one-stage generator, which generates X-rays with high fidelity and diversity.

    • Extensive experiments on two widely used datasets, Open-i and MIMIC-CXR, demonstrate the high fidelity and diversity of the synthesized chest radiographs.

    • Demonstration of the efficacy of the generated X-rays in facilitating supervised downstream applications via a multi-label classification task.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The contributions of the paper are not clearly stated in the Introduction.

    • The paper does not provide a detailed analysis of the limitations of the proposed method, such as the potential biases introduced by the training data and the generalization ability of the model to unseen data.

    • The paper does not provide a thorough discussion on the ethical implications of generating medical images from radiology reports and the potential risks associated with the misuse of such technology.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors used public datasets and the paper is easy to follow, so the approach should be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The paper proposes a diversity-preserving method called DivXGAN, which generates high-fidelity and diverse chest radiographs from radiology reports using a one-stage generator and a domain-specific hierarchical text encoder. The method is evaluated on two datasets and shown to facilitate supervised downstream applications via a multi-label classification task. The authors provide ablation study and comparison with other approaches. The authors used public datasets and the paper is easy to follow, so the approach should be reproducible. The contributions of the paper are not clearly stated in the Introduction. The paper does not provide a detailed analysis of the limitations of the proposed method, such as the potential biases introduced by the training data and the generalization ability of the model to unseen data. The paper does not provide a thorough discussion on the ethical implications of generating medical images from radiology reports and the potential risks associated with the misuse of such technology.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Technical novelty, reproducibility, and results achieved.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #5

  • Please describe the contribution of the paper

This manuscript presents a novel chest X-ray synthesizer that uses radiology reports as input. The main contributions are: 1) the use of a one-stage architecture instead of the multi-stage approaches of previous work, 2) the integration of a noise term in the input to promote diversity of the synthesized images, and 3) the design of a domain-specific text encoder. The method is compared to three other methods on two public datasets, with promising results. An ablation study shows the added value of contributions 1 and 3. The synthesized data is used to train a classifier, showing that accuracy is similar when training on synthesized rather than real data (and slightly better when the two are combined).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

S1) The method seems well designed. S2) The manuscript is written clearly, and the innovations are motivated quite well. S3) The results are promising.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

W1) Despite the ablation tests performed, there are quite a few design choices that could be evaluated further.

W2) The classification experiments lack confidence intervals, which are needed to judge whether the differences in performance are significant.

W3) The Real+Synth experiment raises several questions: a) Since the amount of data was twice as large, was the training time also twice as long? What if you had trained on real data only, but with twice as many epochs? b) Were any augmentation strategies used in the other experiments? c) How were the 5k synthetic images generated? Were the 5k real reports used as input?

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Public data is used. The method is described clearly, but not enough details are given to completely reproduce the work. Given the page limit, this is understandable. Releasing the code would be helpful.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    D1) “This approach will substantially improve traditional supervised downstream tasks” -> “This approach may substantially improve traditional supervised downstream tasks”

    D2) Section 3.2: “This is because”, “The reason lies in that” -> speculation, and repetition of the method

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Strengths S1 and S2: the main methodology seems appropriate. The experiments are informative and suggest that the approach is effective.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

This paper proposes a new report-to-X-ray generation method that improves on a previous method, XrayGAN, in three aspects: diverse sampling from a single report, a one-stage architecture, and a domain-specific text encoder. The proposed method is evaluated on the Open-i and MIMIC-CXR datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The proposed method is novel. The experiments are extensive and verify the effectiveness of the proposed method. The paper is clearly written.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

It is recommended to compare with more recent text-to-image generation methods, for example GANs [1][2] and diffusion models [3][4].

[1] Zhu, Minfeng, et al. “DM-GAN: Dynamic memory generative adversarial networks for text-to-image synthesis.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

[2] Tao, Ming, et al. “DF-GAN: A simple and effective baseline for text-to-image synthesis.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.

[3] Liu, Xihui, et al. “More control for free! Image synthesis with semantic diffusion guidance.” Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2023.

[4] Ho, Jonathan, and Tim Salimans. “Classifier-free diffusion guidance.” arXiv preprint arXiv:2207.12598 (2022).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper can be reproduced if the code is released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The paper is overall well written and shows improvement over a previous work. It is recommended to compare with more state-of-the-art methods.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

There is not much previous work in this area. The proposed method is sound overall and its components make sense. Since this is a GAN-based generative model and it is compared with previous GAN-based works, it is acceptable that more advanced generative models, such as diffusion models, are not included in the comparison.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This is a nicely written paper with novelty and good results and analysis. However, reviewers raise potential concerns which need to be addressed in the revised manuscript.




Author Feedback

We thank the Area Chair and all reviewers for their time, effort, and insightful comments, which have helped improve the quality of our manuscript. We have addressed most of the concerns raised by the reviewers; the detailed responses are summarized as follows.

Reviewer #2: The contributions of the paper are not clearly stated in the Introduction. The paper should provide a detailed analysis of the limitations, as well as a thorough discussion of the ethical implications and potential risks of generating medical images from radiology reports.

Response: Thank you for your thoughtful comments. We will revise the Introduction in the revision to clearly state our contributions. Indeed, the potential biases introduced by the training data, the generalization ability of the generative model, and the ethical implications are all worthy of further discussion in this area. Due to the page limit, we would like to investigate these aspects for our method and include them in a supplementary appendix.

Reviewer #3: It is recommended to compare with more recent text-to-image generation, for example GANs [1][2], and Diffusion Models [3][4]. This is a GAN-based generative model and is compared with previous GAN-based works, so it is acceptable that not having more advanced generative models like diffusion models for comparison.

Response: We appreciate the reviewer’s objective comments. As noted, there is limited existing work in this area. Following the first work on generating chest radiographs from reports (XrayGAN), we compared our method with several representative GAN-based text-to-image algorithms. However, due to space constraints in the paper, we plan to conduct a more extensive evaluation of algorithms in future research.

Reviewer #5: The classification experiments lack confidence intervals to judge whether the differences in performance are significant. The Real+Synth experiment raises several questions: a) Since the amount of data was twice as large, was the training time also twice as long? What if you had trained on real data only, but with twice as many epochs? b) Were any augmentation strategies used in the other experiments? c) How were the 5k synthetic images generated? Were the 5k real reports used as input? The method is described clearly, but not enough details are given to completely reproduce the work. Releasing the code would be helpful.

Response: We appreciate the reviewer’s detailed and constructive comments. In line with your suggestion, we plan to report confidence intervals in future work to provide a clearer assessment of the significance of our method’s performance. Regarding the Real+Synth experiment, doubling the amount of data also doubles the training time, as you correctly point out. We acknowledge that there are several design choices and downstream tasks that could be evaluated further. Due to space limitations, we only provided one example of incorporating generated X-ray images into the training data for multi-label classification; this example demonstrates that the generated images contain medical concepts that can be exploited to train downstream tasks. We randomly sampled 5k real images and their corresponding reports from the test set. These 5k real reports were fed into our generative model, which generated one image per report from a single latent vector, yielding 5k synthetic images. When training the multi-label classifier, both real and generated images underwent the same general data augmentation, such as rotation and scaling. We intend to include these details in the revision and to release the code later to facilitate a better understanding of our approach.
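For concreteness, a minimal sketch of the data-generation step described in this response is given below. The names generator, text_encoder, and reports are placeholders for the authors' model and data (no code has been released), and the augmentation parameters are illustrative assumptions rather than the values used in the paper.

# Sketch of the Real+Synth setup under stated assumptions; not released code.
import torch
from torchvision import transforms

def synthesize_one_per_report(generator, text_encoder, reports, noise_dim=100):
    """Generate exactly one image per report, each from a single latent vector."""
    images = []
    generator.eval()
    with torch.no_grad():
        for report in reports:              # e.g. the 5k reports sampled from the test set
            emb = text_encoder(report)      # (1, text_dim) report embedding
            z = torch.randn(1, noise_dim)   # one latent code per report
            images.append(generator(emb, z))
    return torch.cat(images, dim=0)

# The same general augmentation (rotation and scaling are mentioned above) is
# applied to both real and synthetic images when training the classifier.
train_augment = transforms.Compose([
    transforms.RandomRotation(degrees=10),
    transforms.RandomResizedCrop(size=224, scale=(0.9, 1.0)),
])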


