Authors

Leihao Wei, Anil Yadav, William Hsu

Abstract

Mitigating the effects of image appearance due to variations in computed tomography (CT) acquisition and reconstruction parameters is a challenging inverse problem. We present CTFlow, a normalizing flows-based method for harmonizing CT scans acquired and reconstructed using different doses and kernels to a target scan. Unlike existing state-of-the-art image harmonization approaches that only generate a single output, flow-based methods learn the explicit conditional density and output the entire spectrum of plausible reconstructions, reflecting the underlying uncertainty of the problem. We demonstrate how normalizing flows reduce variability in image quality and the performance of a machine learning algorithm for lung nodule detection. We evaluate the performance of CTFlow by 1) comparing it with other techniques on a denoising task using the AAPM-Mayo Clinical Low-Dose CT Grand Challenge dataset, and 2) demonstrating consistency in nodule detection performance across 186 real-world low-dose CT chest scans acquired at our institution. CTFlow performs better in the denoising task for both peak signal-to-noise ratio and perceptual quality metrics. Moreover, CTFlow produces more consistent predictions across all dose and kernel conditions than generative adversarial network (GAN)-based image harmonization on a lung nodule detection task.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43990-2_39

SharedIt: https://rdcu.be/dnwLU

Link to the code repository

https://github.com/hsu-lab/ctflow

Link to the dataset(s)

https://doi.org/10.7937/9NPB-2637


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper performs image transformation from e.g. low-dose CT to standard-dose CT. It encodes the latent space of the target and then uses the latent space for inference. An invertible encoder is learned, and then inverted to produce a decoder. A claimed novelty lies in the ability to produce multiple outputs and their probabilities.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Overall, not a bad paper. The problem is important, and the algorithm design seems fine. Looking through the experimental results, I couldn’t find any demonstration of the multiple outputs and their probabilities. I was a bit confused by the nodule comparison experiment; were the SNGAN numbers taken from literature or was the algorithm reimplemented? In either case there needs to be clarity about the methods. Also, if I understand correctly, you need to retrain for each parameter change? For example, can a single network transform both a sharp kernel to standard and also a soft kernel to standard?

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    See above

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    See above

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    See above

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Second best in my stack.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper presents CTFlow, a method based on normalizing flows to harmonize CT scans acquired and reconstructed using different doses and kernels to a target scan. The method reduces variability in image quality and improves the performance of a machine learning algorithm for lung nodule detection.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper validates the method on two publicly available datasets.
    2. The study offers some technical novelty, such as using normalizing flows to reduce the variability of reconstructed images across different settings.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The main motivation for using CTFlow is not justified in the evaluation section for LDCT denoising. How the reduction in variability clinically benefits the LDCT denoising task is not properly explained.
    2. The choice of baseline methods is also questionable: why is WGAN with MSE and VGG losses used for comparison? The main contribution of WGAN was the introduction of an adversarial loss, so it does not make sense to use a WGAN generator with MSE and VGG losses. Please also add a recent SOTA method for comparison, since the denoising performance of the proposed method is lower than that of BM3D.
    3. For the downstream task of nodule detection, it is not clearly stated whether a single GAN was used for all settings or three different GANs, as in the authors’ own method. For the third condition, the results of SNGAN and the proposed method are similar; for the other kernels, the performance diminishes.
    4. Overall, the evaluation of the proposed method is not complete enough to make this a good-quality paper.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The evaluation of the proposed method must be improved to convince readers. For the downstream task, the authors can follow SNGAN; for the denoising task, they should add more SOTA methods and explain the advantage of using normalizing flows.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See the weaknesses. The lack of evaluation is the main reason.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    I maintain my previous assessment of the paper, leaning towards rejection. The author’s feedback regarding the justification and rationality of using CTflow for LDCT denoising does not appear convincing. While the proposed method may have potential benefits for downstream tasks, the current set of experiments is inadequate to substantiate the claim. To strengthen the paper, I recommend incorporating additional downstream tasks that can validate the rationale behind the proposed method.



Review #4

  • Please describe the contribution of the paper

    The paper presents CTFlow, a novel approach based on normalizing flows for mitigating the effects of computed tomography acquisition and reconstruction parameters on image appearance. Unlike existing image harmonization approaches that generate a single output, CTFlow learns the explicit conditional density and outputs the entire spectrum of plausible reconstructions, reflecting the underlying uncertainty of the problem. The authors demonstrate the effectiveness of CTFlow in reducing variability in image quality and improving the performance of a machine learning algorithm for lung nodule detection. The paper presents a thorough evaluation of CTFlow, comparing it with other techniques on a denoising task and demonstrating consistency in nodule detection performance across a large set of real-world low-dose CT chest scans.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper presents a novel approach, CTFlow, for mitigating the effects of computed tomography acquisition and reconstruction parameters on image appearance.
    2. CTFlow is based on normalizing flows, which allows for generating the entire spectrum of plausible reconstructions, reflecting the underlying uncertainty of the problem.
    3. The authors demonstrate the effectiveness of CTFlow in reducing variability in image quality and improving the performance of a machine learning algorithm for lung nodule detection.
    4. The paper presents a thorough evaluation of CTFlow, comparing it with other techniques on a denoising task using a widely used dataset and demonstrating consistency in nodule detection performance across a large set of real-world low-dose CT chest scans acquired at the authors’ institution.
    5. The use of keywords and clear language in the abstract helps readers quickly understand the paper’s focus.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors could address the limitations of their approach and possible failure cases to provide a more balanced evaluation of CTFlow.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors used a publicly available dataset for part of their experiments. Even if their own dataset is not public, they will release the pre-trained model and its code. This will increase reproducibility even if some details are not mentioned in the text.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The caption for Fig. 2 could be more descriptive to convey the main idea without relying on the text. This would enhance the reader’s understanding and provide a more concise overview of the figure’s content.

    To improve the clarity of the paper, the authors should introduce the RealNVP method by name in the Multiscale Architecture subsection, even though it was previously cited in the introduction section. This will help readers understand the specific method used in the architecture without having to refer back to the introduction section and reference list.

    The authors report agreement in nodule detection performance. It would be better to also report the detection performance itself, so that the contribution of the proposed data harmonization can be compared directly.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a novel approach, CTFlow, to mitigate the effects of computed tomography acquisition and reconstruction parameters on image appearance and shows two different applications using it.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper describes a method that uses normalizing flows to change the characteristics of CT images and synthesize, e.g., routine-dose CT images from low-dose CT images. Authors show an evaluation on the AAPM grand challenge, and on lung nodule detection in a diverse data set of images that are all harmonized using the proposed model. In both cases, the model obtains good results. This is an interesting paper that addresses a common problem when working with CT data. Authors are invited to write a rebuttal to the reviewer comments.

    Strengths

    • Interesting method, we don’t see a lot of normalizing flows in medical imaging and the authors make a direct comparison with GANs.
    • The evaluation is interesting, with two different tasks.
    • The method achieves good results, showing that the trained model can properly harmonize data.

    Weaknesses

    • Although the model should be able to produce multiple outputs based on a single input image, the authors do not demonstrate this or use this property.

    In the rebuttal, the authors should address the comments of the reviewers, and in particular

    • Elaborate more on the potential value and applications of the work, and of being able to synthesize multiple images. As Rev. 2 mentions, low-dose CT denoising is only one application, which does not necessarily justify synthesizing multiple outputs.
    • Discuss limitations of the approach (Rev. 3).
    • Address the concerns of Rev. 2 regarding the loss function for the GAN model, and the results compared to BM3D on the denoising task.




Author Feedback

Thank you for the constructive feedback on our manuscript.

(Meta Rev) Potential value and applications of the work and the ability to synthesize multiple images: CTFlow predicts outputs for latent variables sampled from an initial (Gaussian) density function; different outputs can be sampled by varying the variance of the Gaussian distribution. We can select the optimal variance value based on improvement in the performance of a downstream task (e.g., lower the variance if doing so results in higher sensitivity of a nodule detection algorithm). We have also found that by synthesizing outputs across the entire range of variance values, we can identify regions of the output image that have the highest uncertainty, which often correspond to diseases of interest and help with detection tasks.
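The sampling procedure described in this response can be sketched as follows. This is an illustrative toy, not the CTFlow implementation: `toy_inverse_flow` is a hypothetical stand-in for the inverse pass of a trained conditional flow, images are flat lists of pixel values, and `tau` is an assumed name for the standard deviation of the latent Gaussian.

```python
import random
import statistics

def toy_inverse_flow(z, low_dose):
    """Hypothetical stand-in for a trained conditional flow's inverse pass.

    Maps a latent sample z to an output image conditioned on the input scan.
    Here it is an element-wise affine map whose spread grows with intensity.
    """
    return [c + 0.1 * abs(c) * zi for zi, c in zip(z, low_dose)]

def sample_outputs(low_dose, tau, n_samples, seed=0):
    """Draw n_samples outputs with latents z ~ N(0, tau^2)."""
    rng = random.Random(seed)
    return [
        toy_inverse_flow([rng.gauss(0.0, tau) for _ in low_dose], low_dose)
        for _ in range(n_samples)
    ]

def pixelwise_std(samples):
    """Per-pixel standard deviation across samples, used as an uncertainty map."""
    return [statistics.pstdev(pixel_values) for pixel_values in zip(*samples)]

if __name__ == "__main__":
    image = [1.0, 2.0, 0.0, 4.0]  # made-up pixel values, not real CT data
    for tau in (0.0, 0.5, 1.0):
        uncertainty = pixelwise_std(sample_outputs(image, tau, n_samples=50))
        print(tau, uncertainty)
```

In this toy, `tau = 0` collapses every sample to a single deterministic output, while larger values spread the samples and raise the per-pixel standard deviation. Which `tau` to use would, as the response notes, be chosen by downstream task performance.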

(Rev 1) Confusion regarding the nodule detection experiment and model training: We trained the CTFlow model to translate scans reconstructed using different kernels and fed them into a nodule detection algorithm. We compared the agreement (measured by the concordance correlation coefficient, CCC) between the F1-score of the nodule detector running on CTFlow outputs and the reference (target) scan. For example, we transformed scans reconstructed using a ‘smooth’ kernel to a ‘medium’ kernel. We then compared the F1 scores of the nodule detector running on the CTFlow output with those of the detector running on the ‘medium’ kernel (target) scan. The baseline comparison, SNGAN, is a GAN-based approach our group had previously developed. Here, we initially trained individual CTFlow models for each kernel pair, allowing us to debug the models more easily. Our comparison GAN method was also trained individually for each kernel pair. Note that we have since trained unified CTFlow and conditional GAN models that can be conditioned on different kernels; these did not change the reported results.
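For readers unfamiliar with the agreement measure, the sketch below computes Lin's concordance correlation coefficient between two sets of per-scan F1 scores. It is a minimal illustration rather than the authors' evaluation code; the function name and the example F1 values are hypothetical.

```python
def concordance_correlation(x, y):
    """Lin's CCC: 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2).

    Uses population (1/n) moments. Unlike Pearson correlation, CCC penalizes
    a systematic offset or scale difference between the two measurements.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both inputs are the same constant")
    return 2 * cov / denom

if __name__ == "__main__":
    # Hypothetical per-scan F1 scores: detector on target scans vs. on harmonized scans.
    f1_target = [0.60, 0.70, 0.80]
    f1_harmonized = [0.70, 0.80, 0.90]
    print(concordance_correlation(f1_target, f1_harmonized))
```

In the example, the two score lists are perfectly correlated but offset by 0.1, so the CCC falls well below 1. This is the reason CCC, rather than plain correlation, suits a question about whether harmonized scans reproduce the detector's behavior on the target scans.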

(Rev 3) Motivation behind using the CTflow is not justified in the evaluation section for LDCT denoising: We apologize for the confusion in our use of the word ‘variability’. Here, ‘variability’ is used to describe differences in the appearance of CT scans acquired and reconstructed at different dose levels and reconstruction kernels. CTFlow reduces this variability by making scans acquired at one dose level appear more similar to scans acquired at a target dose level. As a result, we reduce the variability in image-derived feature values (e.g., radiomics) and downstream tasks (e.g., F1-score of a nodule detector). The evaluation for LDCT denoising is one specific example where we attempt to show the ability of CTFlow to make reduced (25%) dose scans appear more similar to normal (100%) dose scans.

(Rev 3) Baseline methods are questionable: We used an overall joint loss function, following Yang et al. (doi: 10.1109/TMI.2018.2827462). A Wasserstein GAN (WGAN, termed ‘adversarial loss’ in the paper) was used along with either a VGG perceptual loss (WGAN-VGG) or mean squared error loss (WGAN-MSE). The inclusion of GAN-based methods is representative of the state-of-the-art methods, and as noted below, a limitation of our method is that it may not perform as well on image quality metrics but performs better on perceptual quality metrics and downstream task performance.

(Rev 4) Limitations of the approach: CTFlow is highly dependent on tuning the variance parameter for each CT scan and task. With a fixed variance parameter, CTFlow does not achieve the best image quality on metrics such as peak signal-to-noise ratio and structural similarity. This study only focused on mitigating the effect of a single CT parameter, either dose (in our image quality experiment) or kernel (in nodule detection). In the real world, multiple CT parameters interact (dose AND kernel); these more complex interactions are being investigated as part of future work.
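As a point of reference for the image quality metrics mentioned in this response, peak signal-to-noise ratio can be computed as in the sketch below. This is a generic definition, not the paper's evaluation script. Images are assumed to be flat lists of pixel values, and `data_range` (the span of valid intensities) is an assumption the caller must supply.

```python
import math

def psnr(reference, test, data_range):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range**2 / MSE)."""
    if len(reference) != len(test) or not reference:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)

if __name__ == "__main__":
    # Made-up pixel values; a constant error of 10 on an assumed range of 1000.
    normal_dose = [0.0, 100.0, 200.0, 300.0]
    restored = [10.0, 110.0, 210.0, 310.0]
    print(psnr(normal_dose, restored, data_range=1000.0))
```

Because PSNR depends only on mean squared error, an output sampled from a distribution of plausible reconstructions will generally score lower than a single averaged, smoother estimate. This is consistent with the limitation stated above.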




Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have provided a proper rebuttal to the questions of the reviewers. Although doubts persist about the relevance of the approach and its evaluation using downstream tasks, I’m willing to accept the paper based on its merits as listed by the reviewers, including its novelty and direct comparison with GANs.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Interesting paper and concept, but a clear demonstration of multiple outputs and its applications needs to be added.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    There is clear application novelty here in bringing generative flow techniques to reducing the variations in CT scans between low-dose and regular-dose CTs. Methods to harmonize and reduce imaging intensity differences between imaging scans are welcome and needed for reproducible radiomics studies. The experiments are also informative, and evaluation on public datasets indicates the potential of the approach for improving image quality. The authors mostly responded to the reviewers’ critiques. However, it would strengthen the paper to clarify in the discussion what the strengths and limitations are, as well as how the approach differs from other normalizing flow methods.


