
Authors

Jiheon Jeong, Ki Duk Kim, Yujin Nam, Kyungjin Cho, Jiseon Kang, Gil-Sun Hong, Namkug Kim

Abstract

Since the advent of generative models, deep learning-based methods for generating high-resolution, photorealistic 2D images have achieved significant success. However, it is still difficult to create precise 3D image data with the 12-bit depth used in clinical settings that capture the anatomy and pathology of CT and MRI scans. Using a score-based diffusion model, we propose a slice-based method that generates 3D images from previous 2D CT slices along the inferior direction. We call this method stochastic differential equations with adjacent slice-based conditional iterative inpainting (ASCII). We also propose an intensity calibration network (IC-Net) that corrects the intensity mismatch among slices caused by 12-bit depth image generation. As a result, the Fréchet Inception Distance (FID) scores FID-Ax, FID-Cor and FID-Sag of ASCII(2) with IC-Net were 14.993, 19.188 and 19.698, respectively. Anatomical continuity of the generated 3D image along the inferior direction was evaluated by an expert radiologist with more than 15 years of experience. In the analysis of eight anatomical structures, our method was judged to be continuous for seven of the structures.
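The slice-by-slice idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `generate_volume` and `toy_next_slice` are hypothetical names, and the toy function merely stands in for a conditional diffusion sampler that would produce the next slice given the previous one.

```python
import numpy as np
from typing import Callable

def generate_volume(
    seed_slice: np.ndarray,
    next_slice_fn: Callable[[np.ndarray], np.ndarray],
    num_slices: int,
) -> np.ndarray:
    """Build a 3D volume by repeatedly generating a slice from the previous one."""
    slices = [seed_slice]
    for _ in range(num_slices - 1):
        # Each new slice is conditioned only on the most recent slice here;
        # a real sampler could condition on several adjacent slices.
        slices.append(next_slice_fn(slices[-1]))
    # Stack along axis 0 so the result has shape (num_slices, height, width).
    return np.stack(slices, axis=0)

def toy_next_slice(prev: np.ndarray) -> np.ndarray:
    """Placeholder for a conditional generator (assumption, illustration only)."""
    return prev + 1.0

volume = generate_volume(np.zeros((4, 4)), toy_next_slice, num_slices=5)
print(volume.shape)
```

Because each slice depends on earlier output, errors such as intensity drift can accumulate along the stacking axis, which is the kind of mismatch a calibration step is meant to correct.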

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43999-5_35

SharedIt: https://rdcu.be/dnwwO

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    The authors propose diffusion generative models for synthesis of high resolution 3D medical images. Unlike regular 3D models, the diffusion models are 2D and they synthesize CT slices based on an initial seed in an auto regressive manner. Furthermore, they propose an intensity calibration network for possible intensity mismatch among neighboring slices within synthesized 3d data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The approach is novel in that it uses diffusion models to synthesize 3D volumes from 2D networks in an auto-regressive manner. Compared to traditional 2D models, these models synthesize 3D data, which is the form medical images typically take. Compared to 3D networks, they can be less data hungry (although the authors did not specify that point, it’s an important aspect that they should note).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Major:

    • The paper is difficult to follow at times. Many statements are hard to follow or require justification. I’ll mention them in the minor points section.
    • Little to no detail is given about the competing methods. Were they also optimized? How were their hyperparameters selected?

    Minor:

    • Pg 2 paragraph 2 line 2, why is having a Gaussian encoder as an auxiliary network bad?
    • Please briefly explain coefficient of variation.
    • Pg 3, 2nd last paragraph is very confusing. Why is sigma_max = 1348 theoretically plausible when the data range is -1 to 1? Where did the number 1348 come from? I could not find it in the cited paper. Please also check for mistakes; for instance, it says CV was lowest when sigma_max was 68.
    • Pg 3, last paragraph. How are bone and air high-frequency, and parenchyma low-frequency? It seems counterintuitive. Parenchyma has much more spatial variation.
    • Pg 4, section 3.2. What is K?
    • Pg 4, section 3.2. “In addition, we omit augmentation because the model itself might generate augmented images”. What does this mean?
    • Pg 5, section 3.3. “It was noted that the intensity mismatch problem only occurs in whole range generation”. Please elaborate. How do you explain the results in table 1?
    • Eq 4. How was the range -0.7 to 1.3 selected?
    • What is slice-to-3D VAE? Competing methods require much more explanation.
    • Were different networks trained for whole range and windowing range in Table 1? Or was the range adjusted while calculating metrics?

    Others:

    • Details about the data (at least sample number, data type, etc.) should be moved to the main paper.
    • Radiological evaluation should be moved from the supplementary material to the main paper. It is perhaps one of the most important sections, but it is missing from the main text.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors write “not applicable” for all points.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The overall idea is very good and has potential. However, in its current state it is difficult to accept the paper. I have mentioned two main reasons in the weaknesses of the paper section. 1 - Very few details are given about the competing methods. 2 - The paper is not easy to follow. Many parts are unclear or not justified. I am rating the paper as a reject for now. If the authors can address the issues during rebuttal, I will change my rating.

    In case the paper does not get accepted, I would suggest that the authors submit it to the DGM4MICCAI workshop.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I have mentioned two main reasons in the weaknesses of the paper section. 1 - Very few details are given about the competing methods. 2 - Many parts are not clear.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    For me, one of the main concerns was missing details and lack of clarity. The authors clarified most of them. The authors also promised to modify their paper by adding explanations and additional remarks. One thing I would like to point out is that radiological scores are much more important than FID. Please also consider this while revising the manuscript. I am updating my score.



Review #3

  • Please describe the contribution of the paper

    This paper is proposing a novel method called stochastic differential equations with adjacent slice-based conditional iterative inpainting (ASCII) that uses a score-based diffusion model and slice-based method to generate high-resolution 3D CT images with 12-bit depth. It achieves impressive Frechet Inception Distance (FID) scores, demonstrating its effectiveness in creating precise 3D image data that captures the anatomy and pathology of CT and MRI scans.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • ASCII uses a score-based diffusion model and slice-based method to address challenges in creating precise 3D image data that captures the anatomy and pathology of CT and MRI scans.
    • The approach is unique in its use of adjacent slice-based conditional iterative inpainting and intensity calibration network (IC-Net).
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Lack of comparison with other deep learning methods: The paper does not compare the proposed method with traditional methods for generating 3D CT images, which limits its ability to demonstrate its superiority over existing approaches.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The response of reproducibility checklist is marked as NA. However, I believe author/s should complete section 1 for the algorithm reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    It would be beneficial to include more comparisons with traditional methods and deep learning based reconstruction works to demonstrate how the proposed method improves upon existing approaches.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall this paper would be valuable due to its novelty, effectiveness, and potential impact in clinical settings. The proposed method can generate high-quality 3D CT images with 12-bit depth, which has important implications for medical imaging.

  • Reviewer confidence

    Not confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    Nothing to complain about. The authors have properly addressed the concerns about comparison with competing methods and will add additional experimental results.



Review #4

  • Please describe the contribution of the paper

    The paper demonstrates how to use a diffusion model to generate whole range 12-bit depth 3D CT from adjacent 8-bit depth 2D slices. Specifically, the work generates whole range 12-bit depth data from 8-bit data with a calibration mechanism.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper targets an important question: 3D medical whole range data simulation, for downstream tasks.
    2. The paper describes the solutions clearly and with details.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. It would be better to briefly discuss the reason for using deep learning to calibrate intensity, as methods like a look-up table may solve the problem as well.
    2. The paper may need to briefly discuss the reason a Gaussian encoder is suitable for CT data generation.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The model architecture and the data are available and the training procedure is clearly described.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. Please clarify the meaning of σmin and σmax when they first appear.
    2. The physical “gap” between adjacent slices needs to be clarified.
    3. It would be better to briefly describe the result/improvement in each figure.
    4. Generation in “3D” is not presented.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Clear description of how to train the network, but the results are not presented well.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper is about synthesizing high-resolution 3D CT images. The method is developed based on stochastic differential equations with adjacent slice-based conditional iterative inpainting and diffusion model. This work is interesting and the idea is good. The reviewers have the following concerns. Paper clarity needs improvement, e.g., some statements in the paper need rewriting or justification, the reasons for the use of deep learning for intensity calibration, and other comments can be found in the reviews. Regarding method evaluation, more details are expected for the methods in comparison, and it seems there is a lack of comparison with other deep learning methods.




Author Feedback

[Detailed descriptions] We first want to apologize for the insufficient details described in the manuscript due to page limits. We appreciate the reviewers’ careful attention to our work.

#1. As mentioned in Section 3.1 [Technique 1] of the cited paper [9], sigma max can enhance diversity. The value of 1,348 is a configuration derived from the official code of VESDE from the paper cited in [8], specifically designed for generating high-dimensional samples. Therefore, although a sigma max of 68 showed a lower CV value, we chose a sigma max of 1,348 for its potential diversity, as discussed in the cited paper and as shown by the findings in the box plot of Fig. 1.

#2. In Fig 2, we trained and generated images using the entire intensity range and measured metrics across the entire range for the experiments described in Tab 1. The metrics within the windowing range were also measured on the images generated across the entire intensity range.

#3. In general, the edges of an image have high frequency, and other textural regions have low frequency. Bone (+1000HU) and air (-500HU) are extreme-intensity regions, while brain parenchyma shows subtle variations in intensity (20~30HU). In the context of contrast and signal-to-noise ratio (SNR) in medical imaging, the term “frequency” used in the manuscript should be replaced with image contrast and SNR. Bone and air regions therefore mainly consist of strong contrast and SNR, whereas the parenchyma region mainly consists of low contrast and SNR. From this perspective, GAN models often fail to learn low-contrast, low-SNR regions because these regions are limited to a narrow margin of error.

#4. Special structures such as Gaussian encoders are usually needed to approximate the prior distribution of the generative model to the data distribution because this mapping is generally intractable. However, we did not directly state that this structure is bad.

#5. “K” represents the number of contiguous slices along the axial axis during both the training and generation phases.

#6. The CV is a measure used to compare variations while eliminating the influence of the mean. It is calculated by dividing the standard deviation by the mean. We chose the CV as a quantitative metric for noisy-looking images, utilizing the characteristics of the diffusion model, which generates images with added noise. This metric allows us to capture the impact of noise in a more objective manner.

#7. In Eq. 4, the range μ~U[-0.7, 1.3] was selected because important anatomical structures, such as brain parenchyma, were observed to collapse beyond this range, as shown in Fig. 5.

#8. As shown in [Fig 5(a)] of the paper “Differentiable Augmentation for Data-Efficient GAN Training”, generative networks trained with augmentation can generate fake images that contain the augmentation. Therefore, we omitted augmentation in this study.
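The coefficient of variation described in #6 (standard deviation divided by mean) can be sketched as follows. The function name `coefficient_of_variation` and the toy arrays are illustrative assumptions, not taken from the paper; the sketch also assumes intensities with a non-zero mean, so data scaled to a range centred on zero would need an offset first.

```python
import numpy as np

def coefficient_of_variation(image: np.ndarray) -> float:
    """Return the population standard deviation divided by the mean intensity."""
    values = np.asarray(image, dtype=np.float64)
    mean = values.mean()
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return float(values.std() / mean)

# Toy example (assumption): a flat patch and the same patch with +/-10 noise.
clean = np.full((4, 4), 100.0)
noisy = clean + np.array([[-10.0, 10.0] * 2] * 4)

print(coefficient_of_variation(clean))  # flat patch has no variation
print(coefficient_of_variation(noisy))  # noise raises the CV
```

Dividing by the mean makes the value comparable between patches of different brightness, which is why a noisier-looking patch of similar mean intensity yields a higher CV.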

[Comparison with competing methods] #1. Slice-to-3D VAE is a 3D generation method that trains a 2D VAE. By encoding each slice separately, the 2D VAE can decode latents into images slice by slice and stack them into a 3D volume. The competing methods, including StyleGAN2-ada, StyleGAN3, and slice-to-3D VAE, were implemented using their official code, and a series of experiments over hyperparameters was conducted to find optimized results for each method. However, we did not observe significant differences in the results.

#2. We will conduct additional experiments using conventional image processing techniques such as histogram matching or a lookup table to compare the intensity calibration performance. However, these techniques have some drawbacks, such as dependence on the reference and fitting in specific regions. We showed the qualitative results of histogram matching performance in Fig. 3.
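For context, conventional histogram matching, the baseline mentioned in #2, can be sketched with NumPy as below. This is a generic textbook formulation, not the authors' code; `match_histogram` and the toy slices are hypothetical. It also makes the stated drawback visible: the output is determined entirely by the chosen reference slice.

```python
import numpy as np

def match_histogram(source: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Remap source intensities so their CDF follows the reference CDF."""
    src_values, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)
    # Empirical cumulative distributions of both images.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source quantile, look up the reference intensity at that quantile.
    matched_values = np.interp(src_cdf, ref_cdf, ref_values)
    return matched_values[src_idx].reshape(source.shape)

# Toy example (assumption): a slice whose intensities drifted by +5.
reference = np.array([[10.0, 20.0], [30.0, 40.0]])
drifted = reference + 5.0
calibrated = match_histogram(drifted, reference)
print(calibrated)
```

In this toy case the drift is a pure offset, so matching recovers the reference exactly; with real adjacent slices the anatomy differs, and forcing one slice onto another's histogram can distort intensities.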

In the final manuscript, we will incorporate the details, experiments, and additional remarks referenced in the rebuttal.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I have read the comments and rebuttal. This paper is about synthesizing high-resolution 3D CT images. The method is developed based on stochastic differential equations with adjacent slice-based conditional iterative inpainting and diffusion model. This work is interesting, and the idea is good. Most of the concerns raised by the reviewers have been addressed satisfactorily in the rebuttal.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    All three reviewers now have unanimously given accept for this work and the rebuttal satisfied concerns from previous reviews.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Three reviewers summarised the weaknesses of the paper, mainly including unclear description of details and lack of comparison with other deep learning methods. The authors provided reasonable explanations in the rebuttal to address the reviewers’ questions, and both reviewers upgraded their scores. The final result shows that all three reviewers chose to accept the paper. Therefore, I recommend acceptance of this paper.


