Authors

Boah Kim, Jong Chul Ye

Abstract

Temporal volume images with 3D+t (4D) information are often used in medical imaging to statistically analyze temporal dynamics or capture disease progression. Although deep-learning-based generative models for natural images have been extensively studied, approaches for temporal medical image generation such as 4D cardiac volume data are limited. In this work, we present a novel deep learning model that generates intermediate temporal volumes between source and target volumes. Specifically, we propose a diffusion deformable model (DDM) by adapting the denoising diffusion probabilistic model that has recently been widely investigated for realistic image generation. Our proposed DDM is composed of the diffusion and the deformation modules so that DDM can learn spatial deformation information between the source and target volumes and provide a latent code for generating intermediate frames along a geodesic path. Once our model is trained, the latent code estimated from the diffusion module is simply interpolated and fed into the deformation module, which enables DDM to generate temporal frames along the continuous trajectory while preserving the topology of the source image. We demonstrate the proposed method with the 4D cardiac MR image generation between the diastolic and systolic phases for each subject. Compared to the existing deformation methods, our DDM achieves high performance on temporal volume generation.
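The interpolation step described in the abstract can be illustrated with a toy 2D sketch. Everything here is an assumption made for illustration: `toy_deformation_module` is a hypothetical stand-in that reads the latent code directly as a displacement field, whereas the paper's deformation module is a trained network. The names `warp_bilinear` and `generate_frames` are also illustrative and are not taken from the authors' repository.

```python
import numpy as np


def warp_bilinear(image, disp):
    """Warp a 2D image with a displacement field `disp` of shape (2, H, W)."""
    h, w = image.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Sampling positions, clamped to the image domain.
    y = np.clip(ys + disp[0], 0, h - 1)
    x = np.clip(xs + disp[1], 0, w - 1)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = y - y0
    wx = x - x0
    # Bilinear blend of the four neighbouring pixels.
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom


def toy_deformation_module(source, latent):
    """Hypothetical stand-in: treats the latent code as a displacement field.

    In the paper this role is played by a trained network; this toy version
    only exists to show where the scaled latent code enters.
    """
    return warp_bilinear(source, latent)


def generate_frames(source, latent, num_frames):
    """Scale the latent code by gamma in [0, 1] and warp the source each time."""
    gammas = np.linspace(0.0, 1.0, num_frames)
    return [toy_deformation_module(source, g * latent) for g in gammas]
```

With a scale of 0 the source is returned unchanged, and with a scale of 1 the full deformation is applied; intermediate scales give the in-between frames.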

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16431-6_51

SharedIt: https://rdcu.be/cVD67

Link to the code repository

https://github.com/torchDDM/DDM

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

This paper proposed a novel 4D image generation framework by adapting the denoising diffusion probabilistic model (DDPM) to the deformable registration model. The proposed method learns the distribution of the source and target and estimates the latent code to generate deformed images along the continuous trajectory. Experimental results on 4D cardiac MR image generation verify that the proposed method produces dynamic deformations from the end diastolic to systolic phase volumes, and outperforms the existing registration-based models.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- generated realistic deformed images along the continuous trajectory in a 4D cardiac MRI generation;
- proposed a diffusion deformable model for 4D medical image generation, which employs the denoising diffusion model to estimate the latent code;
- by simply scaling the latent code, the proposed model provides non-rigid continuous deformation of the source image toward the target.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

This paper proposed a diffusion deformable model (DDM), which can generate images along continuous trajectories from a latent code. The results are comparable with the state-of-the-art image registration tool VoxelMorph in terms of PSNR and Dice score. However, no downstream tasks are carried out to evaluate the usefulness of the generated images, so we cannot really draw conclusions about their quality.
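For reference, the metrics named in this discussion (and NMSE, which the author feedback also reports) can be sketched as follows. This is a minimal NumPy sketch using common textbook definitions. The paper's exact normalization and intensity range are not stated on this page, so `data_range` and the NMSE denominator are assumptions.

```python
import numpy as np


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB; `data_range` is an assumed maximum."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)


def nmse(reference, estimate):
    """Squared error normalized by reference energy (one common convention)."""
    return np.sum((reference - estimate) ** 2) / np.sum(reference ** 2)


def dice(mask_a, mask_b):
    """Dice overlap between two binary segmentation masks."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom
```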
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors provided code of the experiment in an anonymous github account.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
- As mentioned above, it would be interesting to do at least one downstream task after generating 4D images, such as analyzing the cycle consistency of the cardiac volume, to show the advantages of the generated images compared to registration.
- In Table 1, it is hard to tell whether the proposed method outperforms the other two types of VoxelMorph. Including a statistical test and a significance score would be appreciated here.
- In Figure 5 the authors compared the results of different \lambda values in the loss function, and picked an optimal one. It would be interesting to include two extreme cases, when there is only the diffusion loss or only the deformation loss in the model, to see the role of each individual loss.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I think it is very novel to adapt DDPM to image generation, especially generating continuous 4D data for cardiac cycle analysis.
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

2
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

This paper proposed a novel deep-learning-based model to generate intermediate temporal 3D+t cardiac MRI images. The model combines a diffusion probabilistic module and a deformation module, both of which are implemented with 3D U-Nets. The implementation is adapted from PyTorch versions of these two modules from [10] and [3].
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The method utilizes a diffusion deformable model adapted from the denoising diffusion probabilistic model and can learn the spatial deformation along a geodesic path. As far as I know, this is the first use of the denoising diffusion probabilistic model for this kind of 4D data generation, made possible by the addition of a deformation module. The experimental results on the ACDC dataset verified that the proposed DDM outperforms the registration-based methods. These experiments covered most aspects of evaluating registration performance.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The reference style should be consistent.
2. Typos like “deiffusion” in “Then, for the reverse deiffusion, DDPM learns the following parameterized Gaussian transformations”
3. It is very hard to visually check the difference between the proposed method and VM. In Fig. 4 and Table 1, the difference is still very small. Statistical significance from a paired t-test is not enough; I would like to see a Wilcoxon signed-rank test.
4. The difference of the performance on training data and testing data is not given in the experiments.
5. The PyTorch code provided is quite difficult to follow, and some details are hard to verify. For example, x0 in Fig. 2 is not described in the code and I could not find its implementation.
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Their code is ok, but it is a little difficult to follow some details.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

See the above comments.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper is sound and well-written.
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

This paper proposes a method for 4D cardiac image interpolation. The method uses a generative model based on denoising diffusion probabilistic models. The loss function combines a loss based on the diffusion model with a loss based on the deformable model. The model learns a code in the latent space that can provide the interpolation path between the source and the target images by sampling a scaling factor from the interval [0,1]. The method has been evaluated on the ACDC dataset. The evaluation shows that the method provides plausible interpolation between the diastolic and systolic phases.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The proposal of a diffusion probabilistic model for deformable image generation is a very interesting idea.

The experiments show that the proposed method provides an interpolation that may be plausible between the diastolic and systolic phases of the cardiac cycle.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The accuracy of the interpolation method is difficult to validate.

The authors have shown the superiority of the proposed method with respect to VoxelMorph, however, this is probably not the closest competing method methodologically speaking and/or competitive enough for this application.
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The datasets used for evaluation are publicly available, although the exact images used for training, validation, and testing are not provided.

The source code is available in a GitHub repo. I did not check how it runs.

The parameter values were provided.

For the evaluation, the authors provided a clear description of metrics. Statistical significance is stated.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

I found the idea of using diffusion models as a generative model for the interpolation of the cardiac cycle really interesting. The authors provided promising results. The proposed method could be used for interpolation in datasets with missing images or in applications where a more continuous temporal variation of the images is needed. I have just two comments that I would be happy for the authors to address in the manuscript.

1) My first question is, did the authors also implement a generative model using GANs? The authors claim in the introduction that GANs may generate artificial features, but do they have experience with GANs for this application? What about other ways of obtaining a latent space, or generative models not based on deep learning?

2) My second question is on the selection of VoxelMorph as a baseline for the evaluation. Is there any evidence that this could be a competitive method for the application? What about alternative methods based on traditional image registration such as geodesic regression or EPDiff-based time-dependent interpolation based on LDDMM?
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

My recommendation is accept because I believe that the proposed method is a nice methodological approach to solve the problem. My main concern is the evaluation of the generated interpolation. The method is very difficult to evaluate due to the lack of ground truth. I believe that the authors performed reasonable experiments, although the selection of VoxelMorph as baseline should be further justified. I personally think that other methods could be more interesting and would help to better assess the proposed method.
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

2
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

All reviewers recommended acceptance of this paper, with positive comments on (i) the interesting and novel idea of integrating diffusion and deformation models for continuous image generation, and (ii) a well-written and well-organized manuscript. The authors are strongly encouraged to incorporate all reviewers’ constructive feedback carefully into a revised version.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

1

Author Feedback

We appreciate all valuable comments from the reviewers and the meta-reviewer. In this response, we report the additional evaluations that the reviewers suggested and address several concerns as follows.
R1 mentioned that an evaluation of the effectiveness of our generated images compared to image registration methods is needed. R1 is kindly reminded that the proposed method is also based on image registration. Specifically, our model produces temporal images by warping the source image using the deformation fields, as done in registration methods, so the images generated by our method preserve the topology as in image registration methods.
R1 asked us to include an ablation study using only the diffusion loss or only the deformation loss. However, since the diffusion loss is computed using the diffusion module output that is not fed into the deformation module, using only the diffusion loss cannot update the deformation module. Thus, we performed an ablation study that trains our whole model using only the deformation loss. As a result, the PSNR, NMSE, and Dice scores were 30.720, 0.473 × 10^(-8), and 0.799, respectively, which were worse than the optimal results of our method. This indicates that the diffusion loss is effective for learning the latent code for generating temporal images.
R1 and R2 commented that it is hard to compare the difference between the proposed method and other methods. Per R1’s suggestion, we will include the results of the statistical test and p-values in Table 1. In addition, following R2’s comment, we evaluated the statistical significance with the Wilcoxon signed-rank test and observed results similar to those of the paired t-test. Specifically, our model outperforms VM on all metrics with p-values<0.05. Compared to VM-diff, there was no significant difference in Dice score, but our model achieves significant improvement in PSNR and NMSE with p-values<0.005. We will also add these results to Table 1.
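To make the requested test concrete, the following is a minimal, standard-library sketch of an exact two-sided Wilcoxon signed-rank test for paired per-subject scores. It assumes a small sample with no zero differences and no tied absolute differences. The function name is illustrative and the numbers in the usage note are made up for demonstration; they are not results from the paper. In practice, library routines such as `scipy.stats.wilcoxon` and `scipy.stats.ttest_rel` would normally be used.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Assumes no zero differences and no tied |differences|; the enumeration
    over 2**n sign patterns is only practical for small n.
    Returns (W_plus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y)]
    if any(d == 0 for d in diffs):
        raise ValueError("zero differences are not handled in this sketch")
    abs_sorted = sorted(abs(d) for d in diffs)
    if len(set(abs_sorted)) != len(abs_sorted):
        raise ValueError("tied |differences| are not handled in this sketch")

    # Rank the absolute differences from 1 (smallest) to n (largest).
    ranks = {value: i + 1 for i, value in enumerate(abs_sorted)}
    n = len(diffs)
    total = n * (n + 1) // 2
    w_plus = sum(ranks[abs(d)] for d in diffs if d > 0)
    observed = min(w_plus, total - w_plus)

    # Under the null hypothesis each rank is positive or negative with
    # probability 1/2, so enumerate every sign pattern.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        if min(w, total - w) <= observed:
            extreme += 1
    return w_plus, extreme / 2 ** n
```

As a usage illustration with made-up paired scores, `wilcoxon_signed_rank_exact([30.5, 31.0, 32.25, 29.75, 31.5, 30.25], [30.0, 30.25, 31.0, 28.0, 29.5, 27.75])` has all six differences positive, so W+ is 21 and the exact two-sided p-value is 2/64.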
R2 asked about the difference in performance between training data and test data. When we tested our model on the training data, the proposed method generated temporal images with gains similar to those on the test data for all metrics. Specifically, compared to the initial PSNR of 29.683 dB and NMSE of 0.690 × 10^(-8), our model achieves 32.788 dB and 0.354 × 10^(-8), respectively. Also, our model yields an average Dice score of 0.830 for the segmentation maps of the end-systolic cardiac structures, a 0.13 gain over the initial score of 0.700. We will add these results in the final version.
R2 mentioned that the provided PyTorch code is difficult to follow due to the lack of description. Accordingly, we have revised the code. Additionally, R2 is kindly reminded that we provided the code for inference (Fig. 2) as well as the training code.
R3 asked whether we implemented a generative model using GANs for our application. Although generative models such as GANs can generate temporal medical images, they may produce undesirable spurious features. Note that when generating intermediate images between the source and target, it is important to retain only components present in the source, with topology preservation. This is why we developed a deformation-based model that generates images by warping the source image. We will add this explanation in the final version.
R3 commented on the validity of using VoxelMorph as a baseline for the evaluation. R3 is kindly reminded that we compared the proposed method to VM-diff as well as VoxelMorph (VM). VM-diff is a learning-based diffeomorphic registration method based on a stationary velocity field representation. It provides intermediate deformations along the geodesic path by integrating the velocity field over the time interval t=[0, 1]. Also, since traditional image registration methods require substantial time and computational cost, we compared ours to learning-based registration methods.
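The integration of a stationary velocity field mentioned above is commonly done with scaling and squaring. The following is a 1D NumPy sketch of that general technique, given only as an illustration: it is not the VM-diff implementation, and the function name, the linear interpolation, and the `num_steps` parameter are assumptions.

```python
import numpy as np


def integrate_velocity_1d(velocity, t=1.0, num_steps=6):
    """Approximate the displacement of the flow of a stationary velocity
    field at time t using scaling and squaring (1D illustration).

    velocity : array of shape (N,), in grid units per unit time
    t        : integration time, e.g. a value in [0, 1]
    """
    n = velocity.shape[0]
    grid = np.arange(n, dtype=float)
    # Scaling: start from a very small displacement, t * v / 2**num_steps.
    disp = t * velocity / (2 ** num_steps)
    # Squaring: compose the map with itself num_steps times.
    for _ in range(num_steps):
        # (phi o phi)(x) = x + u(x) + u(x + u(x)); np.interp clamps at edges.
        disp = disp + np.interp(grid + disp, grid, disp)
    return disp
```

Evaluating the same velocity field at several values of t gives intermediate displacements, which is how a single estimated field can yield a sequence of deformations between the source and the target.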
We will revise the reference style and typos in the final version.


Diffusion Deformable Model for 4D Temporal Medical Image Generation