
Authors

Xiaoxiao He, Chaowei Tan, Ligong Han, Bo Liu, Leon Axel, Kang Li, Dimitris N. Metaxas

Abstract

Accurate 3D cardiac reconstruction from cine magnetic resonance imaging (cMRI) is crucial for improved cardiovascular disease diagnosis and understanding of the heart’s motion. However, current cardiac MRI-based reconstruction technology used in clinical settings is 2D with limited through-plane resolution, resulting in low-quality reconstructed cardiac volumes. To better reconstruct 3D cardiac volumes from sparse 2D image stacks, we propose a morphology-guided diffusion model for 3D cardiac volume reconstruction, DMCVR, that synthesizes high-resolution 2D images and corresponding 3D reconstructed volumes. Our method outperforms previous approaches by conditioning the generative model on cardiac morphology, eliminating the time-consuming iterative optimization process of the latent code, and improving generation quality. The learned latent spaces provide global semantics, local cardiac morphology, and details of each 2D cMRI slice, which are highly interpretable and valuable for reconstructing 3D cardiac shape. Our experiments show that DMCVR is highly effective in several aspects, such as 2D generation and 3D reconstruction performance. With DMCVR, we can produce high-resolution 3D cardiac MRI reconstructions, surpassing current techniques. Our proposed framework has great potential for improving the accuracy of cardiac disease diagnosis and treatment planning. Code can be accessed at https://github.com/hexiaoxiao-cs/DMCVR.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43990-2_13

SharedIt: https://rdcu.be/dnwLn

Link to the code repository

https://github.com/hexiaoxiao-cs/DMCVR

Link to the dataset(s)

https://www.ukbiobank.ac.uk


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a morphology-guided diffusion model for 3D cardiac volume reconstruction, which uses high-resolution 2D image stacks to estimate corresponding 3D volumes. The authors claim that the learned latent spaces provide global semantics, local cardiac morphology, and details of each 2D cMRI slice of highly interpretable value for reconstructing the 3D cardiac shape. The results are compared with other networks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    I think the idea to use a diffusion autoencoder to encode global high-level semantics into a descriptive vector and enforce it with a regional morphology encoder is kind of neat.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The description of the model itself and of the loss functions used could be improved, and the contribution of the paper could be identified more clearly.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Based on the information provided in the Reproducibility Response, it seems that the authors have taken steps to ensure the reproducibility of their research.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The paper’s contribution needs to be better identified. Additionally, the model description and loss functions used could benefit from clearer explanations.

    The description in the text does not match the information presented in Fig. 1 regarding the drawbacks of the imaging technique.

    The paper claims that most existing methods suffer from low generation quality, missing key cardiac structures, and long generation times. It would be helpful to provide references to support the claim.

    Figures 1 and 2 are somewhat repetitive, and some information is missing, such as the learned segmentation. The paper could benefit from breaking down the pipeline into multiple figures and depicting the pipeline and networks separately.

    The paper could use consistent terminology throughout regarding latent code, latent space, and latent variables. It might be helpful to define the term “latent code” as well.

    It is not clear where in the pipeline the complete MedFormer network is included and where the segmentation map is outputted/included in the reconstruction process.

    The data description in section 3.1 could be improved.

    Table 2 does not include the DiffAE, and it would be helpful to know why.

    It is not clear how the original image was segmented in Tab. 1, which provides an upper bound for other segmentation results.

    The paper claims that DMCVR outperforms all other methods in every metric, with an 8% increase in LVM segmentation. It would be helpful to provide more information/data to support this claim instead of just dropping a number.

    The arrow in Fig. 3 is not consistent over images.

    The paper claims that DMCVR generates images faster than DeepRecon, but it would be helpful to provide specific numbers/times.

    It is not clear what NN represents in Fig. 4, and the images could benefit from zooming in or cropping for better visualization.
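Several of the comments above turn on Dice scores (Tab. 1) and on an "8% increase" in LVM segmentation, which could mean either 8 Dice points or 8% relative to the baseline. A minimal sketch of both quantities, assuming binary masks stored as NumPy arrays; the masks and scores are made-up illustrations, not values from the paper, and the helper names are hypothetical rather than the authors' evaluation code.

```python
import numpy as np


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of the same shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        # Both masks empty: treat as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)


def improvement(baseline: float, new: float) -> tuple[float, float]:
    """Return (absolute gain in Dice points, relative gain w.r.t. the baseline)."""
    absolute = new - baseline
    return absolute, absolute / baseline


# Toy 8x8 masks: a 4x4 square and the same square shifted one column right.
gt = np.zeros((8, 8), dtype=bool)
gt[2:6, 2:6] = True
pred = np.zeros((8, 8), dtype=bool)
pred[2:6, 3:7] = True

print(dice(pred, gt))  # 2 * 12 / (16 + 16) = 0.75
# Hypothetical scores: 0.75 -> 0.81 is roughly +0.06 absolute but +8% relative.
print(improvement(0.75, 0.81))
```

Reporting both numbers (or stating which one is meant) removes the ambiguity the reviewer points to.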

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper could benefit from improving the description of the model and its used loss functions, as well as providing a clearer identification of its contribution, while also acknowledging the innovative approach that should be shared with the community.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors introduced a morphology-guided diffusion model for 3D cardiac volume reconstruction. In particular, their generative model was conditioned on cardiac morphology, which improved the reconstruction efficiency and performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Based on a diffusion autoencoder[1], in addition to the original semantic latent variable, the authors integrated a morphology latent code into the generative process to improve reconstruction efficiency and accuracy.
    [1]Preechakul, K., Chatthee, N., Wizadwongsa, S., Suwajanakorn, S.: Diffusion autoencoders: Toward a meaningful and decodable representation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10619–10629 (2022)

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The proposed DMCVR is based on the diffusion autoencoder (DiffAE) [1], which the authors have adapted to the 3D cardiac volume reconstruction task. However, many quantitative comparisons between the proposed DMCVR and DiffAE are missing:
      • Image similarity (Section 3.2): PSNR and SSIM scores were only reported for DeepRecon and DMCVR.
      • Reconstruction evaluation (Table 2): No Dice score was reported for DiffAE. Without a detailed comparison between the proposed method and DiffAE, it is difficult to evaluate the novelty of the proposed method.
    2. This paper shares a similar dataset setting and evaluation procedure with the prior work DeepRecon [2]. However, it does not demonstrate motion adaptation examples as seen in DeepRecon [2]; neither the submission nor the supplementary material mentions them.

    [1] Preechakul, K., Chatthee, N., Wizadwongsa, S., Suwajanakorn, S.: Diffusion autoencoders: Toward a meaningful and decodable representation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10619–10629 (2022)
    [2] Chang, Q., Yan, Z., Zhou, M., Liu, D., Sawalha, K., Ye, M., Zhangli, Q., Kanski, M., Al’Aref, S., Axel, L., et al.: DeepRecon: Joint 2D cardiac segmentation and 3D volume reconstruction via a structure-specific generative method. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2022: 25th International Conference, Singapore, September 18–22, 2022, Proceedings, Part IV. pp. 567–577. Springer (2022)

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    While the dataset used in this work is not publicly accessible, the code can be re-implemented using the GitHub repository of DiffAE. The work appears to be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. There is a discrepancy in segmentation metrics between this work and DeepRecon. The authors compared their work with DeepRecon in terms of PSNR and SSIM (Section 3.2), segmentation metrics (Table 1), and 3D reconstruction Dice score (Table 2). All the evaluation values are exactly the same as those reported in DeepRecon [1], except for the segmentation metrics, which are much lower than those reported in [1]. The authors should therefore explain this discrepancy.
    2. The authors claimed in the introduction that their method has improved efficiency, but they did not provide any evidence or discussion in the rest of the paper. They should address this issue in their rebuttal and provide more details to support their claim.

    [1] Chang, Q., Yan, Z., Zhou, M., Liu, D., Sawalha, K., Ye, M., Zhangli, Q., Kanski, M., Al’Aref, S., Axel, L., et al.: DeepRecon: Joint 2D cardiac segmentation and 3D volume reconstruction via a structure-specific generative method. In: Medical Image Computing and Computer Assisted Intervention–MICCAI 2022: 25th International Conference, Singapore, September 18–22, 2022, Proceedings, Part IV. pp. 567–577. Springer (2022)
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    First, the novelty of the proposed method is limited, as it is based entirely on the original DiffAE with the addition of a morphology latent code. Furthermore, the authors did not provide sufficient comparison results to demonstrate that their DMCVR outperforms DiffAE. Second, the results of DeepRecon are inconsistent with those reported in the original paper, as it is unlikely that the same method would produce exactly the same values on some metrics but entirely different values on others.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    I would like to thank the authors for their detailed feedback, which has answered my concerns related to the evaluation process, especially the discrepancy in DeepRecon results compared with the original paper. However, I would be more convinced if the authors could provide more evaluation results comparing their DMCVR with DiffAE, since their methodology is built upon DiffAE. Therefore, I upgrade my opinion to weak accept.



Review #3

  • Please describe the contribution of the paper

    One limitation of CMR data is its 2D nature and low inter-slice resolution. In this paper, the authors develop a diffusion-based method to increase the inter-slice resolution by generating the missing slices between two original consecutive slices. They leverage a semantic latent code as well as a regional morphology latent code to enhance their performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper presents an implementation of the diffusion model to address the CMR inter-slice resolution problem. This is the first time I have seen a diffusion model used for this specific problem. The noise (x_T in the paper) produced by the diffusion model can be treated as a stochastic latent code, which is further used in image generation.

    2. The authors use a segmentation network to encode the regional morphology as a latent code, which is further used in the conditional diffusion model. This enables the focus on the key regions of the CMR image (LV, RV, Myo) and should be considered as a major advantage of this method.

    3. The authors adapt the idea of latent space interpolation from StyleGAN to the task of generating missing slices between two original consecutive slices. This task itself can be considered a kind of interpolation process. The classical way is to interpolate based only on the intensity data in the image. The authors extend this interpolation from the image domain to the latent domain.

    4. Significant improvement in SSIM compared to the previously published method DeepRecon.
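Point 3 above describes interpolating between two acquired slices in latent space rather than in intensity space. A minimal sketch of what that could look like, assuming (following the DiffAE convention as we understand it) that the conditioning codes are interpolated linearly and the stochastic code x_T spherically. The function names and code sizes are hypothetical, the decoder (the trained conditional diffusion model) is not shown, and whether DMCVR uses exactly this scheme is an assumption.

```python
import numpy as np


def lerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Linear interpolation, used here for the semantic/morphology codes."""
    return (1.0 - t) * a + t * b


def slerp(a: np.ndarray, b: np.ndarray, t: float, eps: float = 1e-7) -> np.ndarray:
    """Spherical interpolation, used here for the stochastic noise code x_T."""
    a_flat, b_flat = a.ravel(), b.ravel()
    cos_theta = np.dot(a_flat, b_flat) / (np.linalg.norm(a_flat) * np.linalg.norm(b_flat))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    if theta < eps:
        # Nearly parallel vectors: fall back to linear interpolation.
        return lerp(a, b, t)
    sin_theta = np.sin(theta)
    return (np.sin((1.0 - t) * theta) * a + np.sin(t * theta) * b) / sin_theta


def interpolate_missing_slices(z_a, z_b, x_a, x_b, n_missing: int):
    """Latent pairs (z, x_T) for n_missing slices strictly between slices a and b.

    Each pair would then be decoded by the conditional diffusion model (not shown).
    """
    ts = np.linspace(0.0, 1.0, n_missing + 2)[1:-1]  # drop the two endpoints
    return [(lerp(z_a, z_b, t), slerp(x_a, x_b, t)) for t in ts]


rng = np.random.default_rng(0)
z_a, z_b = rng.standard_normal(512), rng.standard_normal(512)            # conditioning codes
x_a, x_b = rng.standard_normal((64, 64)), rng.standard_normal((64, 64))  # noise codes
latents = interpolate_missing_slices(z_a, z_b, x_a, x_b, n_missing=3)
print(len(latents), latents[0][1].shape)  # 3 (64, 64)
```

Slerp keeps the interpolated noise at roughly the norm of the endpoints, which matters because a diffusion decoder expects x_T to look like Gaussian noise; a plain average of two noise maps would shrink its norm.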

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The paper’s Section 3.2, which focuses on the evaluation of the 2D slice generation quality, is confusing for readers. Specifically, the authors compare their generated images and segmentation performance on their images with the original/ground truth image, despite the fact that the original image from UK Biobank is in low resolution. As a result, it is unclear how they are able to compare their high-resolution generated missing slice sequence with the low-resolution ground truth.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Good.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Please address all my comments in the weaknesses section.

    • Please generate a new version of Figure 4 with the following changes: First, ensure that all CMR images, including those in other figures, are zoomed in to display the heart area clearly, as it is the primary focus of this paper. Second, display the greyscale images using the same window level and window width to improve comparability. Lastly, add an additional column that depicts the overlap of the SAX DMCVR label (which should be the fourth column) on the LAX original (which should be the fifth column). This will enable readers to assess whether the cardiac surface contours are accurately reconstructed.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper presents a deliberate way to implement the new diffusion technique to address a clinically important problem in CMR data. The addition of the regional morphology latent code is a notable contribution, as it aligns with the clinical use of CMR to study regional cardiac structures. However, the paper would benefit from a clearer explanation of its evaluation process.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    “(MR, R3) Experimental Settings Clarification: The aim of our study is to enhance the resolution along the long axis (LAX) for 3D cine short axis (SAX) images in the UK-Biobank. In Sec. 3.2, we generated images with the same resolution as the original ones to assess the representativeness of the latent code and the quality of the generated images relative to the original ones. The results for improved LAX resolution are in Sec. 3.3.”

    I won’t change my scores since I don’t think the authors successfully answered my question. They said: “In Sec. 3.2, we generated images with the same resolution as the original ones”, but I thought their method is to generate image with INCREASED resolution (that’s the whole point of super-resolution).

    First, I’m asking about the image comparison (not segmentation performance in Table 2). Second, looking at Fig 1, the original images in UK Biobank are the model input, and the output high-resolution images contain original images (dark green box) and interpolated images (light green box). I’m not sure whether the authors compare the dark green boxes with the model input (basically, they would need to downsample their output to the original resolution) or compare the light green boxes with the input. It doesn’t make sense to compare the dark green boxes with the original images, since they are the same images representing the same slice of the heart, and a simple B-spline interpolation for super-resolution can ensure SSIM = 1. Did the authors compare the light green boxes with the input, which are neighboring images with some similarities (so even for an ideal AI, the SSIM still won’t be 1 but some high value)?
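The reviewer's question is about which slices are scored. A minimal sketch of a held-out-slice protocol that avoids the trivial comparison, assuming a SAX stack stored as a NumPy array of shape (slices, H, W): every other slice is hidden, each hidden slice is predicted from its two kept neighbours, and only the predicted slices are scored. Linear intensity interpolation stands in for the generative model and PSNR stands in for the image-similarity metric; the function names are hypothetical, and this is not the authors' protocol.

```python
from typing import Callable

import numpy as np


def psnr(ref: np.ndarray, est: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    mse = np.mean((ref.astype(np.float64) - est.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / mse))


def held_out_slice_score(
    volume: np.ndarray,
    interpolate: Callable[[np.ndarray, np.ndarray], np.ndarray],
) -> float:
    """Hide odd-indexed slices, predict each from its even-indexed neighbours,
    and return the mean PSNR over the hidden slices only."""
    hidden = range(1, volume.shape[0] - 1, 2)  # every hidden slice has two kept neighbours
    scores = [psnr(volume[i], interpolate(volume[i - 1], volume[i + 1])) for i in hidden]
    return float(np.mean(scores))


def linear_baseline(prev_slice: np.ndarray, next_slice: np.ndarray) -> np.ndarray:
    """Intensity-space interpolation; a generative model would replace this."""
    return 0.5 * (prev_slice + next_slice)


rng = np.random.default_rng(0)
stack = rng.random((9, 32, 32))  # toy stack of 9 slices, not real cMRI data
print(held_out_slice_score(stack, linear_baseline))  # finite: hidden slices are unseen
# Scoring a kept slice against itself is trivially perfect, which is the reviewer's point.
print(psnr(stack[0], stack[0]))  # inf
```

The same structure applies to SSIM: only slices the model never saw carry information about interpolation quality.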




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper presents a super-resolution approach that samples slices in CMR short-axis images, which typically suffer from low resolution. The paper has received diverging reviews. Even though the overall given score is high, I think it is necessary to clarify the evaluation of the method; therefore, I vote for rebuttal. Key strength:

    • Novel conditional diffusion models are used, with further methodological extensions.

    Key weaknesses: choice of data set and quantitative comparison to SOTA/unsupported claims:
    • Reviewer 3 pointed out that the authors perhaps compared their generated high-resolution images to the low-resolution samples from UK Biobank. This needs further clarification, since the validity of the evaluation depends on it.
    • Reviewers 1 and 2 further pointed out major weaknesses w.r.t. quantitative results, which need to be addressed (segmentation metrics): Reconstruction evaluation (Table 2): no Dice score was reported for DiffAE. The paper claims that DMCVR outperforms all other methods in every metric, but more data needs to be presented to support this claim.
    • Authors should improve the organization of the paper, and the efficiency of their method in comparison to existing work should be presented more clearly.




Author Feedback

We express our gratitude for the insightful comments from the reviewers (R1 to R3) and the meta-reviewer (MR). Herein, we address the concerns raised regarding experimental settings, details, and evaluations. All revisions will be included in the final submission.

(MR, R3) Experimental Settings Clarification: The aim of our study is to enhance the resolution along the long axis (LAX) for 3D cine short-axis (SAX) images in the UK Biobank. In Sec. 3.2, we generated images with the same resolution as the original ones to assess the representativeness of the latent code and the quality of the generated images relative to the original ones. The results for improved LAX resolution are in Sec. 3.3.

(MR, R1, R2) Reconstruction evaluation: Using DiffAE for 3D cardiac volume reconstruction is part of our contribution. The experimental results show that using both the morphology and semantic latent codes is more representative than DiffAE, which only utilizes the semantic latent code. Our proposed DMCVR can generate images closer to the original. Thus, interpolating the semantic latent code with DiffAE will not improve the result compared to our DMCVR. Since our diffusion model is already better than DiffAE, we decided not to include the volume reconstruction results for DiffAE.

(R1) Segmentation Network and losses: The segmentation network used in this work is MedFormer. Due to the page limit, we only cited it as [4]. The loss function for the generative model is Eqn. 4.

(R2) Tab 1, Experimental Settings: The authors of DeepRecon trained multiple segmentation networks on the generated data and tested the segmentation model quality on the original images. In contrast, our evaluation method trains a single segmentation network on the original dataset and tests it on the generated images. This causes the discrepancy in Dice scores between our paper and DeepRecon. Our evaluation reduces the effect of training on the fairness of the evaluation process. For the image similarity metrics, we adopt the results from DeepRecon, since they rely only on the generated image quality. We also followed the volume reconstruction process in DeepRecon to generate results accordingly in Sec. 3.3 and compared them with our DMCVR.

(R1) Description of Fig 1: In Fig. 1(a), we show one SAX slice and one two-chamber (2ch) LAX image. The drawback of cardiac cine MRI is that the resolution along the LAX of the 3D images is low, which is indicated in the LAX image by the spaces between the white lines. The purpose of our work is to improve this spatial resolution, as illustrated by the LAX image on the right. This can also be interpreted as generating the missing slices, indicated as grey squares, between the green slices.

(R1) Missing cardiac structure, etc.: Our assertion about present techniques is based on the generation quality comparison in Sec. 3.2. The extensive generation overhead of DeepRecon, which requires 1k steps for latent code inversion, is absent in our method, thus saving considerable image generation time.

(R1) Consistent terminology: We thank the reviewer for pointing this out and will unify the use of terms in the final version.

(R1) Metric Outperform: The claim is drawn from Tab. 1. The images generated by DMCVR outperform those of the other reconstruction methods, both in image quality metrics and in a downstream application such as segmentation. The 8% figure is calculated from the values in Tab. 1.

(R1, R2) Time Improvement: Our model is trained in 4.8 days versus 14 days for DeepRecon, with 4× RTX 8000 GPUs. We also avoided the expensive iterative optimization of the latent code for each image, which drastically improves the inference time.

(R1) NN in Fig 4 represents Nearest Neighbor. This will be updated in the final version.

(R2) Motion Adaptation: We appreciate the reviewer’s observation. However, the focus of this paper is on addressing the issue of low LAX resolution; it does not encompass motion adaptation. This aspect will be incorporated into our future journal articles.
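The efficiency argument in the rebuttal contrasts per-image latent code inversion (about 1k optimization steps in DeepRecon) with obtaining latent codes through a fixed encoder pass instead of per-image optimization. A toy sketch of that cost difference, assuming a linear "generator" x = W z so that both routes can be checked exactly; the pseudo-inverse stands in for a trained encoder, and the function names are hypothetical. Real generators are nonlinear and real encoders are learned and only approximate, so this illustrates the cost structure only, not the accuracy of either method.

```python
import numpy as np


def invert_by_optimization(W: np.ndarray, x: np.ndarray, steps: int = 1000) -> np.ndarray:
    """Recover z from x = W z by gradient descent on 0.5 * ||W z - x||^2.

    Every step needs one generator forward pass and one backward pass.
    """
    lr = 1.0 / np.linalg.norm(W, ord=2) ** 2  # 1 / Lipschitz constant of the gradient
    z = np.zeros(W.shape[1])
    for _ in range(steps):
        residual = W @ z - x        # forward pass through the toy generator
        z -= lr * (W.T @ residual)  # gradient step on the latent code
    return z


def fit_encoder(W: np.ndarray):
    """Toy amortized encoder: built once, then a single matrix product per image."""
    E = np.linalg.pinv(W)
    return lambda x: E @ x


rng = np.random.default_rng(0)
W = rng.standard_normal((16, 4))  # toy generator: 4-D latent -> 16-D "image"
z_true = rng.standard_normal(4)
x = W @ z_true

z_opt = invert_by_optimization(W, x)  # 1000 forward/backward passes for this one image
z_enc = fit_encoder(W)(x)             # one pass for this image
print(np.allclose(z_opt, z_true, atol=1e-4), np.allclose(z_enc, z_true))
```

The per-image cost of the first route scales with the number of optimization steps, while the encoder's cost is paid once at training time; this is the trade-off behind the inference-time claim.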




Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The reviewers updated their scores and actively responded to the rebuttal. All reviewers now vote for acceptance. However, I have some concerns about accepting the paper, since there are still some valid open points that were not clarified (as stated by the reviewers: comparison to DiffAE and clarification of the ground truth).

    The authors did not clarify exactly what their evaluation strategy is. I double-checked the paper, and I think that perhaps in their case, the SAX volume is spatiotemporally registered to the LAX 2D acquisition, and the reconstruction error is then computed based on the sparse information given in the LAX 2D acquisition (Figure 4). It is mandatory that the authors adjust the paper so that the reader understands what the actual ground truth (GT) is that they compare against. Furthermore, it is worth mentioning that the GT is (perhaps) only 2D and covers a minor part of the heart at the intersection of the SAX and LAX acquisitions.

    I hope that authors will seriously improve on the description and presentation of their results.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work proposes to integrate conditional diffusion models into the reconstruction task. The reviewers agree on its novelty, however, the concerns about experiments and evaluation were not sufficiently addressed in the rebuttal.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    After the authors’ rebuttal, this paper has received consistent scores, with R2 upgrading the original recommendation to weak accept. While the authors have made efforts to address most of the reviewers’ questions, this meta-reviewer aligns with R2’s comments (after reviewing the rebuttal) regarding the need for evaluation results comparing DMCVR with DiffAE. Despite the authors’ claim that their work is built upon and superior to DiffAE due to the introduced morphology, it remains crucial to ascertain the extent to which this “morphology” influences the accuracy of the experimental outcomes. Therefore, the authors are strongly encouraged to consider R2’s suggestion of including these comparative results in a revised version suitable for MICCAI publication.


