
Authors

Lan Jiang, Ye Mao, Xiangfeng Wang, Xi Chen, Chao Li

Abstract

MRI synthesis promises to mitigate the challenge of missing MRI modalities in clinical practice. Diffusion models have emerged as an effective technique for image synthesis by modelling complex and variable data distributions. However, most diffusion-based MRI synthesis models use a single modality. As they operate in the original image domain, they are memory-intensive and less feasible for multi-modal synthesis. Moreover, they often fail to preserve the anatomical structure in MRI. Further, balancing the multiple conditions from multi-modal MRI inputs is crucial for multi-modal synthesis. Here, we propose the first diffusion-based multi-modality MRI synthesis model, namely the Conditioned Latent Diffusion Model (CoLa-Diff). To reduce memory consumption, we perform the diffusion process in the latent space. We propose a novel network architecture, including similar cooperative filtering, to address the possible compression and noise in the latent space. To better maintain the anatomical structure, brain region masks are introduced as priors of density distributions to guide the diffusion process. We further present auto-weight adaptation to employ multi-modal information effectively. Our experiments demonstrate that CoLa-Diff outperforms other state-of-the-art MRI synthesis methods, promising to serve as an effective tool for multi-modal MRI synthesis.
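Auto-weight adaptation is only named here, not specified. As a rough illustration of the general idea of learning how much each conditioning modality contributes, the following is a minimal sketch assuming one learnable scalar logit per input modality, normalized with a softmax and used to blend per-modality feature vectors. The helpers `softmax` and `fuse_conditions` are hypothetical names, and this is not CoLa-Diff's actual mechanism; the paper's formulation may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scalar logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_conditions(features, logits):
    """Blend per-modality feature vectors with softmax-normalized weights.

    Hypothetical sketch: features is a list of equal-length vectors, one per
    available input modality (e.g. T1, T2); logits holds one scalar per
    modality, which a real model would learn during training.
    Returns (fused_vector, weights).
    """
    weights = softmax(logits)
    dim = len(features[0])
    fused = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return fused, weights
```

With equal logits the fusion reduces to a plain average; in a trained model the logits would be updated by gradient descent so that more informative modalities receive larger weights.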

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43999-5_38

SharedIt: https://rdcu.be/dnwwR

Link to the code repository

https://github.com/SeeMeInCrown/CoLa_Diff_MultiModal_MRI_Synthesis

Link to the dataset(s)

https://brain-development.org/ixi-dataset/


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a conditional latent diffusion model for multi-modal MRI image synthesis. The authors incorporate several mechanisms for addressing common problems in medical image generation. They include brain region masks for structural guidance and similar cooperative filtering for avoiding noise in the latent space. They show that their algorithm outperforms other image synthesis frameworks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The authors introduce several control mechanisms to the diffusion process such as structural guidance. The retention of the anatomical information is a problem specific to image generation in the medical domain, therefore these control mechanisms aid in the robustness of the framework in terms of keeping the important underlying morphology.

    • The results in Table 1 demonstrate the performance on different combinations of modalities for the generative task. The proposed framework consistently outperforms the other generative frameworks.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The authors show results on two brain datasets, BRATS 2018 and IXI. The reviewer believes that including a different body part, rather than a second brain dataset, would have been more useful for demonstrating the generalizability of the framework.

    • The framework has many components, such as the modified latent diffusion network, structural guidance, auto-weight adaptation, and similar cooperative filtering. The ablation study in Table 2 demonstrates the effect of each component. The reviewer believes that it might be challenging to bring all the components together and adapt them to a different dataset, and therefore suggests adding a discussion of how to combine these components for a dataset that covers different modalities and anatomies.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • The authors use a public dataset and share the hyper-parameters and training settings of the network. These factors have a positive effect on the reproducibility of the paper.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • In the structural guidance section, the authors mention that they use the FSL-FAST tool to segment different types of brain tissue. The reviewer believes that a discussion of whether this step could also be incorporated into the end-to-end training framework would be useful for extending the proposed framework to other anatomies where segmentation tools might not be readily available.

    • The network has many components, as demonstrated in the ablation study in Table 2. The reviewer believes that the paper could elaborate more on how the hyper-parameters and architectural design choices were made and optimized to make the framework perform well. The paper could also benefit from discussing how these components can be assembled for a different anatomy or dataset.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper introduces several control mechanisms to the diffusion process that address concerns arising in medical image generation regarding the preservation of anatomical structure. The results are compared with other generative models popular in the literature and are shown to be superior.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    In this work, the authors address the issue of multi-modal MRI synthesis by introducing the Conditioned Latent Diffusion Model (CoLa-Diff), a diffusion-based multi-modality MRI synthesis model. CoLa-Diff operates in the latent space and employs a novel network architecture, brain region masks, and auto-weight adaptation to effectively utilize multi-modal information. The experiments demonstrate that CoLa-Diff outperforms existing state-of-the-art MRI synthesis methods, offering a promising solution for multi-modal MRI synthesis.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The key strength of the paper lies in the integration of the diffusion process within a latent space generated by an encoder, along with the incorporation of structural guidance, which is essential to prevent hallucination artifacts and alterations in anatomical structures. Additionally, the quantitative findings are supported by a thorough ablation study.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Unfortunately, although the authors use a generative approach, they do not illustrate the model’s ability to generate multiple variations of a result. Consequently, it remains unclear to what extent the model can hallucinate structures in the resulting images, leaving room for further exploration and assessment of the model’s capabilities in this regard.

    Moreover, the paper does not present the output of the model when no guidance is provided and only noise is used as input. This leaves open questions about the approach’s versatility and its behavior in the absence of guidance.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The requirements resulting from the answers to the checklist are met in the manuscript. Additionally, the authors submitted their code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    To better understand the extent to which the model can generate hallucinated structures in the resulting images, it would be beneficial for the authors to provide multiple variations of a result. An extension of this work might further include applying the model to accelerated MRI using different sampling modalities, provided that the model also works without guidance.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The primary factors that influenced my decision were the well-structured presentation and the novelty of the approach presented in this paper. In particular, the incorporation of structural guidance and the integration of the diffusion process stand out as noteworthy ideas.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper presents a diffusion-based method for multi-modal MRI synthesis, along with several strategies for enhancing synthesis quality and reducing the memory burden.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Reduces the memory burden by operating in the latent space;
    • Enhances the synthesis quality by using similar cooperative filtering and tissue masks;
    • Learns the weights of the different conditioning modalities.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Learning in the latent space is the main idea of “Stable Diffusion” [1]
    • The proposed “similar cooperative filtering” is highly similar to the classic denoising algorithm BM3D [2], which uses “collaborative filtering” but is not cited. What is the essential difference between the two?

    [1] R. Rombach et al., “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
    [2] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3D transform-domain collaborative filtering,” IEEE Transactions on Image Processing, vol. 16, no. 8, pp. 2080-2095, Aug. 2007.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It is claimed that the code will be released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Citations are lacking for two of the claimed contributions (Sec 2.1).

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Citations are lacking for two of the claimed contributions (Sec 2.1), which makes it hard to evaluate the contributions.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    I changed my rating to positive, as the authors did come up with corresponding solutions to the challenges encountered in practice. The authors did cite the latent diffusion paper, but only in the experimental section, for comparison purposes. Still, I believe the preferable presentation would be to first acknowledge the prior work in the method section, then explain the real-world challenge, followed by the proposed solution. Similarly, the wording in the abstract is confusing: it sounds as if this work is the first to propose operating in the latent space with diffusion models.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper presents a diffusion model for multi-modal MR synthesis and has been met with generally positive reviews. While diffusion models are currently popular in the field, the authors propose several novel strategies and demonstrate their effectiveness through ablation experiments. One reviewer, R2, raises the point that the paper does not present the model’s output when no guidance is provided and only noise is used as input; including such a discussion would be interesting. Moreover, considering the limited space of the rebuttal, I suggest that the authors explain more about the real technical obstacles and solutions in this work. As implied by R3, the overall architecture is to some extent similar to Stable Diffusion, so the paper’s contributions need to be clarified in depth.




Author Feedback

R1Q1. Generalizability. We appreciate the reviewer’s insightful suggestions. We tested our model on diverse anatomies (healthy and lesioned brains) and multiple MRI modalities (T1, T2…). The relatively consistent model performance could indicate the generalizability of CoLa-Diff. We agree that testing on other body parts could further demonstrate generalizability. We recognize that implementing our model on another dataset may require fine-tuning and constructing guidance specific to different anatomies. We will discuss this limitation in the final version.

R2Q1. Multiple variations of a result and hallucinated structures. We appreciate the reviewer’s comment on illustrating multiple variations of a result. The error maps (Fig. 2) demonstrate the degree of structural hallucination in the generated images. Specifically, a darker color indicates a higher chance of hallucination artifacts. We will include the uncertainty estimation generated from a single image and present more visualized results in our final version. We will also discuss the synthesis variations and structural artifacts.

R2Q2. Output with no guidance and only noise input. To test the usefulness of structural guidance for multi-modality synthesis, we performed experiments ablating the guidance and presented quantitative results (Table 2, line 2). These showed worse performance than the full CoLa-Diff, but still higher scores (PSNR 27.7542 dB, SSIM 91.4865%) than the competing models (MM-GAN, LDM; Table 1). Due to the page limit, we did not present the visual results; we will add them in the final version.
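The PSNR figures quoted in this exchange are in decibels. For reference, a minimal sketch of the standard PSNR definition, 10·log10(MAX² / MSE), assuming flattened images with intensities scaled to [0, 1] (`data_range=1.0`). The function name and the normalization are assumptions; the authors' evaluation code may normalize images differently, and SSIM is not shown.

```python
import math


def psnr(reference, synthesized, data_range=1.0):
    """Peak signal-to-noise ratio (dB) between two equally sized images.

    Images are flat sequences of intensities; data_range is the assumed
    maximum possible intensity span (1.0 for images scaled to [0, 1]).
    """
    if len(reference) != len(synthesized):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, synthesized)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(data_range ** 2 / mse)
```

For example, a 4-pixel image with a single pixel off by 0.1 has an MSE of 0.0025, giving 10·log10(400), roughly 26 dB.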

R3Q1. Learning in the latent space. We agree that performing the diffusion process in latent space was first proposed in Stable Diffusion, a variant of the latent diffusion model (LDM). However, it is recognized that vanilla LDMs may struggle with MRI synthesis, as “the use of LDMs can be questionable when high precision is required” (reference [1] by R3). Therefore, CoLa-Diff aims to optimize the LDM to enhance multi-modal MRI synthesis. Specifically: 1) Our bespoke architecture with residual-based blocks and fusion addresses the poor precision of LDMs caused by excessive image compression. Our network achieved significant improvements (Table 2, line 1) in PSNR (1.2052 dB) and SSIM (3.5773%). 2) To reduce undesired noise that affects the LDM’s synthesis precision, we introduce Similar Cooperative Filtering (SCF) for frequency-domain filtering. The effectiveness of SCF is demonstrated by the ablation studies (Table 2, line 4), with improvements in PSNR (0.3373 dB) and SSIM (1.5457%). We will elaborate on our motivation and the differences from LDM in our final version. We cited DDPM since we mainly followed the diffusion processes proposed in DDPM. We also cited LDM (referred to as ‘Stable Diffusion’ by the reviewer) in Sec 3.1 (reference 19). We thank the reviewer for the suggestion and will also cite it in Sec 2.1.
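The rebuttal states that CoLa-Diff mainly follows the DDPM diffusion process, applied in a latent space. As a hedged illustration of that forward (noising) process, the sketch below draws z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps in closed form for a latent vector z_0. It assumes the linear beta schedule from 1e-4 to 0.02 over 1000 steps used in the original DDPM paper; the schedule, step count, and latent shape used by CoLa-Diff are not given here, and the helper names are hypothetical.

```python
import math
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed DDPM default)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(z0, t, alpha_bars, rng):
    """Draw z_t ~ q(z_t | z_0) in closed form for a latent z0 (flat list).

    t is a 0-based index into alpha_bars. Returns (z_t, noise) so that a
    denoising network could be trained to predict the noise from z_t and t.
    """
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(z0, noise)]
    return zt, noise
```

At the last step alpha_bar is close to zero, so `q_sample(z0, 999, cumulative_alphas(linear_beta_schedule()), random.Random(0))` returns a latent that is almost pure Gaussian noise; working on a compressed latent rather than the full image is what keeps this process cheaper in memory.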

R3Q2. Difference between Similar Cooperative Filtering and BM3D. We thank the reviewer for the comment. We recognize that BM3D is a collaborative filtering approach that aims to reduce the number of significant transform coefficients, i.e., using only one coefficient instead of n in a grouped block containing n fragments. Due to its complex operations, BM3D requires careful selection of the 2D transform and hyperparameter settings. Therefore, BM3D could be less suitable for LDM, which already has a large parameter space and a heavy experimental burden. In comparison, SCF is essentially a frequency-domain filtering method that simply finds similar fragments and computes a weighted average of their pixel values, which yields a lower training cost. Additionally, the 3D transform of BM3D produces a sparse representation of the true signal, which can worsen the existing information compression issues in LDM, while SCF operates exclusively in 2D space with little information compression. Despite these differences, we acknowledge the contribution of BM3D and will cite and discuss it in our final version.
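SCF is described here as finding similar fragments and taking a weighted average of their pixel values. A generic way to do that, in the spirit of non-local means, is sketched below on a 2-D list of intensities: each pixel is replaced by an average of pixels in a small search window, weighted by how similar their surrounding patches are. This is a spatial-domain illustration under assumed parameters (`patch_radius`, `search_radius`, the filtering strength `h`, and the exponential weighting are all hypothetical). It is not SCF itself, which the authors describe as frequency-domain filtering, nor BM3D, which additionally stacks matched patches into 3D groups and shrinks their transform coefficients.

```python
import math


def similar_patch_average(img, patch_radius=1, search_radius=2, h=0.1):
    """Denoise a 2-D image (list of rows) by similarity-weighted averaging.

    For every pixel, patches inside a search window are compared with the
    patch around that pixel; each candidate's centre value contributes with
    weight exp(-d / h**2), where d is the mean squared patch difference.
    Patch coordinates are clamped to the image border.
    """
    rows, cols = len(img), len(img[0])

    def px(r, c):
        # Clamp so patches near the border stay defined.
        return img[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    def patch_dist(r1, c1, r2, c2):
        # Mean squared difference between two (2*patch_radius+1)^2 patches.
        total, n = 0.0, 0
        for dr in range(-patch_radius, patch_radius + 1):
            for dc in range(-patch_radius, patch_radius + 1):
                diff = px(r1 + dr, c1 + dc) - px(r2 + dr, c2 + dc)
                total += diff * diff
                n += 1
        return total / n

    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            weight_sum, value_sum = 0.0, 0.0
            for sr in range(max(r - search_radius, 0), min(r + search_radius, rows - 1) + 1):
                for sc in range(max(c - search_radius, 0), min(c + search_radius, cols - 1) + 1):
                    w = math.exp(-patch_dist(r, c, sr, sc) / (h * h))
                    weight_sum += w  # the pixel itself always contributes weight 1
                    value_sum += w * img[sr][sc]
            out[r][c] = value_sum / weight_sum
    return out
```

Because every output pixel is a convex combination of input pixels, a constant image passes through unchanged, while an isolated bright pixel is pulled toward its neighbours; a larger `h` smooths more aggressively.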




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors propose incorporating various control mechanisms, including structural guidance, to enhance the diffusion process. In the field of medical image generation, preserving anatomical details is a critical challenge. Hence, these control mechanisms play an important role in ensuring the framework’s robustness by effectively maintaining the essential underlying morphology.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have partially addressed some of the reviewers’ concerns within the length limits. One of the reviewers upgraded their rating from weak reject to weak accept, so all three reviewers now recommend acceptance.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper presents a multi-contrast MR synthesis method based on diffusion models. Reviewers noted both some technical novelty and interesting results. Some of the reviewers’ concerns were addressed by the authors. Overall, this paper has the potential to be of interest to the MICCAI readership.


