
Authors

Yilmaz Korkmaz, Tolga Cukur, Vishal M. Patel

Abstract

Magnetic Resonance Imaging (MRI) produces excellent soft tissue contrast, albeit being an inherently slow imaging modality. Promising deep learning methods have recently been proposed to reconstruct accelerated MRI scans. However, existing methods still suffer from various limitations regarding image fidelity, contextual sensitivity, and reliance on fully-sampled acquisitions for model training. To comprehensively address these limitations, we propose a novel self-supervised deep reconstruction model, named Self-Supervised Diffusion Reconstruction (SSDiffRecon). SSDiffRecon expresses a conditional diffusion process as an unrolled architecture that interleaves cross-attention transformers for reverse diffusion steps with data-consistency blocks for physics-driven processing. Unlike recent diffusion methods for MRI reconstruction, a self-supervision strategy is adopted to train SSDiffRecon using only undersampled k-space data. Comprehensive experiments on public brain MR datasets demonstrate the superiority of SSDiffRecon against state-of-the-art supervised and self-supervised baselines in terms of reconstruction speed and quality. Implementation will be available at https://github.com/yilmazkorkmaz1/SSDiffRecon.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43999-5_47

SharedIt: https://rdcu.be/dnww0

Link to the code repository

https://github.com/yilmazkorkmaz1/ssdiffrecon

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper introduces a self-supervised deep reconstruction model called Self-Supervised Diffusion Reconstruction (SSDiffRecon), which adopts a self-supervision strategy together with a diffusion model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper combines an existing diffusion model with a cascaded architecture and a self-supervised learning scheme, which appears to be novel among MR image reconstruction methodologies.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While qualitative and quantitative analyses were conducted in the experiments, more supporting results are needed to substantiate the performance of the proposed method.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper describes the implementation and datasets in detail; however, open code that can reproduce the results has not yet been provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Please provide fastMRI results with T2 and FLAIR contrasts to assess the performance of the proposed method.
    • The ablation results are shown only on the fastMRI dataset. Please also provide quantitative ablation studies on the IXI dataset.
    • What do the interim reconstructed images (e.g., iter = 0, 1, …) in the cascaded architecture look like?
    • The data consistency layer implemented in this paper is coil-wise and does not share coil information at this stage. Could this data consistency layer be extended to a conjugate-gradient-based data consistency layer to further improve performance?
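
    For context on the last question: a conjugate-gradient data-consistency block, as used in MoDL-style unrolled networks, solves a small regularized least-squares problem instead of overwriting k-space samples per coil. Below is a minimal single-coil sketch of that idea; the function and parameter names are illustrative only and not taken from the paper.

```python
import numpy as np

def cg_data_consistency(z, y, mask, lam=0.05, n_iter=10, tol=1e-10):
    """Conjugate-gradient data consistency (MoDL-style sketch).

    Solves (A^H A + lam I) x = A^H y + lam z for a single coil,
    where A = mask * FFT2. `z` is the network output (image domain),
    `y` the acquired k-space, `mask` the binary sampling pattern.
    """
    def normal_op(x):
        # (A^H A + lam I) x = F^H M F x + lam x
        return np.fft.ifft2(mask * np.fft.fft2(x)) + lam * x

    b = np.fft.ifft2(mask * y) + lam * z   # right-hand side A^H y + lam z
    x = z.astype(complex)
    r = b - normal_op(x)
    p = r.copy()
    rs = np.vdot(r, r)                     # conjugated inner product
    for _ in range(n_iter):
        if abs(rs) < tol:                  # already converged
            break
        Ap = normal_op(p)
        alpha = rs / np.vdot(p, Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = np.vdot(r, r)
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

    Because the masked-Fourier normal operator has only two distinct eigenvalues (lam and 1 + lam), CG converges in very few iterations here; a multi-coil extension would fold coil sensitivities into the forward operator A.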
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Based on the comments given above, the overall recommendation is weak accept.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    In this paper, the authors propose SSDiffRecon, a self-supervised MRI reconstruction method based on diffusion models. In contrast to existing applications of diffusion models to MRI reconstruction, the paper combines self-supervised learning with diffusion-model-based MRI reconstruction. This is achieved by a novel self-supervised loss function for diffusion models, where an unrolled network is used for reverse diffusion. The unrolled network consists of cross-attention layers, which improve performance over CNNs.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novelty: The proposed method is novel in three respects:

    1) Using linear-complexity cross-attention layers within unrolled networks is a novel contribution and, according to Table 1, improves performance over CNNs.
    2) Replacing the network in reverse diffusion with an unrolled network is, to the best of my knowledge, proposed in this paper for the first time.
    3) The self-supervised loss formulation for diffusion models is novel.

    • Strong evaluation: There are many diffusion-model-based baselines as well as past self-supervised methods. The ablations are clear and answer many questions regarding the impact of each component of the method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There are no clear weaknesses of the paper.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The results of the paper are reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Although it is at the pre-print stage, there seems to be another recent paper proposing self-supervised learning for MRI reconstruction using diffusion models [1]. Authors could discuss how their method relates to the pre-print.

    [1] Cui, Zhuo-Xu, et al. “Self-score: Self-supervised learning on score-based models for mri reconstruction.” arXiv preprint arXiv:2209.00835 (2022).

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The major factors that led to my overall score for this paper are the combination of multiple novel contributions with strong evaluation. I believe publication of this paper would advance research on the use of diffusion models in MRI reconstruction.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposes a self-supervised, diffusion-model-based MR reconstruction network. It achieves more accurate and faster reconstruction than the baseline methods while using only undersampled data during training.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper tackles an urgent problem: how to conduct training and deliver meaningful results when no fully-sampled MR data are available. Improved performance over the baseline methods is demonstrated in terms of speed and accuracy.
    2. The work is evaluated against 6 baseline methods and on 2 datasets.
    3. Although some explanations are missing, the paragraphs and equations are written clearly with good organization.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Major:

    1. Most critically, a highly similar paper (Zhuo-Xu Cui et al., “Self-Score: Self-Supervised Learning on Score-Based Models for MRI Reconstruction,” arXiv:2209.00835, 2022), which is also a diffusion-model-based, self-supervised MR reconstruction method with an embedded physical model and is tested on the same dataset, is not cited, discussed, or compared in this paper.
    2. Limited improvement and novelty relative to the baseline work DDPM (citation 19 in the paper). The major changes are 1) replacing the U-Net with Transformers and 2) using only undersampled MR data to train the network. However, the faster and more accurate performance may simply be attributable to using a zero-filled image instead of a pure noise sample at inference. This strategy is neither applied to the baseline methods nor investigated in the ablation study.
    3. The paper does not explain why its self-supervised reconstruction strategy works with only undersampled data. Is it because Transformers are more suitable than a U-Net in this case? Is it because of the inference strategy? This is the core of the paper, but it is not well discussed.

    Minor:

    1. In the Introduction, the authors claim that previous work (citations 24, 3, 4, 19, 5 in the paper) omitted physical constraints, which is not true: almost all of these works involve data-consistency terms (just like the proposed work), which can be regarded as physical constraints. Furthermore, they did not show the “poor contextual sensitivity” claimed in this paper; even if they had, it is not clear whether this is caused by the use of CNNs (no experiments are shown).
    2. The denoising network is termed “unrolled”. However, “unrolled” usually refers to an iterative algorithm. It is not clear which iterative algorithm and which terms (data consistency, regularization, etc.) are unrolled.
    3. No training details for SSDiffRecon are given (see reproducibility of the paper), and no implementation details (training and network hyper-parameters) are given for the baseline methods.
    4. No statistics (e.g., standard deviations) are given for the experiments.
    5. Two beta terms are used (one in the Background as the variance schedule and one in the Unrolled Denoising Blocks as a kernel). Are they entirely different, or is there some connection?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Although all items in the reproducibility checklist are checked, how to reproduce the results is unclear.

    • How is the data undersampled (which undersampling pattern)?
    • How can the training be reproduced? Batch size, learning rate, scheduler, number of epochs, and the hyper-parameters used in the network (alpha and beta, number of Transformer heads, hidden-layer dimension, etc.) are not given.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. More background study and more similar work comparison are appreciated (refer to main weaknesses).
    2. More reproduction details are appreciated (refer to reproducibility of the paper).
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The major weaknesses mentioned in the “main weaknesses of the paper” part lead to this decision. A comparison/discussion with the highly similar paper as mentioned above is expected.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    2

  • [Post rebuttal] Please justify your decision

    In short, the authors did not resolve my concerns well and dodged my questions. This makes me more suspicious about their work and leads me to this decision.

    1. As I already mentioned in the main weaknesses of the paper (Major concerns 2 and 3), this work uses ZERO-FILLED images as input at inference. For the baselines (e.g., self-DDPM), however, this strategy is deliberately NOT used; pure-noise images are used as input instead. This makes the comparison unfair, and it is unclear whether the benefits come from the proposed architecture or simply from using zero-filled images as input.

    The authors did not address this concern at all and did not provide any further information about this ablation (using zero-filled images as input for the baselines too). This raises my concern further and makes me feel they want to hide this problem.

    2. The authors were not aware of a paper published on 02.09.2022, half a year before the submission of the current work. It is hard to call that paper concurrent work, IMHO. Furthermore, a simple Google search with the keywords “self-supervised, diffusion model, MRI reconstruction” finds that paper.
    3. The authors should provide the implementation details of the proposed work and the baseline methods now. A promise cannot be verified.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The work intends to address an important challenge in MRI reconstruction: the lack of fully-sampled MR data. The proposed self-supervised approach combined with a diffusion model is interesting and has been well validated. However, there are some concerns about the novelty and the improvement of the work. The authors are invited for a rebuttal to clarify the novelty of the work in comparison to the baseline work (DDPM [19]) and to a preprint highlighted by two reviewers. The reasons for the improved performance of the approach, and a discussion of the core of the work as suggested by the reviewer, also need to be clarified.




Author Feedback

We thank the reviewers for their valuable comments. In what follows, we address their concerns.

1- Meta Reviewer, R2Q9, R3Q6 - Significant distinctions between our method and the mentioned preprint ([A] arXiv:2209.00835): Please note that [A] is an unpublished concurrent work; we were not aware of it during the submission of our paper. We will certainly discuss and cite it in the revised version. Our method offers several advantages over [A]. One notable advantage is parameter efficiency: while [A] employs a score model (NCSNv2¹) with over 80 million parameters, our model achieves comparable results with approximately 3 million parameters. Moreover, NCSNv2¹ requires thousands of sampling iterations, whereas our model needs only five reverse diffusion steps, allowing faster inference without sacrificing performance. Another important distinction lies in the architectural choices: our model utilizes cross-attention transformers, which enable an improved understanding of global dependencies within the data, whereas [A] relies on separate convolution branches. The use of transformers showcases a novel and effective way of capturing long-range dependencies, contributing to the superior performance shown in the ablation studies. Finally, [A] attempts to estimate the fully-sampled data distribution using a Bayesian network to train the score model, which may propagate estimation error into the conditional sampling; our model instead undergoes a unified end-to-end training process rather than distinct training sessions.

2- Meta Reviewer, R3Q6 - Regarding the comparison with DDPM [19]: One of the main differences between the two papers is the architectural choice. [19] incorporates a UNet architecture, which results in a higher parameter count. This increased capacity may allow [19] to capture more complex and nuanced information, but it is limited by the lack of an incorporated physical model and by the need for fully-sampled data during training. Our model compensates for this through other design choices, such as lightweight cross-attention transformers to increase long-range contextual sensitivity and data consistency enforced throughout the denoising network. We also parametrize time indices through a Mapper network and let time information flow along a cross-attention mechanism, which is a significant design difference from [19]. Furthermore, [19] states that its reconstruction model requires at least a hundred reverse diffusion steps for reasonable performance (we used a thousand steps to further improve its performance), whereas our method achieves comparable results with just five steps. This significantly reduces the computational burden of the diffusion process, making our method more efficient in both time and computational resources.

3- Meta Reviewer, R3Q6 - The reasons behind our model’s performance: Our model follows a methodology similar to successful model-based approaches such as MoDL [1] and D5C5 [21]. We aim to estimate the actual fully-sampled target directly during inference, as opposed to merely estimating the error in the image as in previous diffusion-based approaches. As investigated in the ablation studies, most of the performance gain is directly related to the data consistency enforced throughout the denoising network, which allows us to use the lightweight backbone detailed in the paper and in the previous answers.
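
The interleaving of a learned denoiser with data consistency described above can be illustrated with a minimal single-coil sketch. The names below are hypothetical and the placeholder denoiser stands in for the paper's cross-attention network; the actual model additionally handles multi-coil data and coil sensitivities.

```python
import numpy as np

def hard_data_consistency(x, y, mask):
    """Project the image estimate x onto the acquired k-space samples.

    Sampled k-space locations are overwritten with the acquired data y;
    unsampled locations keep the network's estimate (single-coil sketch).
    """
    k = np.fft.fft2(x)
    k = mask * y + (1 - mask) * k
    return np.fft.ifft2(k)

def unrolled_reconstruction(y, mask, denoiser, n_steps=5):
    """Toy unrolled loop: start from the zero-filled image and interleave
    a (hypothetical) learned denoiser with hard data consistency,
    mirroring the few-step scheme described in the rebuttal."""
    x = np.fft.ifft2(mask * y)          # zero-filled initialization
    for _ in range(n_steps):
        x = denoiser(x)                 # learned prior (placeholder)
        x = hard_data_consistency(x, y, mask)
    return x
```

Because each step ends with the data-consistency projection, the final reconstruction reproduces the acquired k-space samples exactly, regardless of the denoiser, which is why the network backbone can stay lightweight.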

4- R1Q8, R3Q8 - Reproducibility: We will certainly publish the implementation code and include more hyper-parameter details in the revised version.

¹Song, Y., & Ermon, S. (2020). Improved techniques for training score-based generative models. Advances in neural information processing systems, 33, 12438-12448.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have pointed out a few differences between the proposed approach and the other similar work in the rebuttal. The points highlighted in the rebuttal sound reasonable and show some differences from the similar work, although the core technical differences against both DDPM and the other work could be discussed in more depth. However, there are still some concerns about the fairness of the comparison against baseline approaches, as pointed out by R3, and the comparison against D5C5 is also not entirely fair considering the large difference in model capacity. Though these drawbacks exist, the method itself can still be interesting to the community, as it offers a fast diffusion model for self-supervised MRI reconstruction. This is indeed a borderline work considering both the strengths and weaknesses of the paper, but I tend to recommend acceptance considering the method's potential interest to readers.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper introduces a self-supervised undersampled MRI reconstruction network, with the main novelty being the use of diffusion models with transformers. The authors have made efforts to address the concerns raised by the reviewers in their rebuttal. However, the paper is still considered borderline, and its acceptance to MICCAI would be a last resort.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Overall, this is a good paper: (i) important topic, (ii) relatively well-written, and (iii) relatively good comparisons.

    I agree with Reviewer 3, in that novelty is very limited.

    I recommend accept, but this paper is in the gray zone, so I am fine with a reject decision if the primary AC decides.


