Authors

Jiahao Huang, Xiaodan Xing, Zhifan Gao, Guang Yang

Abstract

Fast MRI aims to reconstruct a high fidelity image from partially observed measurements. Exuberant development in fast MRI using deep learning has been witnessed recently. Meanwhile, novel deep learning paradigms, e.g., Transformer based models, are fast-growing in natural language processing and promptly developed for computer vision and medical image analysis due to their prominent performance. Nevertheless, due to the complexity of the Transformer, its application to fast MRI may not be straightforward. The main obstacle is that the computational cost of the self-attention layer, which is the core part of the Transformer, can be expensive for high resolution MRI inputs. In this study, we propose a new Transformer architecture for solving fast MRI that couples Shifted Windows Transformer with U-Net to reduce the network complexity. We incorporate deformable attention to construe the explainability of our reconstruction model. We empirically demonstrate that our method achieves consistently superior performance on the fast MRI task. Besides, compared to state-of-the-art Transformer models, our method has fewer network parameters while revealing explainability. The code is publicly available at https://github.com/ayanglab/SDAUT.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16446-0_51

SharedIt: https://rdcu.be/cVRTK

Link to the code repository

https://github.com/ayanglab/SDAUT

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

This paper proposes an SDAUT network that combines Swin Transformer and deformable attention for fast MRI and also provides explainability.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is well-written and the idea is easy to follow overall
- The combination of Swin Transformer and deformable attention is a novel design and can help to reduce computation while maintaining/improving performance
- SDAUT achieves state-of-the-art performance and can provide explainability
- The authors provide an extensive ablation study to validate the design of the proposed method
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Supplementary material is not provided though mentioned in the main paper
- The values of model parameters are missing, e.g. L in RSDTB/RDTB blocks, r in deformable attention, the number of feature channels, and the window sizes for Swin attention computation
- In the ablation study, it is unclear how DDDDDD-O models perform, i.e. models using dense deformable attention only. I understand it might be computationally costly, but will it bring significant performance gain?
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Some model specifications are missing in the paper. But the authors have agreed to release codes and models in the checklist.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

Please refer to the weakness section. My main comments are: 1) How do DDDDDD-O models perform? This is important for understanding/ validating the necessity of combining Swin Transformer and deformable attention. 2) Attention score based explainability is not unique to this method but a property for all Transformer-based models. Have the authors checked how it compares to gradient-based model explanation methods?
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The method design is novel and effective, and explainability is important for medical applications. However, there are some weaknesses, as described in the weakness and comment sections.
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

5
[Post rebuttal] Please justify your decision

Although this paper has some weaknesses, as pointed out by the other reviewers, I think this work has enough merit for acceptance and can be interesting for the image reconstruction community.

Review #2

Please describe the contribution of the paper

The authors solve the problem of accelerated MRI reconstruction from an undersampled k-space. They propose a UNet based on Shifted Windows (Swin) Transformer and deformable attention derived from deformable convolution networks.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The article is clearly written, the drawings are informative, the motivation is sound.

The resulting architecture is much more computationally efficient than classical feedforward neural networks, including those based on transformers.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The novelty is limited, given the fact that the authors combined the UNet with Transformers (SWIN) and deformable convolutions. As an example, U-Net on Transformers already exists:
1. Petit, O., Thome, N., Rambour, C., & Soler, L. (2021). U-Net Transformer: Self and Cross Attention for Medical Image Segmentation. ArXiv, abs/2103.06104.
2. Unet based on SWIN: Cao, Hu et al. “Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation.” ArXiv abs/2105.05537 (2021).
3. Gao, Yunhe et al. “UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation.” ArXiv abs/2107.00781 (2021).
4. Yeonghyeon Gu, Zhegao P., Seong J. Y. “STHarDNet: Swin Transformer with HarDNet for MRI Segmentation”, Applied Sciences 12 (1)468, 2022. DOI:10.3390/app12010468

Here is a nice collection of models for MRI acceleration using transformers: https://github.com/junyuchen245/Transformer_for_medical_image_analysis

There is no comparison with any of these methods in the experiments. In general, the authors compare only to the three basic approaches and SwinMR, with respect to which there is almost no improvement, except the reduction of the number of parameters. It would be important to compare performance with, e.g., SwinUnet (second reference on the list), also because of its efficient configuration.

Also, the manuscript presents only 1 dataset, and the top benchmarks of the Fast MRI challenge (https://fastmri.org/) are not considered.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Hard to assess.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

To strengthen the article, the experimental part needs to be refined by including proposed benchmarks and other transformer-based models. Explainability claims need to be elaborated too (or removed from the title).
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

3
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The manuscript is of an engineering nature where two already developed approaches (Swin + deformable attentions) were combined. The experimental part is not strong, with the main claimed achievement of the article being the computational efficiency. However, there is no comparison with the other transformer-based U-Nets which also have much fewer parameters than the SwinMR.
Also, only 1 dataset is considered in the article; the single-coil vs multi-coil setup is not addressed. Given these issues, the score is below the borderline threshold.
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

5
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

4
[Post rebuttal] Please justify your decision

Additional experiments are appreciated and will make the work stronger. I increased my score appropriately to “4:weak reject”. However, the novelty and explainability still need to be justified.

Review #3

Please describe the contribution of the paper
1. The authors integrate Swin deformable transformer [22] with U-Net to propose SDAUT, which is applied to undersampled MRI reconstruction.
2. The authors attempt to explain the superiority of the proposed method over the comparison algorithms by showing the deformation fields and attention scores at the inference stage.
3. The proposed SDAUT outperforms nPIDD-GAN and SwinMR with lower computational cost. SDAUT beats DAGAN in SSIM, PSNR and FID but not in MACs.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The deformable field and the attention score in Fig. 4 attempt to provide insight into how the model works for the undersampled reconstruction task.
2. The Swin deformable transformer is somewhat novel in the undersampled MRI reconstruction application.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. While the proposed SDAUT outperforms the competing algorithms in the paper, it lacks comparison with state-of-the-art algorithms.
2. The authors claim that the proposed SDAUT provides explainability, but it is not clearly written how the proposed method provides it, or through which mechanisms. While the discussion & conclusion section shows an example of such a claim, it lacks an explanation of ‘explainable fastMRI’. The authors should define this term and state clearly how SDAUT is explainable.
3. The supplementary material (which has been mentioned multiple times in the manuscript) is missing.
4. The data consistency (DC) module plays an important role in undersampled MRI reconstruction. The proposed module does not contain a DC layer, potentially leading to lower-fidelity reconstructed images. Nor did the authors compare with models that contain DC blocks, such as D5C5, variational network, MoDL etc.
5. While the writing is easy to follow, the motivation is not well explained, e.g. how does the proposed method reduce computational cost? Why/how does it provide explainability? Although the results section confirmed this, the introduction/methods part did not mention it, which makes the manuscript less convincing.
6. The data used in the manuscript is 12-channel; how do the authors combine the multi-coil images? Did you use sensitivity maps?
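
For context on point 4, a hard data consistency step can be sketched as below. This is only an illustration for the single-coil Cartesian case with a binary sampling mask; it is not taken from the paper or its repository, and the function name `hard_data_consistency` and the toy inputs are hypothetical.

```python
import numpy as np


def hard_data_consistency(recon, measured_kspace, mask):
    """Overwrite the reconstruction's k-space with measured samples where available.

    recon:           complex 2D image predicted by some network (hypothetical input)
    measured_kspace: complex 2D undersampled k-space, zero where not acquired
    mask:            boolean 2D array, True where k-space was acquired
    """
    recon_kspace = np.fft.fft2(recon, norm="ortho")
    # Trust the scanner at acquired locations, the network everywhere else.
    merged = np.where(mask, measured_kspace, recon_kspace)
    return np.fft.ifft2(merged, norm="ortho")


# Toy usage with random data (assumed shapes and sampling rate, for illustration only).
rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
mask = rng.random((8, 8)) < 0.3
measured = np.fft.fft2(image, norm="ortho") * mask
noisy_recon = image + 0.1 * rng.standard_normal((8, 8))
dc_recon = hard_data_consistency(noisy_recon, measured, mask)
```

After this step the output agrees exactly with the measurements at sampled k-space locations, which is the fidelity guarantee the reviewer refers to. Learned or soft DC variants, as used in some unrolled networks, weight the two sources instead of overwriting.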
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The reproducibility checklist shows the authors will upload their code and implementation details online. The experimental settings are listed in the manuscript, but the hyper-parameters for the proposed method are missing.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
1. I suggest the authors state the motivations behind their designs more clearly, especially in the methods part. Why and how does the design lead to performance gain? Which design reduces the computation cost? I can see the experimental results confirm the claims (good performance with limited computational cost + explainable fastMRI), but more insight into the design will make this paper more solid.
2. The most interesting part of this paper is the claim of “explainable fastMRI”, but it is not convincing enough about how the proposed method provides explainability. Detailed explanations/motivations on this should be given in the introduction & methods parts.
3. More comparisons with SOTA methods, especially those including DC layers, should be demonstrated. Also, it would be interesting to see whether the performance improves when DC layers are included.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The manuscript is clear and easy to follow, but the motivation of the design is not well explained. The performance improvement is quite marginal. Nevertheless, the application of the deformable Swin transformer to fastMRI and the results on deformation fields/attention scores for explainable fastMRI are the merits.
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

3
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

5
[Post rebuttal] Please justify your decision

The authors provide more experimental results to address the algorithm comparison problems raised by all reviewers. I suggest including these in the final version.

I agree with R2 on the novelty issues in the method, but the application to MRI reconstruction is somewhat novel.

The rebuttal still does not explain the “explainability” of the model, which is one of the contributions of this paper and was questioned by R1 & R3. Therefore, I keep my score as a borderline accept.

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
This paper presents a UNet based on Shifted Windows (Swin) Transformer and deformable attention for accelerated MRI reconstruction from an undersampled k-space. While the reviewers agreed that the presented work is interesting, they raised several concerns, including
- The novelty of the work needs to be justified given the large amount of existing work. (R2)
- The proposed method was only compared with some weak baselines, but not with state-of-the-art methods. (R2 & R3)
- Hyperparameter setting and sensitivity analysis are missing. (R1)
- The claim of “explainable fastMRI” lacks support. (R1 & R3)
- The top benchmarks of Fast MRI challenge (https://fastmri.org/) are not considered. (R2)
The authors are invited to provide a rebuttal.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

10

Author Feedback

We proposed a novel Swin deformable attention UNet transformer (SDAUT) for fast MRI, which achieved SOTA while significantly reducing the computational costs (R1,R2,R3) with explainability provided (R1,R3). Comprehensive ablation studies have validated the model design of SDAUT (R1). All reviewers agreed that our paper is “well-written” with “informative” results. The deformable field and the attention score maps “provided insights for model explainability for the reconstruction task” (R3).

Motivation and Novelty: Our Swin deformable module is an effective “novel design” (R1, R3). However, R2 mentioned that our work lacks novelty and provided some references, for segmentation, to compare against. The deformable attention mechanism can be applied to reconstruction since a deformation map can be learned, but it comes with a huge computational cost and memory requirement. Thus, we proposed Swin deformable attention (sparse), whose spatial restriction reduces the computational cost, and a UNet structure that enables dense deformable attention to be applied in the bottleneck. We believe this is novel in fast MRI.
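
To make the cost argument concrete, the sketch below compares global self-attention with window-restricted self-attention using the complexity estimates given in the original Swin Transformer paper. The 256x256 token grid is an assumption chosen for illustration; the channel count (90) and window size (8) are the first-stage values quoted in this rebuttal. The extra cost of the deformable offset network is not modelled.

```python
def global_msa_flops(h, w, c):
    """Swin-paper estimate for global multi-head self-attention on an h x w token grid."""
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c


def window_msa_flops(h, w, c, m):
    """Swin-paper estimate for attention restricted to non-overlapping m x m windows."""
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


h = w = 256  # assumed token grid, not stated in the source
c = 90       # first-stage channels from the rebuttal
m = 8        # window size from the rebuttal

ratio = global_msa_flops(h, w, c) / window_msa_flops(h, w, c, m)
print(f"global / windowed attention cost: {ratio:.1f}x")
```

The quadratic term in the number of tokens is what dominates for high-resolution inputs; restricting attention to windows makes the cost linear in the number of tokens for a fixed window size.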

Explainability: We provided deformation fields and attention score maps for explainability, and R2 claimed that “explainability needs to be elaborated”. More descriptions and visual examples were provided in the Supplementary file, which was removed; we now add more results to the revision. Besides, R1 mentioned that “SDAUT can provide explainability”. For deformation maps, the deformation points tend to move closer to the border of brains, proving that our proposed deformable attention can effectively capture the structure information. Attention score maps show that a fixed head always focuses on specific structures in MSA, revealing how the multi-head mechanism works. Transformers use a multi-head design to extract various high dimensional features.

Model Parameters and Dataset Processing: We thank R1 and R3 for the suggestion about “missing model parameters”. We did not include the model parameters in the main text due to page limits. Parameter details will be added in the camera-ready version: L=6; r=1; #channels=[90,180,360,720,360,180]; #heads=[6,12,24,24,24,12]; window size=8. We thank R3 for the question about multi-coil data. We used root sum of squares to convert the 12-channel data into single-channel data. We now add this information to the revision.
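
The root-sum-of-squares coil combination mentioned above can be sketched as follows. This is a generic illustration, not the authors' preprocessing code; the function name, array layout (coil axis first) and random test data are assumptions, and the rebuttal does not state at which stage of the pipeline the combination was applied.

```python
import numpy as np


def root_sum_of_squares(coil_images, coil_axis=0):
    """Combine complex per-coil images into one magnitude image."""
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=coil_axis))


# Toy 12-coil example with random complex data (assumed 4 x 4 image size).
rng = np.random.default_rng(0)
coils = rng.standard_normal((12, 4, 4)) + 1j * rng.standard_normal((12, 4, 4))
combined = root_sum_of_squares(coils)
print(combined.shape)
```

Root sum of squares needs no coil sensitivity maps, which answers R3's question, but it yields a magnitude image and discards phase.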

Experiments: R2 claimed that the experiments are limited in terms of comparison methods and datasets. To make our experiments more convincing, we have updated the experimental results in the revision. As R2 suggested, we now add results for SwinUNet, which was originally proposed for segmentation. We applied its default network setting and used SDAUT’s training setting for a fair comparison. Results on FastMRI are also included. Here are the updated values: For CC dataset G1D30% (Method/PSNR↑/SSIM↑/FID↓): SDAUT/33.92/0.963/20.45; SwinMR/33.06/0.956/21.03; SwinUNet/32.52/0.951/31.16. For FastMRI (Knee) dataset G1D30% (Method/PSNR↑/SSIM↑/FID↓): SDAUT/37.42/0.973/38.69; SwinMR/35.44/0.962/44.83; SwinUNet/37.02/0.971/37.95; nPIDDGAN/34.76/0.956/39.86; DAGAN/34.11/0.948/34.65.
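
For readers less familiar with the PSNR column, the standard definition is 10·log10(MAX² / MSE). The sketch below is a generic implementation; the rebuttal does not state the authors' normalisation or data range, so the default used here (the reference image's own value range) is an assumption, and the function name is hypothetical.

```python
import numpy as np


def psnr(reference, reconstruction, data_range=None):
    """Peak signal-to-noise ratio in dB between two real-valued images."""
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    if data_range is None:
        # Assumed convention: use the dynamic range of the reference image.
        data_range = reference.max() - reference.min()
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range**2 / mse)


# Toy usage: a constant offset of 0.1 on an image with unit range.
ref = np.array([[0.0, 1.0], [0.0, 1.0]])
rec = ref + 0.1
print(psnr(ref, rec))
```

Because PSNR is logarithmic, a 1 dB gain corresponds to roughly a 21% reduction in mean squared error, which helps put the differences between the reported methods in perspective.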

Minor issues: We thank R1 for the suggestion about “DDDDDD-O”. We tried it before but failed, since it requires more GPU memory than we have available. We thank R1 for the suggestion about gradient-based explanation. Heatmaps from different heads were used to explain the multi-head mechanism. However, gradient-based explanation shows how pixels influence the whole prediction, which might not be suitable here. We thank R3 for the suggestion about DC layers. DC layers may provide a data fidelity guarantee but can lead to a lower perceptual score. We will consider combining DC layers into our future model. We apologize for our over-length supplementary file, which was removed by the Chair after submission. Implementation details (dataset information and parameters) are now added to the revision. Grammar and writing have been checked.

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The rebuttal has clarified the major concerns raised by the reviewers in the previous review. While the reviewers still have reservations about the novelty and explainability, the paper's overall quality is very good. This paper is worthy of a spot for presentation at MICCAI.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

1

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper presents a fast MRI reconstruction network. The main contribution of the paper is to propose a module that combines deformable convolution and swin transformer and provides interpretable arguments. The reviewers raised major questions about the paper in terms of innovation, experimental design, and interpretability of the method. After the rebuttal, R1 and R3 indicated acceptance of the paper and R2 indicated weak rejection and expressed concerns about innovativeness and interpretability. AC considers the feedback letter as an effective response to the raised issues of experimental results and innovativeness and a weak response to interpretability. Overall, the contributions of this paper outweigh the weaknesses.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

3

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper proposed a transformer-based model for fast MRI reconstruction with deformable attention. The reviewers raised some concerns (novelty, explainability), which are partially addressed in the rebuttal. I think this work has the merits for a MICCAI paper.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

NA

Swin Deformable Attention U-Net Transformer (SDAUT) for Explainable Fast MRI