Authors

Guangyuan Li, Jun Lyu, Chengyan Wang, Qi Dou, Jing Qin

Abstract

Current multi-contrast MRI super-resolution (SR) methods often harness convolutional neural networks (CNNs) for feature extraction and fusion. However, existing models have some shortcomings that prevent them from producing more satisfactory results. First, during feature extraction, some high-frequency details in the images are lost, resulting in blurred boundaries in the reconstructed images, which may impede subsequent diagnosis and treatment. Second, the receptive field of the convolution kernel is limited, making it difficult for the networks to capture long-range/non-local features. Third, most of these models are driven solely by training data, neglecting prior knowledge about the correlations among different contrasts, which, once well leveraged, can effectively enhance performance with limited training data. In this paper, we propose a novel model, WavTrans, that synergizes wavelet transforms with a new cross-attention transformer to comprehensively tackle these challenges. Specifically, we harness a one-level wavelet transformation to obtain the detail and approximation coefficients of the reference contrast MR images (Ref). While the approximation coefficients compress the low-frequency global information, the detail coefficients represent the high-frequency local structure and texture information. Then, we propose a new residual cross-attention swin transformer to extract and fuse features, establishing long-distance dependencies between features and maximizing the restoration of high-frequency information in the target contrast MR images (Tar). In addition, a multi-residual fusion module is designed to fuse the high-frequency information in the upsampled Tar and the original Ref to ensure the restoration of detailed information. Extensive experiments demonstrate that WavTrans outperforms the SOTA methods by a considerable margin with upsampling factors of 2-fold and 4-fold.
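
As a rough illustration of the cross-attention idea mentioned in the abstract, the sketch below implements generic scaled dot-product cross-attention in plain Python. It is not the authors' residual cross-attention swin transformer: learned projections, window partitioning, multiple heads, and residual connections are all omitted, and the choice of taking queries from Tar features and keys/values from Ref features is an assumption made only for this example. The function and variable names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention.

    queries: n_q x d list of lists (assumed here to come from Tar features)
    keys:    n_k x d list of lists (assumed here to come from Ref features)
    values:  n_k x d_v list of lists (assumed here to come from Ref features)
    Returns an n_q x d_v list of lists.
    """
    dim = len(keys[0])
    value_dim = len(values[0])
    output = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        output.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(value_dim)]
        )
    return output


if __name__ == "__main__":
    tar_tokens = [[0.0, 0.0]]              # one query token
    ref_keys = [[1.0, 0.0], [0.0, 1.0]]    # two key tokens
    ref_values = [[2.0], [4.0]]            # one value per key
    print(cross_attention(tar_tokens, ref_keys, ref_values))
```

Because every query attends to every key, each output token can draw on any position of the other contrast, which is the long-range dependency that a fixed-size convolution kernel cannot provide directly.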

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16446-0_44

SharedIt: https://rdcu.be/cVRTD

Link to the code repository

https://github.com/XAIMI-Lab/WavTrans

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

This paper proposes a joint wavelet and residual cross-attention transformer network for multi-contrast SR reconstruction, which aims to restore high-frequency information in the target image with the help of a high-resolution reference image. The method is evaluated on two datasets: an in-house brain dataset and the public fastMRI knee dataset. Comparisons with several previous methods also demonstrate the effectiveness of the proposed method.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Technical contribution. While the proposed method builds on existing building blocks, I think it is sufficiently novel.
2. Extensive comparison with chosen state-of-the-art methods
3. Extensive ablation study demonstrating the advantage of the proposed contributions
4. Code has been provided in the supplementary material
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

No weaknesses, but one minor suggestion: I hope the authors can add another ablation study to discuss the importance of the k-space data consistency loss.
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The results are highly likely to be reproducible, as the authors have provided the code in the supplementary material.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
Detailed comments:
1. What is the model size of the proposed method, and how does its inference time compare with that of other methods?
2. The architecture illustration in Fig. 1 could be improved; it is a little confusing because there are too many lines in the figure.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

7
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The idea of this paper is technically sound and reasonable. Specifically, a joint wavelet and residual cross-attention swin transformer network is proposed for the multi-contrast MR SR problem. Comprehensive experiments, including comparisons with other state-of-the-art methods and an ablation study, show the effectiveness of the proposed method. The authors also evaluate their method on two different datasets, an in-house brain dataset and the public fastMRI knee dataset, which shows the generalizability of the proposed method.
Number of papers in your stack

5
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

The paper presents a novel way of synergizing wavelets with the swin transformer to obtain super-resolution images from multi-contrast MRI. The proposed ideas are interesting, and the results have been compared with various recent methods, showing an improvement over all of them.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper presents a novel way of synergizing wavelets with the swin transformer to obtain super-resolution images from multi-contrast MRI. The proposed ideas are interesting, and the results have been compared with various recent methods, showing an improvement over all of them.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The paper states that it uses wavelet packet decomposition (WPD), but it seems to rely on the simple wavelet transform rather than benefiting from WPD. It needs to be clarified which of the two was used and why. It also uses the simple Haar wavelet, a choice that was not justified and may need to be explored.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

While the software details are provided, the paper uses in-house data and non-open source code which will likely make it hard/impossible to reproduce the results.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

Proofread and fix any grammatical errors, e.g. section 1, line 2: ‘is able to provides’; page 2, line 3: ‘To the end’; page 2, line 6: ‘MR images quality’ -> ‘MR image quality’; third-to-last line before section 2: ‘a in-house dataset’.

Fig. 1 shows the WPD, and the text in the sections above indicates the use of wavelet packet decomposition, but do you need wavelet packet decomposition, or is it just the wavelet transform that is being used? Why wavelet packet and not the simple wavelet transform? If it is wavelet packet, how is it being used, given that the paper says only 1-level decomposition is used, and at 1 level there is no difference between the WT and WPD? Can you clarify this in the text?

Fig. 2: it is not clear what each row represents. Further, the columns need to be referenced with the relevant citation where the method is from another paper.
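
To make the wavelet transform versus wavelet packet point above concrete, the following sketch enumerates the 2D subbands that each scheme produces at a given depth. It only tracks subband labels, not pixel data, and the function names are hypothetical. At one level both schemes yield the same four subbands; they diverge from the second level onward, where the standard transform splits only LL while the packet decomposition splits every subband.

```python
BANDS = ("LL", "LH", "HL", "HH")


def wavelet_transform_bands(levels):
    """Subband paths for the standard multi-level transform.

    Only the approximation band (LL) is split again at each level.
    """
    approx = ()      # path to the current approximation band
    details = []
    for _ in range(levels):
        details += [approx + (band,) for band in BANDS[1:]]
        approx = approx + ("LL",)
    return [approx] + details


def wavelet_packet_bands(levels):
    """Subband paths for the wavelet packet decomposition.

    Every subband, not just LL, is split again at each level.
    """
    leaves = [()]
    for _ in range(levels):
        leaves = [path + (band,) for path in leaves for band in BANDS]
    return leaves


if __name__ == "__main__":
    for depth in (1, 2):
        wt = wavelet_transform_bands(depth)
        wpd = wavelet_packet_bands(depth)
        print(depth, len(wt), len(wpd), sorted(wt) == sorted(wpd))
```

In general the standard transform produces 3n + 1 subbands after n levels and the packet decomposition produces 4^n, so the two counts coincide only for n = 1.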
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

strong methodology, good results
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

2
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #4

Please describe the contribution of the paper

The authors propose a novel model, WavTrans, that synergizes wavelet transforms with a new cross-attention transformer to tackle the challenges in super-resolution. The proposed method outperforms state-of-the-art methods by a considerable margin with upsampling factors (UFs) of both 2-fold and 4-fold.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The proposed WavTrans method outperformed the state-of-the-art multi-contrast SR reconstruction methods. This study provides a potential direction for further research into the processing between multi-contrast images for MRI super-resolution. The paper is well organized and the experiment is credible.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Transformer models are difficult to optimize, and the proposed model is implemented with 4 V100 GPUs, which is impractical for clinical application.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

This paper is well organized, and I think it has good reproducibility.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

Why did the authors choose the Swin Transformer as the backbone? Please compare it with other transformer models (e.g. DeiT, PVT) and state your reasons. Please show the parameter count and computational complexity in Table 1.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper is well organized.
Number of papers in your stack

7
What is the ranking of this paper in your review stack?

6
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This manuscript proposes a joint wavelet and transformer network to enable multi-contrast super-resolution reconstruction on MRI. All reviewers agree this is a novel method for super-resolution, with appropriate comparisons against the current state of the art demonstrating an improvement with the proposed approach.

There are some minor weaknesses: one reviewer was unsure whether wavelet transforms or wavelet packet decomposition were used, and reviewers wanted the computational complexity and run time to be presented in the manuscript.

I recommend acceptance, as all reviewers found this to be a strong paper, and the concerns are all minor and relate to the description and presentation of the method and results.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

3

Author Feedback

We thank the reviewers and AC for their time and effort in reviewing our paper.

R1 suggests adding another ablation study to discuss the importance of the k-space data consistency (DC) loss. The PSNR, SSIM, and RMSE (×10⁻²) values for w/o DC on the in-house brain dataset with 4-fold upsampling are as follows: 29.45(1.55)/0.87(0.02)/5.17(2.37). The results show that the k-space data consistency loss can effectively restore the frequency domain information in MR images. We shall update the results in Table 2.
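
For readers unfamiliar with two of the metrics quoted above, the sketch below computes RMSE and PSNR for a pair of images stored as nested lists. It uses the textbook definitions; the intensity normalization, the `data_range` value, and the way per-image scores were averaged in the paper are not stated here, so those are assumptions and the function names are hypothetical. SSIM is omitted because it requires windowed local statistics.

```python
import math


def rmse(reference, estimate):
    """Root-mean-square error between two equally sized 2D images."""
    squared = [
        (r - e) ** 2
        for ref_row, est_row in zip(reference, estimate)
        for r, e in zip(ref_row, est_row)
    ]
    return math.sqrt(sum(squared) / len(squared))


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB.

    data_range is the maximum possible intensity; 1.0 assumes images
    scaled to [0, 1], which is an assumption of this sketch.
    """
    error = rmse(reference, estimate)
    if error == 0:
        return math.inf
    return 20 * math.log10(data_range / error)


if __name__ == "__main__":
    truth = [[0.0, 0.0], [0.0, 0.0]]
    recon = [[0.1, 0.1], [0.1, 0.1]]
    print(rmse(truth, recon), psnr(truth, recon))
```

Higher PSNR and lower RMSE both indicate a reconstruction closer to the ground truth.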

R1 and R4 ask for the parameter count, computational complexity, and inference time of each method. These are listed below for all mentioned models as parameters / FLOPs / inference time:

EDSR: 1.518M / 16.275G / 7.17s
MCSR: 3.396M / 103.812G / 9.14s
MINet: 6.898M / 866.933G / 10.23s
MASA: 4.027M / 180.134G / 9.77s
Ours: 2.102M / 162.889G / 10.98s

We will show these in Table 1.

R1 mentions the architecture illustrated can be optimized. We will modify Fig. 1 in the revised paper.

R2 wonders whether wavelet transforms or wavelet packet decomposition were used. Please refer to [1][2]. Our method is based on the wavelet packet transformation, which decomposes a reference image into a sequence of wavelet coefficients of the same size. Inspired by [1][2], we choose the Haar wavelet because it is sufficient to depict reference information at different frequencies. The wavelet packet transformation decomposes the reference image into four sets of wavelet coefficients {(A)LL, (V)LH, (H)HL, (D)HH}, where the approximation coefficients LL capture the low-frequency global information and the detail coefficients {LH, HL, HH} capture the high-frequency local information. In addition, if UF=4, we further decompose LL to obtain four wavelet coefficients for the next scale. (Fig. 1 shows the model architecture under UF=2.) We will change “wavelet packet decomposition” to “wavelet packet transformation” and cite [1][2] in the revised paper.

[1] Huang H, He R, Sun Z, et al. Wavelet-SRNet: A wavelet-based CNN for multi-scale face super resolution. Proceedings of the IEEE International Conference on Computer Vision, 2017: 1689-1697.
[2] Qu L, Zhang Y, Wang S, et al. Synthesized 7T MRI from 3T MRI via deep learning in spatial and wavelet domains. Medical Image Analysis, 2020, 62: 101663.
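
The decomposition described in the response to R2 can be sketched in plain Python as follows. This is a minimal illustration of a one-level orthonormal 2D Haar transform and of repeating it on LL, not the authors' implementation. The function names are hypothetical, and which detail band is labelled LH versus HL differs between libraries, so the labelling below is an assumption.

```python
def haar_dwt2(image):
    """One-level 2D Haar transform of a list-of-lists image.

    Height and width must be even. Returns (LL, LH, HL, HH); all four
    subbands have the same size, half the input in each dimension.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must be even")
    LL, LH, HL, HH = [], [], [], []
    for i in range(0, rows, 2):
        ll, lh, hl, hh = [], [], [], []
        for j in range(0, cols, 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            ll.append((a + b + c + d) / 2)  # approximation (local average)
            lh.append((a + b - c - d) / 2)  # top-minus-bottom difference
            hl.append((a - b + c - d) / 2)  # left-minus-right difference
            hh.append((a - b - c + d) / 2)  # diagonal difference
        LL.append(ll)
        LH.append(lh)
        HL.append(hl)
        HH.append(hh)
    return LL, LH, HL, HH


def haar_pyramid(image, levels):
    """Apply haar_dwt2 repeatedly to the LL band.

    Returns (final_LL, details), where details[k] holds (LH, HL, HH)
    for level k + 1. One level would correspond to UF=2 and two levels
    to UF=4, following the description in the rebuttal.
    """
    details = []
    ll = image
    for _ in range(levels):
        ll, lh, hl, hh = haar_dwt2(ll)
        details.append((lh, hl, hh))
    return ll, details


if __name__ == "__main__":
    edge = [[1.0, 1.0], [0.0, 0.0]]  # bright top row, dark bottom row
    print(haar_dwt2(edge))
```

A flat image puts all of its energy in LL and leaves the three detail bands at zero, while the small edge example above produces a non-zero response in exactly one detail band, which is why the detail coefficients are a natural carrier for boundary and texture information.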

R2 mentions a few grammatical errors. We will correct the grammar errors in our revised manuscript and make more effort in proofreading to further enhance its readability.

R2 points out that it is unclear what each row represents in Fig. 2, and that the columns need to be referenced with the relevant citation. In Fig. 2, the first two rows are the qualitative visualization results for the in-house brain dataset, and the last two rows are the visualization results for the fastMRI knee dataset. Specifically, except for the first column, the first and third rows are the SR reconstructed images obtained by different methods, and the second and fourth rows are the corresponding error maps. We will also add the relevant citation to each column in Fig. 2.

R4 notes that transformer models are difficult to optimize and that implementing the proposed model with 4 V100 GPUs is impractical for clinical application. With the batch size set to 2, the proposed model can be trained on a single V100 GPU (16GB).

R4 asks why we chose the Swin Transformer as the backbone. It is a great question. SwinIR [9] demonstrates the powerful performance of the swin transformer in image restoration, so we chose the swin transformer as the backbone. In addition, Liu et al. (ICCV 2021) showed that Swin outperforms DeiT on ImageNet-1K classification. Using other transformer models as the backbone is a wonderful suggestion, and we will try this and compare the results in our future work.

Overall, we shall address most of these concerns in our final version, and we think the proposed network makes an impactful contribution to a challenging topic that is of wide interest to MICCAI.


WavTrans: Synergizing Wavelet and Cross-Attention Transformer for Multi-Contrast MRI Super-resolution