Authors
Jiale Wang, Runze Wang, Rong Tao, Guoyan Zheng
Abstract
Deep learning-based single image super resolution (SISR) algorithms have great potential to recover high-resolution (HR) images from low-resolution (LR) inputs. However, most studies require paired LR and HR images for supervised training, which are difficult to obtain in clinical applications. In this paper, we propose an unsupervised arbitrary scale super-resolution reconstruction (UASSR) method based on disentangled representation learning, eliminating the requirement of paired images for training. Applying our method to the generation of HR images with smaller slice spacing from LR images with larger slice spacing at the inference stage, we design a strategy to fuse multiple reconstructed HR images from different views to achieve a better super-resolution (SR) result. We conduct experiments on a publicly available dataset including 507 MR images of the knee joint and an in-house dataset containing 130 CT images of the lower spine. Results from our comprehensive experiments demonstrate superior performance of UASSR over other state-of-the-art methods.
Link to paper
DOI: https://link.springer.com/chapter/10.1007/978-3-031-16446-0_43
SharedIt: https://rdcu.be/cVRTC
Link to the code repository
https://github.com/jialewang1/UASSR
Link to the dataset(s)
N/A
Reviews
Review #1
- Please describe the contribution of the paper
The paper presents a method to increase the resolution of a 3D medical image (an anisotropic set of slices). The method relies on a generative adversarial network that learns how to augment resolution without requiring a huge number of samples, as other machine learning approaches do.
The method is compared with several other methods of the literature using well known metrics.
Quantitative results show that the presented method achieves higher scores on most of the comparisons. Qualitative results show visually pleasant results that are similar to the ground truth.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Using a GAN for single image SR is novel in medical imaging, as far as I know.
The results presented are comparable and potentially superior to the SOTA methods for SR.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
It is unclear in the experiment what “507 MR images” means. Are these 507 slices of the same exam, or 507 exams of different knees? The same question applies to the spine dataset.
The quantitative assessment is input dependent. It is uncertain if the improvements will repeat with other data. The differences in the comparison with SOTA methods are small.
The qualitative evaluation is limited to visualizing one sample image and only in the authors’ opinion. There is no independent assessment by a population of experts. It is also uncertain how the method will perform on natively low resolution images or on increasing resolution of natively high resolution images.
The fusion part is unclear. Interpolation and SR can be applied in different orders to obtain arguably different results. It seems that for lack of space the authors did not detail that part.
The paragraph “ablation study” does not make sense to me. I do not see an ablation there, as my perception is that fusion is an extension and not part of the method. It is unclear in Table 2 how the metrics are applied to “with fusion” and what “arbitrary” means in that context.
Limitations and applicability are not discussed.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Part of the data used is publicly available. The methods are thoroughly explained in the paper, except for the fusion part.
Several items for which the authors responded yes are not included:
- A clear declaration of what software framework and version you used.
- A link to a downloadable version of the dataset (if public).
- Whether ethics approval was necessary for the data.
- Information on sensitivity regarding parameter changes.
- Details on how baseline methods were implemented and tuned.
- The details of train / validation / test splits.
- A description of results with central tendency (e.g. mean) and variation (e.g. error bars).
- An analysis of statistical significance of reported differences in performance between methods.
- The average runtime for each result, or estimated energy cost.
- A description of the memory footprint.
- An analysis of situations in which the method failed.
- A description of the computing infrastructure used (hardware and software).
- Discussion of clinical significance.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
The paper is interesting and well presented. I will detail here the weaknesses to help authors understand what is not ok and why.
The dataset is not clear from the beginning. In the end, “507 MR images” seem to be 507 complete exams of different knees, but at first I thought it was a single knee exam with 507 slices. The same for the spine.
The quantitative assessment should report at least a mean and SD so we understand that the data in Table 1 come from a population of exams. Otherwise it looks like a single slice is being tested; if so, the result is sample dependent and the contribution is very small.
Supposing that the experiment has a large population of exams and the data in Table 1 are the means, the differences in the comparison with SOTA methods are small. An analysis of variance would be necessary even to determine whether the differences of the means are significant.
The qualitative evaluation presented is limited to visualizing one sample image for each dataset and only in the authors’ opinion. There should be an independent assessment by a population of experts.
Here, it would be also interesting to see how the method will perform on natively low resolution images or on increasing resolution of natively high resolution images.
Finally, even if just for curiosity, it would be interesting to see how the method performs on general photographs.
The fusion part was the most unclear for me. After reading the paper again, I could understand that the initial goal is to increase the resolution of the whole volume, but that should be made clearer from the beginning. While I understand that new slices can be made from interpolation and an axial stack can be computed from a sagittal stack, I would not call that “different views”. The term was misleading.
Moreover, interpolation and SR can be applied in different orders to obtain arguably different results. The pipeline in fig 2 indicates that the HR sagittal is built from the fusion of HR axial and HR coronal. To do so, LR axial and LR coronal are resampled from LR sagittal, then LR axial and LR coronal pass independently through the UASSR before being fused. In such a way, the original LR sagittal never passes through the UASSR. It seems that for lack of space the authors did not detail that part.
Finally, the paragraph “ablation study” does not make sense to me. I do not see an ablation there, as fusion is not part of the method (at least I understood it as an extension). It is unclear in Table 2 how the metrics are applied to “with fusion” and what “arbitrary” means in that context.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
4
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The method presented seems to be novel and valuable. However, the experimental evaluation lacks rigor. The size and importance of the contribution depends on the statistical validity of the output data. The results on table 1 must be clearly explained. If they come from the average of the metrics applied to a population of exams, the authors can add the missing statistical analysis. However, if they come from the application of the method to one single exam, the results can be just a coincidence. As this is not clear in the paper, I put it below the line.
- Number of papers in your stack
3
- What is the ranking of this paper in your review stack?
1
- Reviewer confidence
Somewhat Confident
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
5
- [Post rebuttal] Please justify your decision
In the rebuttal, the authors addressed the most important points that caused me not to support acceptance of the submission.
Basically, it was unclear for me in the paper how the mentioned datasets were actually used (all 507 3D knee exams? slice orientations?). Depending on the breadth of the analysis, the contribution could be below or above the bar. I understand now that the whole datasets were employed and arguably well addressed statistically. I therefore acknowledge that the differences are significant and not input dependent, which allows me to raise my score. However, clarity could be improved in the paper to avoid doubts.
The rebuttal comment “At the end of Paragraph 1, Section 3.2, we clearly presented that we conducted a two-fold cross-validation study on both datasets.” is not polite. If we, the readers, thought it was unclear, perhaps there is something the authors can do to improve clarity. The paper says: “On both datasets, we conducted a two-fold cross-validation study.”, which sounds pretty good, but in context the sentence is not really meaningful.
In the response about “fusion”, the authors mention section 2.2. That section is about super-resolution. Fusion is only mentioned in table 2 (“With fusion”), where one has to guess what the authors refer to, and in the last paragraph before conclusion, in the ablation study, where finally one can make some sense to connect resolution with fusion. This can certainly be improved.
Review #2
- Please describe the contribution of the paper
This paper addresses the problem of recovering high-resolution images from low-resolution images. Compared with other methods requiring the paired high and low-resolution images as input, this paper introduces an unsupervised arbitrary scale super-resolution reconstruction (UASSR) method to solve this problem and achieve good performance without pairing images between two resolutions.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Interesting idea with novel method: this paper introduces using an unsupervised arbitrary scale super-resolution reconstruction (UASSR) based on disentangled representation learning to eliminate the requirement of paired images for training.
- Good writing: this paper is well written. The whole method is clearly presented and, overall, easy to understand.
- Good results. Authors compare their methods with some existing SOTA methods and show better performances.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Some of the experiment details are not clearly shown; for example, the train/val/test split for the datasets used in the experiment is not clearly presented. Since the in-house dataset of spine CT scans is new, we could better assess the model and apply other methods to this dataset if the train/val/test splits were given in the paper.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Authors promise to release the code as well as the dataset if the paper is accepted. Thus I think reproducibility will not be a problem for this paper.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
Please check 4 and 5 for details.
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
6
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
- Novelty
- Performance
- Writing
- Number of papers in your stack
4
- What is the ranking of this paper in your review stack?
2
- Reviewer confidence
Confident but not absolutely certain
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
Not Answered
- [Post rebuttal] Please justify your decision
Not Answered
Review #4
- Please describe the contribution of the paper
This paper presented an unsupervised super-resolution method via disentangled representation learning. The proposed method splits images into a content space and a resolution-specific space. The evaluation on MRI and CT images demonstrates the effectiveness of the proposed method.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The proposed method focuses on unsupervised learning, which removes the need for paired training data.
- Splitting the images into content space and resolution specific space seems reasonable.
- Evaluating on two different modalities is a strong point.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Simply modeling the resolution space as a Gaussian distribution seems rather simple.
- The big picture Fig. 1 could be improved with more information.
- The comparison between the proposed and SMORE should be discussed.
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The proposed method is complicated. Without released source code, the reproducibility of this work could be an issue.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
- The arbitrary scale is not clearly explained. If the proposed method can deal with three different scales in a single model, the resolution distribution would be a Gaussian mixture model instead of a single Gaussian distribution. If it needs one model per scale, there is no difference from existing methods, as they can also be retrained for different scales.
- The investigation of the resolution space is not sufficient. How does it interpret the resolution information?
- Fig. 1 could be redrawn to include more information about the training flow.
- The resolution space is independent of the content. Can the resolution space be transferred from MRI to CT?
- Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making
6
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The proposed method is interesting and seems useful. Evaluation is a strong plus.
- Number of papers in your stack
8
- What is the ranking of this paper in your review stack?
1
- Reviewer confidence
Confident but not absolutely certain
- [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed
6
- [Post rebuttal] Please justify your decision
Thank the authors for their efforts in addressing my concerns.
Primary Meta-Review
- Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
The study describes a technique for increasing the resolution of a 3D medical image (an anisotropic set of slices). The method is based on a generative adversarial network, which learns how to enhance resolution without the need for a large number of samples, as other machine learning algorithms require. Using well-known metrics, the strategy is compared against various other methods in the literature. Quantitative data reveal that the proposed strategy outperforms the others in the majority of comparisons. Qualitative results are visually appealing and similar to the ground truth.
First, the work is indeed an interesting topic. Experimental results could provide some insights for this field. Three reviewers have positive review comments: The results presented are comparable and potentially superior to the SOTA methods for SR (R1). This paper is well written. The whole method is clearly presented and, overall, easy to understand (R2). The proposed method focused on unsupervised learning, which gets rid of paired training data. (R3).
However, reviewers also have negative comments: The quantitative assessment is input dependent; it is uncertain if the improvements will repeat with other data (R1). Some of the experiment details are not clearly shown, such as the train/val/test split for the datasets used in the experiment (R2). Simply modeling the resolution space as a Gaussian distribution seems rather simple (R3).
Reviewers are confident about their concerns. In the rebuttal, the authors may need to highlight:
- Dataset details.
- Some of the experiment details are not clearly shown.
- The proposed method is complicated. Without released source code, the reproducibility of this work could be an issue.
- What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).
8
Author Feedback
We thank the meta-reviewer (MR) and all reviewers for their comments.
MR,R1 Dataset details Our study was conducted on 507 3D knee MR images from public dataset OAI-ZIB [13] and 130 in-house 3D spine CT images. The dataset details are described in Section 3.1.
MR,R1,R2 Experimental details; quantitative results are input dependent At the end of Paragraph 1, Section 3.2, we clearly presented that we conducted a two-fold cross-validation study on both datasets. Results presented in Table 1 are the mean values achieved by different methods. Due to space limitations, we did not present standard deviations. For example, femoral cartilage (FC) and tibial cartilage (TC) segmentation results (Dice (%)) of 8x SR on the knee dataset by ours and the second-best method (SBM) are: FC (Ours: 84.8±2.5 vs. ARSSR: 77.5±2.9) and TC (Ours: 79.9±4.6 vs. ARSSR: 68.7±6.6). A paired t-test shows that p-values for both FC and TC segmentation of 8x SR and 4x SR images achieved by ours and the SBM are all smaller than 0.0001. Thus, the differences are statistically significant and are not input dependent. We will put all results into a supplementary document.
MR,R3 Is modeling resolution space as Gaussian distribution simple? As presented in Paragraph 3, Page 2, UASSR is an end-to-end disentangled cycle-consistent adversarial network including two auto-encoders for reconstruction and two discriminators for adversarial learning. The auto-encoders are realized by two encoders and two generators. We use the encoders to disentangle each image into a domain-invariant content latent space and a domain-specific resolution latent space. The generators then combine latent content features and latent resolution codes to translate an image from the source domain to the target domain.
From the above description, one can see that our work is closely related to variational auto-encoders (VAE) [1], where we force the distribution of the latent resolution variables to be a multivariate normal distribution. Although [1] names many other distributions for which this works, it does not actually matter much which distribution the latent variables follow: through the non-linear decoder/generator, they can mimic an arbitrarily complicated distribution in the target domain (e.g., the arbitrarily scaled image domain). Thus, modeling the latent resolution space as a Gaussian distribution is not a problem.
[1] Kingma, D.P. and Welling, M., Auto-encoding variational Bayes. ICLR 2014.
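The argument above, that the exact latent distribution matters little because the non-linear decoder reshapes it, can be illustrated with a toy example (our own sketch, not the authors' implementation): pushing Gaussian latent codes through a simple non-linear "decoder" yields a clearly non-Gaussian, bimodal output distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(10_000)   # latent codes ~ N(0, 1)

# A toy non-linear "decoder": a squashing function pushes the Gaussian
# mass toward -1 and +1, producing a bimodal output distribution that
# a single Gaussian latent could never represent directly.
x = np.tanh(3.0 * z)

near_modes = np.mean(np.abs(x) > 0.9)   # fraction of samples near the two modes
near_zero = np.mean(np.abs(x) < 0.1)    # fraction near the old Gaussian mode
print(near_modes > near_zero)           # True: mass has moved to the modes
```

A learned generator plays the same role with far more capacity, which is why sampling the resolution code from a single Gaussian is not restrictive in practice.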
MR,R2&3 Reproducibility We promised to release our source code, implemented in the PyTorch framework.
R1 The fusion part is not clear The fusion part is presented in Section 2.2. As shown in Fig. 2, we can rearrange an anisotropic 3D volume acquired along the sagittal axis, which has high within-slice resolution (WSR) and low between-slice resolution (BSR), into two stacks of slices, one along the coronal axis and the other along the axial axis. Slices in each stack now have high BSR but low WSR. These two slice stacks can then be super-resolved by UASSR to obtain two stacks of high-resolution (HR) slices with both high BSR and high WSR. Each stack of HR slices can be rearranged into a 3D HR volume. Our ablation study (Table 2) shows that averaging these two volumes leads to better results than using them alone.
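The rearrange/super-resolve/average pipeline from the rebuttal can be sketched in a few lines of NumPy. Here `sr_2d` is a hypothetical stand-in for the trained UASSR network (nearest-neighbour repetition, used only so the sketch runs end to end; the real model synthesizes new detail), and the layout assumes the low between-slice resolution lies along axis 0 of the volume.

```python
import numpy as np

def sr_2d(slice_2d, scale):
    """Hypothetical stand-in for the 2D UASSR network: upsample the
    first (low-resolution) axis of one slice by nearest-neighbour
    repetition, purely to make the sketch executable."""
    return np.repeat(slice_2d, scale, axis=0)

def fuse_views(lr_vol, scale):
    """lr_vol has low between-slice resolution along axis 0 (a sagittal
    stack). Rearrange it into two other slice stacks whose slices contain
    the LR direction in-plane, super-resolve each slice in 2D, restack,
    and average the two HR volumes."""
    d, h, w = lr_vol.shape
    # "Coronal" stack: h slices of shape (d, w); LR direction is in-plane.
    hr_coronal = np.stack([sr_2d(lr_vol[:, y, :], scale) for y in range(h)],
                          axis=1)                       # (d*scale, h, w)
    # "Axial" stack: w slices of shape (d, h); LR direction is in-plane.
    hr_axial = np.stack([sr_2d(lr_vol[:, :, z], scale) for z in range(w)],
                        axis=2)                         # (d*scale, h, w)
    # Fusion: average the two reconstructed volumes.
    return 0.5 * (hr_coronal + hr_axial)

vol = np.random.rand(8, 32, 32)   # anisotropic: 8 thick slices
hr = fuse_views(vol, scale=4)
print(hr.shape)                    # (32, 32, 32)
```

With a real SR network the two reconstructions differ, so averaging them is the fusion step whose benefit Table 2 reports.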
R1 Limited qualitative evaluation This is due to the page limit, but it is sufficient to show the differences.
R#3: arbitrary scale and Gaussian distribution Our method only needs to train one network to handle arbitrary up-sampling rates, which is a clear advantage. See above for our response to your concern on modeling the resolution latent space as Gaussian distribution.
R3 Comparison with SMORE
Quantitatively (segmentation results in Table 1) and qualitatively (Fig. 3), we show that our method performs better than SMORE. As shown in Fig. 3, SMORE led to over-smoothed images. SMORE also has poor cartilage segmentation results on the knee dataset with the 8x setup.
R3 Resolution space from MRI to CT Yes, we can.
Post-rebuttal Meta-Reviews
Meta-review # 1 (Primary)
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
The authors have done a good rebuttal, and one reviewer changed from weak reject to weak accept, bringing the scores to 5/6/6. In general, the work is good, but as the reviewer suggested, “clarity could be improved in the paper to avoid doubts.”
- After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.
Accept
- What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).
2
Meta-review #2
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
The rebuttal has addressed the data issue and clarified the experimental setup. This paper presented an unsupervised super-resolution method via disentangled representation learning and demonstrated superior performance on a large dataset.
- After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.
Accept
- What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).
1
Meta-review #3
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
The paper presented a method to increase the resolution of a 3D medical image using a generative adversarial network that learns how to augment resolution without requiring a huge number of samples, as other machine learning approaches do. The proposed method is novel and has achieved higher scores on most of the evaluation metrics. The rebuttal addressed the major concern of a reviewer, who raised the final score from negative to positive after the rebuttal. Since all three reviews are positive, I recommend accepting this work.
- After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.
Accept
- What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).
2