
Authors

Khondker Fariha Hossain, Sharif Amit Kamran, Joshua Ong, Andrew G. Lee, Alireza Tavakkoli

Abstract

The rapid accessibility of portable and affordable retinal imaging devices has made early differential diagnosis easier. For example, color funduscopy imaging is readily available in remote villages, which can help to identify diseases like Age-related Macular Edema (AMD), Glaucoma, or Pathological Myopia (PM). On the other hand, astronauts at the International Space Station utilize this camera for identifying Space-associated neuro-ocular Syndrome (SANS). However, due to the unavailability of experts in these locations, the data has to be transferred to an urban healthcare facility (AMD and Glaucoma) or a terrestrial station (SANS) for more precise disease identification. Moreover, due to low bandwidth limits, the imaging data has to be compressed for transfer between these two places. Different super-resolution algorithms have been proposed throughout the years to address this. Furthermore, with the advent of deep learning, the field has advanced so much that 2x to 4x compressed images can be decompressed to their original form without losing spatial information. In this paper, we introduce a novel model called Swin-FSR that utilizes Swin Transformer with spatial and depth-wise attention for Fundus Image super-resolution. Our architecture achieves Peak signal-to-noise-ratio (PSNR) on three public datasets. Additionally, we tested the model’s effectiveness on a privately held dataset for SANS provided by NASA and achieved comparable results against previous architectures.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43990-2_65

SharedIt: https://rdcu.be/dnwMp

Link to the code repository

https://github.com/FarihaHossain/SwinFSR

Link to the dataset(s)

https://amd.grand-challenge.org/Home/

https://palm.grand-challenge.org/


Reviews

Review #2

  • Please describe the contribution of the paper

    This paper describes a Swin Transformer-based model for the superresolution of retinal fundus images, towards the application of transferring these images for the identification of Space-associated neuro-ocular Syndrome (SANS), under conditions of low bandwidth. The proposed SwinFSR was compared against various other recent models for superresolution of three public fundus datasets, with the PSNR and SSIM metrics used to assess decompression performance under five-fold cross-validation. SwinFSR was shown to outperform all other methods, on all datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Detailed assessment of novel Swin-based super-resolution model, including ablation experiments on the effect of iRSTB/DCA block quantity
    • Application aimed at the less-explored SANS condition
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Evaluation on restoration quality metrics alone might not be the most appropriate, given the proposed motivation
    • Tradeoff between the computational cost of compression models versus transmission bandwidth might be further justified, given the proposed motivation
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code is to be publicly released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. A major concern about the study as presented is that it does not directly evaluate the impact of super-resolving compressed/downsampled images on disease classification performance (whether for SANS, AMD, etc.). Instead, only reconstruction metrics (PSNR/SSIM) are presented.

    Due to this, it is difficult to assess the degree to which the improved super-resolution actually improves disease classification (or not), which is the stated motivation; as such, for the study to attain its true potential, it would be strongly recommended to perform the analysis with classification metrics (e.g. sensitivity, specificity, accuracy, AUC, F1 score etc.) too.

    2. In Section 3.1, it is stated that the datasets contain images “with a high resolution”, which are then downsampled to 512x512. It might be clarified as to whether these “high resolutions” are relevant to the SANS application (i.e. if the actual camera used would achieve these resolutions)

    3. Moreover, the input and output resolutions for the super-resolution models might be explicitly stated. In particular, was 512x512 the input resolution (so the x2 and x4 outputs would be 1024/2048 respectively), or was it the target resolution (so the inputs for x2 and x4 would be 256/128 respectively)?

    4. Related to the above, if classification performance analysis is attempted, it might be considered to also present the performance achievable with downsampled inputs (without any super-resolution attempted), to also justify the need for super-resolution for the task in the first place.
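
The two readings of "512x512" raised in the resolution comment above can be made concrete with a short sketch. This is only an illustration of the arithmetic, assuming square images and integer scale factors; the function names `sr_shapes` and `pixel_ratio` are hypothetical and do not come from the paper or its repository.

```python
# Illustrative sketch only: names and structure are assumptions, not the authors' code.

def sr_shapes(side, scale, side_is_target):
    """Return (input_side, output_side) for a square image and an integer scale factor.

    side_is_target=True  -> `side` is the high-resolution target, input is side / scale.
    side_is_target=False -> `side` is the low-resolution input, output is side * scale.
    """
    if side_is_target:
        if side % scale != 0:
            raise ValueError("target side must be divisible by the scale factor")
        return side // scale, side
    return side, side * scale


def pixel_ratio(scale):
    """Fraction of raw pixels that remain when each side is divided by `scale`."""
    return 1 / (scale * scale)


for scale in (2, 4):
    as_target = sr_shapes(512, scale, side_is_target=True)
    as_input = sr_shapes(512, scale, side_is_target=False)
    print(f"x{scale}: 512 as target -> {as_target}, 512 as input -> {as_input}, "
          f"pixels kept = {pixel_ratio(scale):.4f}")
```

The pixel ratio counts raw pixels only; the size of an encoded file after compression would depend on the codec and is not modelled here.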

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The application is interesting and the technical contribution is good, but the choice of metrics may be suboptimal and not provide sufficient justification as to whether the contribution actually aids the proposed task.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    We thank the authors for clarifying the main purpose of the study, and wish to affirm that the classification performance referred to in our previous comments is not necessarily that of automated classification; it was raised because diagnosis is, after all, the final application of the process. In any case, the clinical diagnosis performance might also be included in the manuscript.

    For Section 2.4, “Loss Funcion” might be “Loss Function” instead.



Review #3

  • Please describe the contribution of the paper

    The contribution of this paper is a novel super-resolution architecture called Swin-FSR that super-resolves fundus images to up to 4x their original input size to enable the compression and decompression of fundus images especially as needed for the International Space Station to detect diseases such as Space-associated Neuro-ocular Syndrome (SANS). Such technology could enable more effective transfer to an urban healthcare facility or a terrestrial station for evaluation. Swin-FSR achieves PSNR and SSIM better than existing state-of-art approaches on multiple publicly available fundus image datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strength of the paper is in the methodology of implementation of the Swin-FSR model. The model incorporates low-frequency feature extraction, deep feature extraction, and high-quality image reconstruction modules. The low-frequency feature extraction module employs a convolution layer to extract low-level features, which are then passed directly to the reconstruction module to preserve low-frequency information. The paper’s novelty lies in the deep feature extraction, which incorporates a Depth-wise Channel Attention block (DCA), an improved Residual Swin-Transformer Block (iRSTB), and a Spatial and Channel Attention block (SCA).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Equations (1) through (6), which explain the different components of Swin-FSR and their relation to one another, are helpful in their intent to put each component into context, but they are a bit difficult to follow. It might help to include another figure pointing out the components and relative roles of equations (1) through (6) in the bigger picture of Swin-FSR; perhaps these equations could even be added within each box of the existing Figure 1 to help provide context for each submodule.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    All aspects of reproducibility have been addressed.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The main constructive suggestion I would provide to the authors is to consider including another figure pointing out the components and relative roles of equations (1) through (6) in the bigger picture of the Swin-FSR implementation. Alternatively, perhaps these equations could be added in each box within Figure 1 to help provide context. Also, the SANS disease application for this model is mentioned once briefly in the abstract and introduction and again in the conclusions, but no other details are provided in the rest of the body of the paper; adding more details on the use case of Swin-FSR would help to motivate its varying sub-models. Quantitative comparisons in Table 1 show SSIM and PSNR are higher across the board for Swin-FSR, but how significant are these increases? Some statistical analysis would be helpful.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper provides an interesting approach to super-resolving fundus images in order to enable data to be viewed from space via compression and decompression at terrestrial stations and urban healthcare centers for diseases such as Space-associated Neuro-ocular Syndrome (SANS). The paper’s methodology (equations (1) through (6)) could be made clearer, and the practical pipeline used during compression/decompression of images would be useful to know to understand the relative value of Swin-FSR. While performance seems to be better when using this method compared to other methods according to Table 1, the addition of statistical analysis to show significant differences would help to strengthen the paper.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors have partially addressed my concern regarding providing more explanation for equations 1-6. However, they have not to my knowledge addressed the second part of my suggestion to provide results of statistical significance tests (Mann Whitney U, etc.) for the final performance values they’ve arrived at to be able to make a clear statement of statistical significance of their results’ improvement compared to past results, so my decision remains the same.
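
The kind of significance test named above could be run on per-image (or per-fold) PSNR values along the following lines. This is a minimal standard-library sketch of a two-sided Mann-Whitney U test using the normal approximation with tie and continuity corrections; the function name `mann_whitney_u` and the two score lists are hypothetical placeholders, not results from the paper. Because competing models are typically scored on the same test images, a paired test such as the Wilcoxon signed-rank test may be the more appropriate choice, and the normal approximation is rough for very small samples.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie- and continuity-corrected).

    Returns (U, p_value), where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign 1-based ranks, giving tied values the average of their ranks.
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)

    n = n1 + n2
    mean = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u, 1.0
    z = max((abs(u - mean) - 0.5) / math.sqrt(var), 0.0)
    p_value = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
    return u, min(p_value, 1.0)


# Placeholder scores for illustration only; these are NOT values from the paper.
psnr_model_a = [40.1, 41.3, 39.8, 42.0, 40.7]
psnr_model_b = [38.9, 39.5, 39.9, 38.2, 39.0]
u_stat, p = mann_whitney_u(psnr_model_a, psnr_model_b)
print(f"U = {u_stat}, two-sided p = {p:.4f}")
```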



Review #4

  • Please describe the contribution of the paper

    The authors proposed an end-to-end architecture called Swin-FSR, which is a Swin Transformer-based model with spatial and depth-wise attention for super-resolution of fundus images in telemedicine. The experimental results demonstrate that the proposed Swin-FSR achieves satisfactory visual reconstruction quality and the best performance on three public datasets and one private dataset in terms of PSNR and SSIM.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors applied the most popular deep learning techniques such as Swin Transformer and attention mechanism in super-resolution of fundus images, which is of great value to telemedicine where compressed images for transfer are required.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The DCA and SCA blocks adopted in the proposed Swin-FSR are somewhat limited in technical novelty. In addition, the fundus image super-resolution has not been verified in the subsequent identification of diseases such as age-related macular edema, glaucoma, pathological myopia and space-associated neuro-ocular syndrome, which limits the demonstration of clinical feasibility.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Most details of the method are clearly described and the code will be publicly released, which supports good reproducibility of the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The full name of AMD is Age-related Macular Degeneration rather than Age-related Macular Edema.
    2. The last sentence of the first paragraph in the introduction should be modified as: Compressing and decompressing the data without losing spatial information can be utilized in this scenario by the super-resolution algorithm.
    3. There exists a mistake in Eq. 2. Both inputs of two parallel branches H_DCA and H_iRSTB are F_LF.
    4. The DCA block seems to lack a shortcut channel-wise product.
    5. The authors mentioned that they used Bicubic Interpolation to resize the images into (512 × 512). What is the original size of these images? Is Bicubic Interpolation aimed at synthesizing the compressed low-resolution images for paired training? If so, is the trained model capable of improving the resolution of real compressed images?
    6. MSA in Fig. 1 should be modified as SW-MSA; Both Fig. 2 and Fig. 3 lack the low-resolution images, and Table 1 also lacks PSNR and SSIM of the low-resolution images.
    7. The adopted iRSTB consists of SCA and STLc blocks. However, the authors have not conducted ablation studies on both blocks in Supplementary Table 1.
    8. The proposed Swin-FSR has not been applied in the subsequent identification of some diseases for further assessment of its effectiveness.
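
For point 6 above, a low-resolution baseline row could be obtained by upsampling the low-resolution input without any learned model and scoring it against the high-resolution target. The sketch below shows the idea with the standard library only, assuming grayscale images stored as lists of rows. Box averaging and nearest-neighbour upsampling are used here as simple stand-ins for the bicubic interpolation mentioned in the review, and all function names are hypothetical.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """PSNR in dB between two equally sized grayscale images (lists of rows)."""
    if len(reference) != len(estimate) or any(
        len(a) != len(b) for a, b in zip(reference, estimate)
    ):
        raise ValueError("images must have the same shape")
    n = sum(len(row) for row in reference)
    mse = sum(
        (a - b) ** 2
        for row_ref, row_est in zip(reference, estimate)
        for a, b in zip(row_ref, row_est)
    ) / n
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)


def downsample_box(img, s):
    """Average each s x s block (a simple stand-in for bicubic downsampling)."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[y * s + dy][x * s + dx] for dy in range(s) for dx in range(s)) / (s * s)
            for x in range(w // s)
        ]
        for y in range(h // s)
    ]


def upsample_nearest(img, s):
    """Repeat each pixel s times along both axes (no learned super-resolution)."""
    return [
        [img[y // s][x // s] for x in range(len(img[0]) * s)]
        for y in range(len(img) * s)
    ]


# Tiny synthetic gradient used as the "high-resolution" target; not real fundus data.
hr = [[(x + 4 * y) * 16 for x in range(4)] for y in range(4)]
lr = downsample_box(hr, 2)
baseline = upsample_nearest(lr, 2)
print(f"baseline PSNR without super-resolution: {psnr(hr, baseline):.2f} dB")
```

Reporting this kind of baseline next to each model would show how much of the PSNR is gained by super-resolution rather than by interpolation alone.
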
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    There are a few mistakes and limited technical novelty in the proposed method, and the effectiveness of this work in clinical applications has not been evaluated.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The reviews of this work are quite divergent. The authors are invited to provide a rebuttal clarifying the main issues raised by the reviewers, which are:

    Clinical Motivation:

    • Evaluation on restoration quality metrics alone might not be the most appropriate, given the proposed motivation.
    • The work of fundus image super-resolution has not been verified in the subsequent identification of some diseases.

    Technical novelty:

    • DCA and SCA blocks adopted in the proposed Swin-FSR are a little limited in technical novelty.

    Method details:

    • Tradeoff between the computational cost of compression models versus transmission bandwidth might be further justified.
    • More details of SANS disease application should be provided.

    Insufficient evaluation:

    • Evaluation on restoration quality metrics alone might not be the most appropriate.
    • Quantitative comparisons in Table 1 show SSIM and PSNR are higher across the board for Swin-FSR, but how significant are these increases? Some statistical analysis would be helpful.




Author Feedback

We thank the reviewers for the suggestions and constructive comments on the paper. We have addressed the comments below in detail, which will be reflected in the camera-ready version of the paper. We hope the reviewers will be content with our response and re-evaluate our score.

Response-Reviewer 2:

  • Comment 9.1 and Comment 9.4: The purpose of this study is not to show that super-resolution improves the classification performance of automated deep learning diagnosis; rather, it is meant to preserve anatomical features in high-quality images for clinicians to diagnose. SR is therefore used to recover disease pathologies from downsampled images in terrestrial and extraterrestrial scenarios, as described in detail in Section 1, Page 2, lines 1-11. We also highlight the limitations of existing super-resolution techniques in Section 1, Page 2, lines 12-22, and Page 3, lines 1-2.

Although further identification of disease by clinical experts was not part of our current study due to the page limit and the scope of this submission, we still carried out a diagnostic assessment with two expert ophthalmologists and test samples of 80 fundus images (20 fundus images per disease class: AMD, Glaucoma, Pathological Myopia and SANS, for both original x2 and x4 images and super-resolution enhanced images). Half of the 20 fundus images were from control patients without disease pathologies; the other half contained disease pathologies. The clinical experts were not given any prior pathology information about the images. Each expert was given 10 images per disease category, evenly split between control and diseased images.

The accuracy and F1-score for original x4 images are as follows: 70.0% and 82.3% (AMD), 75% and 85.7% (Glaucoma), 60.0% and 74.9% (Palm), and 55% and 70.9% (SANS). The accuracy and F1-score for original x2 images are as follows: 80.0% and 88.8% (AMD), 80% and 88.8% (Glaucoma), 70.0% and 82.1% (Palm), and 65% and 77.4% (SANS).

The accuracy and F1-score for our model Swin-FSR’s output from x4 images are as follows: 90.0% and 93.3% (AMD), 90.0% and 93.7% (Glaucoma), 75.0% and 82.7% (Palm), and 75% and 81.4% (SANS). The accuracy and F1-score for Swin-FSR’s output from x2 images are as follows: 90.0% and 93.3% (AMD), 90.0% and 93.7% (Glaucoma), 80.0% and 85.7% (Palm), and 80% and 85.7% (SANS).

We also tested the SWIN-IR, ELAN, and RCAN models in the diagnostic assessment, of which SWIN-IR upsampled images gave the best results. For x4 images, that model’s accuracy and F1-score are 80% and 87.5% (AMD), 85.0% and 90.3% (Glaucoma), 70.0% and 80.0% (Palm), and 70% and 76.9% (SANS). For x2 images, its accuracy and F1-score are 80% and 87.5% (AMD), 80% and 88.8% (Glaucoma), 70.0% and 80.0% (Palm), and 75% and 81.4% (SANS).

Based on the above observations, our model-generated images achieve the best results.
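
For readers who want to see how accuracy and F1-score relate to a reader's individual calls, the sketch below computes both from confusion-matrix counts. It is a generic illustration with the standard library only; the function name `accuracy_and_f1` and the counts are hypothetical placeholders and are not taken from the diagnostic assessment described above, whose per-image decisions are not given here.

```python
def accuracy_and_f1(tp, fp, tn, fn):
    """Return (accuracy, F1) from true/false positive and negative counts.

    "Positive" is taken to mean an image graded as diseased.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one graded image is required")
    accuracy = (tp + tn) / total
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# Placeholder counts for 20 images (10 diseased, 10 control); NOT data from the rebuttal.
acc, f1 = accuracy_and_f1(tp=8, fp=3, tn=7, fn=2)
print(f"accuracy = {acc:.3f}, F1 = {f1:.3f}")
```

Reporting the four counts alongside the two summary scores would let readers recompute sensitivity and specificity as well.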

  • Comment 9.3: The target resolution is 512x512; the inputs are 256x256 for x2 and 128x128 for x4.

Response-Reviewer 3:

Comment 9: We thank the reviewer for the constructive comment. Due to the page limit, we could not add another figure with details. For the camera-ready version, we will tighten the writing and focus more on the use case of the SANS application. Moreover, to address the model’s clinical significance, we carried out a diagnostic study, which is described in our response to reviewer 2’s comment 9.1.

Response-Reviewer 4:

Comments 9.1, 9.2, 9.3, 9.6, 9.7: We thank the reviewer and will update these in the camera-ready version.
Comment 9.4: We experimented with the shortcut but did not see any improvement.
Comment 9.5: The original images were of various sizes. We used Bicubic Interpolation to resize the images to 512x512 for batch-wise training, and the trained model can improve the resolution of compressed images with different upscaling factors.
Comment 9.8: We conducted a diagnostic assessment with clinical experts, which is described in our response to reviewer 2’s comment 9.1.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have partially addressed the reviewers’ concerns. It is suggested that they provide results of statistical significance tests. The paper is generally well written, with methods that are easy to understand and follow. It reaches the minimum requirement for publication.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    According to the reviewers’ comments and the rebuttal, the authors mainly responded to the lack of clinical assessment indicators. However, R4 considers that the novelty of the proposed method is limited. The paper proposes two new modules, iRSTB and DCA, and their contribution falls short compared to other papers. I agree with R4 and recommend rejecting the paper.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper describes a Swin Transformer-based model for the super-resolution of retinal fundus images. The authors partially addressed the concerns raised by the reviewers. I recommend acceptance.


