
Authors

Zhaojie Fang, Zhanghao Chen, Pengxue Wei, Wangting Li, Shaochong Zhang, Ahmed Elazab, Gangyong Jia, Ruiquan Ge, Changmiao Wang

Abstract

Fundus photography is an essential examination for the clinical and differential diagnosis of fundus diseases. Recently, Ultra-Wide-angle Fundus (UWF) techniques, namely UWF Fluorescein Angiography (UWF-FA) and UWF Scanning Laser Ophthalmoscopy (UWF-SLO), have gradually been put into use. However, Fluorescein Angiography (FA) and UWF-FA require injecting sodium fluorescein, which may have detrimental effects. To avoid such negative impacts, cross-modality medical image generation algorithms have been proposed. Nevertheless, current methods in fundus imaging cannot produce high-resolution images and are unable to capture tiny vascular lesion areas. This paper proposes a novel conditional generative adversarial network (UWAT-GAN) to synthesize UWF-FA from UWF-SLO. Using multi-scale generators and a patch-based fusion module to better extract global and local information, our model can generate high-resolution images. Moreover, an attention transmit module is proposed to help the decoder learn effectively. In addition, the network is trained in a supervised manner using multiple new weighted losses on different scales of data. Experiments on an in-house UWF image dataset demonstrate the superiority of UWAT-GAN over state-of-the-art methods. The source code is available at: https://github.com/Tinysqua/UWAT-GAN.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43990-2_70

SharedIt: https://rdcu.be/dnwMu

Link to the code repository

https://github.com/Tinysqua/UWAT-GAN

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes an Ultra-Wide-Angle Transformation GAN (UWAT-GAN) for the generation of UWF fluorescein angiography (UWF-FA) images from UWF scanning laser ophthalmoscopy (UWF-SLO) images, which is beneficial since it avoids the need to acquire actual UWF-FA images (which require dye injection). A conditional GAN is trained with paired UWF-SLO and UWF-FA image data, using both a global coarse generator and a fine generator, to synthesize the expected UWF-FA image from a UWF-SLO image. The results are compared against several other generative models on four metrics.
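
    To make the coarse/fine split concrete, a minimal conceptual sketch of such a two-generator forward pass is given below. The module names, the returned tuples, and the cropping of global features at the patch location are illustrative assumptions only, not the authors' released architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarseToFineSketch(nn.Module):
    """Conceptual two-scale generator wrapper: a coarse (global) generator
    processes a downsampled UWF-SLO image, a fine (local) generator processes
    a high-resolution patch, and a fusion block merges the two streams.
    All sub-modules are injected placeholders; this is not the UWAT-GAN code."""

    def __init__(self, coarse_gen: nn.Module, fine_gen: nn.Module, fusion: nn.Module):
        super().__init__()
        self.coarse_gen = coarse_gen  # assumed to return (feature map, coarse image)
        self.fine_gen = fine_gen      # assumed to return patch features
        self.fusion = fusion          # assumed to map fused features to a patch image

    def forward(self, slo_full, slo_patch, patch_box):
        # Global pass on a downsampled copy of the whole UWF-SLO image.
        slo_small = F.interpolate(slo_full, scale_factor=0.5,
                                  mode="bilinear", align_corners=False)
        coarse_feat, coarse_img = self.coarse_gen(slo_small)

        # Crop the coarse feature map at the location of the local patch;
        # patch_box = (top, left, height, width) in feature-map coordinates.
        t, l, h, w = patch_box
        coarse_crop = coarse_feat[:, :, t:t + h, l:l + w]

        # Local pass on the high-resolution patch, fused with the global context.
        fine_feat = self.fine_gen(slo_patch)
        fake_patch = self.fusion(torch.cat([fine_feat, coarse_crop], dim=1))
        return coarse_img, fake_patch
```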

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The explicit integration of global- and local-scale information for GAN synthesis appears especially appropriate
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Competing methods might not have been appropriately optimized
    • Evaluation was performed on image data from a single (hospital) source
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Code is included.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. In the Abstract, it is claimed that “current image generation methods produce low-resolution images…”. This does not appear true in general (e.g. StyleGAN and related models), and might be reconsidered.

    2. In Section 2.1, it is stated that “The results of global and local information can be used, alternately, as a reference for each other”. However, from Figure 1, data flow appears to be from the coarse (global) model to the fine (local) model only. This might be clarified.

    3. In Section 2.1, it is stated that a patch is extracted from the original image as an input to Gen_F, as described in Section 2.3. It might be clarified whether these are the 608x768 patches described in Section 3.1. Moreover, it might be clarified how it is determined that the randomly-selected patch (from Section 2.2) contains detail suitable for the generator model, instead of just being background/empty space.

    4. In Section 2.2, it is stated that a fusion block takes both patches from Gen_F and Gen_C. It might be clarified whether these patches correspond to the same region on the input image.

    5. In Section 3.1, image sharpening is described. The parameters/methodology might be included, possibly in supplementary material.

    6. In Section 3.1, the random cropping methodology might be described. In particular, are the patches possibly overlapping, and how many random patches are obtained for each image?

    7. In Section 3.1, it is stated that data augmentation was performed with random rotation. It might be clarified how this was done with non-square images, since an arbitrary rotation would appear to create an image of different width and height.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While some details might be further included, the availability of code and technical contributions were appreciated.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    This paper proposes a novel conditional generative adversarial network (UWAT-GAN) to synthesize UWF-FA from UWF-SLO. With this approach, UWF-FA images can be obtained without injecting fluorescent dye, avoiding the harm that the dye may cause to the human body.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method described in this paper is clear, the experimental results are good, and it has great potential to be applied in clinical practice.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I think there is some discrepancy between the presentation in Figure 2 and the description in the article.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provided the code, but the data used in the paper is difficult to obtain, so reproducing the original results poses a certain challenge.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The authors design a single local-information input branch. If multiple local branches were added, would the model perform better?
    2. In Figure 1, it is suggested to add annotations for D_c1, D_c2, and D_F.
    3. It is recommended to change the color of the up/down-sample block in Fig. 1; it is currently the same color as the fusion module.
    4. Gen_F has 3 down-sampling blocks in both Figure 1 and the description below it, and Gen_C has 2. Is the depiction in Figure 2 inverted?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the method described in this paper is very clear; the rationale and function of each module design are well explained. The improvement in experimental results is also considerable.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a new image-to-image convolutional architecture to translate UWF-SLO to UWF-FA with GANs. The contributions are:

    • a double generator architecture to capture coarse and fine features
    • a combination of a feature-matching loss over discriminator features and a VGG perceptual loss (see the sketch after this list)
    • an “attention transmit module” to account for the uneven level of detail between modalities.
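
    Since the second contribution combines two well-known objectives, the pix2pixHD-style sketch below shows how such a feature-matching loss over discriminator features and a VGG perceptual loss are typically assembled. The layer indices, loss weights, and helper names are illustrative assumptions, not code from the UWAT-GAN repository.

```python
import torch.nn as nn
from torchvision.models import vgg19

class VGGPerceptualLoss(nn.Module):
    """L1 distance between VGG-19 feature maps of generated and real images.
    Layer indices (and the absence of ImageNet normalisation) are illustrative."""

    def __init__(self, layer_ids=(2, 7, 12, 21, 30)):
        super().__init__()
        features = vgg19(weights="DEFAULT").features.eval()  # needs torchvision >= 0.13
        for p in features.parameters():
            p.requires_grad_(False)
        self.features, self.layer_ids, self.l1 = features, set(layer_ids), nn.L1Loss()

    def forward(self, fake, real):
        loss, x, y = 0.0, fake, real
        for idx, layer in enumerate(self.features):
            x, y = layer(x), layer(y)
            if idx in self.layer_ids:
                loss = loss + self.l1(x, y)
        return loss

def feature_matching_loss(fake_feats, real_feats):
    """pix2pixHD-style L1 loss between intermediate discriminator feature maps;
    both arguments are lists of tensors collected from the discriminator."""
    l1 = nn.L1Loss()
    return sum(l1(f, r.detach()) for f, r in zip(fake_feats, real_feats))

# Placeholder total generator objective (weights are arbitrary examples):
# g_loss = adversarial_loss + 10.0 * feature_matching_loss(fake_feats, real_feats) \
#          + 10.0 * VGGPerceptualLoss()(fake, real)
```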
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The results are convincing, with an improvement in no-reference scores. This is especially noticeable for local features.
    • It is a welcome generalization/extension of pix2pixHD, which also treats generative features at multiple scales.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • N=70 with a 70%/30% split, which is very low, but this nevertheless demonstrates the ability of the method to perform in a low-data regime. However, I think this is detrimental to pix2pixHD, and it is thus not clear that the performance differences are not mainly due to the low-data regime.
    • presenting the mixing of existing losses as a contribution is, in my opinion, deceiving
    • handling misalignment could have been integrated via a spatial transformer, as in RegGAN (NeurIPS 2021), rather than through an ad hoc registration that makes the method modality-specific (see the sketch after this list).
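
    Regarding the last point, a learnable warp that absorbs misalignment during training could, in the simplest case, look like the toy module below. This is a sketch under the assumption of single-channel images; the name LearnableWarp and its architecture are hypothetical and much simpler than RegGAN's actual correction network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableWarp(nn.Module):
    """Toy spatial-transformer-style warper: predicts a dense 2-D displacement
    field from the concatenated (fake, real) pair and resamples the real image
    accordingly, so small misalignments can be absorbed during training instead
    of by a separate registration step. Hypothetical sketch only."""

    def __init__(self, in_ch=2):  # assumes single-channel fake and real images
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 2, 3, padding=1),   # per-pixel (dx, dy) displacement
        )

    def forward(self, fake, real):
        flow = self.net(torch.cat([fake, real], dim=1))            # (B, 2, H, W)
        b, _, h, w = flow.shape
        # Identity sampling grid in normalised [-1, 1] coordinates.
        theta = torch.eye(2, 3, device=flow.device).unsqueeze(0).repeat(b, 1, 1)
        grid = F.affine_grid(theta, (b, 1, h, w), align_corners=False)
        # Shift the grid by the predicted displacement and resample the target.
        grid = grid + flow.permute(0, 2, 3, 1)
        return F.grid_sample(real, grid, align_corners=False)
```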
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Very good with train/inference code available

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The authors propose a new GAN architecture for HD image generation in the difficult context of UWF-FA image synthesis. The results are convincing compared to the state of the art, and the paper is well written. I feel the methodology is somewhat outdated, and the comparison would surely have benefited from more modern synthesis approaches such as diffusion-based I2I models. I nevertheless recommend acceptance, as the method provides a clear extension of pix2pixHD and can thus be useful in many fields (provided accurate co-registration of both modalities).

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It is a convincing extension of pix2pixHD, somewhat outdated in 2023 but with good results, and it could be useful to the community. The authors could have pushed their ideas a bit further by including a warper to handle misalignment rather than resorting to preprocessing, which would have made the contribution stronger.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The concept of the Ultra-Wide-Angle Transformation GAN (UWAT-GAN) is innovative and significant for the generation of UWF fluorescein angiography (UWF-FA) images from UWF scanning laser ophthalmoscopy (UWF-SLO) images. The reviewers recognize the advantage of this approach, as it negates the need for invasive procedures to obtain actual UWF-FA images. One reviewer suggests the paper could be improved by including comparisons with more modern synthesis approaches, such as diffusion-based I2I models, but despite this reservation recommends accepting the paper due to its clear extension of existing methodologies and potential applicability across different fields. Another reviewer expresses concerns over some of the claims and methodologies in the paper, challenging the assertion that current image generation techniques produce low-resolution images by citing StyleGAN and related models, which are known to produce high-resolution outputs. There are also suggestions for the authors to provide more clarity on the flow of data between the global and local models, the methodology for selecting image patches suitable for the generator model, the fusion of patches from Gen_F and Gen_C, the parameters for image sharpening, and the process of random cropping and rotation for non-square images. I appreciate the novelty of UWAT-GAN and its potential impact on UWF-FA image synthesis, and I agree with the reviewers on the need for clearer explanations and justifications for some aspects of the methodology.




Author Feedback

We thank the AC and reviewers for their insightful, valuable comments and their appreciation of our work. We summarize our responses as follows: (1) addressing the contention that current methods only produce low-resolution images (R1); (2) providing clearer explanations regarding the sharpening parameters, the flow of data between the global and local models, patch selection, and the process of cropping patches and augmenting the data (R1 and R4); and (3) other concerns raised by the reviewers.

(1) We concur that StyleGAN can produce high-resolution images. However, StyleGAN was trained on the CelebA-HQ dataset, achieving a score similar to that of the StarGAN-v2 model. In our work, the comparative experiment in Sec. 3.3 demonstrates that our model achieves better scores in generating UWF-FA. To the best of our knowledge, current methods in fundus imaging cannot produce high-resolution images better than those obtained by our method.

(2) To ensure that Gen_F and Gen_C receive the same region, we resized the images to the same size and cropped the patches from the same position. To keep the width and height unchanged when augmenting images, we trimmed the image corners during rotation; as the corners of non-square images typically contain less significant information, we restricted the rotation angle to less than 40 degrees. For image sharpening, we used histogram equalization to enhance image contrast (see the sketch following this feedback).

(3) Other questions raised by the reviewers:

R1: Regarding the patch size (Sec. 3.1) and the extracted patches (Sec. 2.1, Q6), we omitted the complete clarification in the initial submission. In Sec. 3.1, it is the resized images, not the patches, that have a size of 608×768; this description will be corrected. Moreover, the patches were cropped from the original or resized images and were randomly selected. As each cropped patch covers a relatively large part of the whole picture, it is almost impossible for it to be empty or to contain only background. For the compared methods (Q3), we took the default parameters of their open-source code, ensuring that the data volume matched the number of training cycles. Although some models performed worse given this abundance of data, our model proved superior. Regarding overlap and the number of patches (Q6), overlapping can happen, and we obtain 50 patches per image.

R2 raised concerns about the 70%/30% data split, which is indeed low (Q3). However, we chose it because the dataset is relatively small (around 3,000 pairs). After collecting more data in the future, we can use an 80%/20% split. Regarding pix2pixHD, its performance may deteriorate when trained on smaller datasets, as demonstrated in the ablation study and by the FID and KID metrics. Concerning the loss terms (Q3), we tuned the weights of these losses to find the combination that yielded the best results. The loss contains a GAN loss, a feature-matching loss, and a VGG loss. We use the same adversarial formulation as the cGAN but replace the conditional loss with the feature-matching and VGG losses, for two reasons: 1) all layers of the discriminator, not just the last few, should be used to train the generator, so we also make the intermediate layers produce discriminative outputs; 2) synthesizing UWF-FA is a complicated generation task, and some features might not be fully captured by the discriminator, so a fine-tuned VGG network assists the discriminator at the beginning of training and trains the generator with deeper features.
Finally, we thank R4 for pinpointing the mistake in Fig. 2 and for the advice on Fig. 1 (Q6). Indeed, Gen_F has 2 down-sampling blocks while Gen_C has 3; we will correct this and adopt the suggestions in the camera-ready version. Regarding multiple local branches (Q6), we did not try this in the present work; however, adding more branches may help extract more informative features and improve performance, and we will explore this in future work.
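
As a purely illustrative reading of the preprocessing described in point (2) of the feedback, a paired augmentation routine could look like the sketch below. The patch size, the use of torchvision, and the exact equalisation call are assumptions, not the released pipeline.

```python
import random
import torchvision.transforms.functional as TF
from torchvision import transforms
from PIL import Image

def paired_augment(slo: Image.Image, fa: Image.Image,
                   patch_size=(256, 256), max_angle=40):
    """Illustrative paired preprocessing: histogram equalisation for contrast,
    a shared rotation of at most `max_angle` degrees with the canvas size kept
    (rotated-out corners are discarded/filled), and a crop taken at the same
    position in both modalities so the SLO/FA pair stays aligned.
    Parameter values are placeholders, not the paper's exact settings."""

    # Contrast enhancement (histogram equalisation) on both modalities.
    slo, fa = TF.equalize(slo), TF.equalize(fa)

    # Shared random rotation; expand=False keeps width and height unchanged.
    angle = random.uniform(-max_angle, max_angle)
    slo = TF.rotate(slo, angle, expand=False)
    fa = TF.rotate(fa, angle, expand=False)

    # The same crop window is applied to both images.
    i, j, h, w = transforms.RandomCrop.get_params(slo, output_size=patch_size)
    return TF.crop(slo, i, j, h, w), TF.crop(fa, i, j, h, w)
```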


