
Authors

Zhihao Zhao, Junjie Yang, Shahrooz Faghihroohi, Kai Huang, Mathias Maier, Nassir Navab, M. Ali Nasseri

Abstract

AI-based methods have achieved considerable performance in screening for common retinal diseases using fundus images, particularly in the detection of Diabetic Retinopathy (DR). However, these methods rely heavily on large amounts of data, which is challenging to obtain due to limited access to medical data that complies with medical data protection legislation. One crucial way to improve the performance of an AI model is to apply a data augmentation strategy to public datasets. However, standard data augmentation cannot always guarantee that the clinical labels in diabetic retinopathy are preserved. This paper presents a label-preserving data augmentation method for DR detection using latent space manipulation. The proposed approach involves computing the contribution score of each latent code to the lesions in DR images, and manipulating the lesion in DR images based on the latent code with the highest contribution score. This allows for a more targeted and effective label-preserving data augmentation approach for DR detection tasks, which is especially useful given the imbalanced classes and limited available data. The experiments in our study include two tasks, DR classification and DR severity level grading, with 4K and 2K labeled images in the training sets, respectively. The results of our experiments demonstrate that our data augmentation method achieved a 6% increase in accuracy for the DR classification task and a 4% increase in accuracy for the DR severity level grading task without any further optimization of the model architectures.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_28

SharedIt: https://rdcu.be/dnwA0

Link to the code repository

https://github.com/AIEyeSystem/LpDA

Link to the dataset(s)

https://www.kaggle.com/competitions/aptos2019-blindness-detection

https://www.kaggle.com/datasets/tanlikesmath/diabetic-retinopathy-resized

https://odir2019.grand-challenge.org/


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes label-preserving data augmentation in latent space for diabetic retinopathy classification. It computes the contribution score of each latent code to the lesions in DR images and manipulates the lesions based on the latent code with the highest contribution score.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This research improves accuracy on the DR classification and DR severity grading tasks through data augmentation that computes the contribution score of each latent code to the lesions in DR images and manipulates the lesions based on the latent code with the highest contribution score.
    2. The authors explain the method clearly enough in writing for a reader with limited scientific knowledge of diabetic research to understand the content.
    3. The paper reports quite good results in Tables 1 and 2.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The need for two datasets is not mentioned. The authors did not clarify why two datasets are used for this experiment.
    2. The authors did not explain the results in Tables 1 and 2 using the numbers in these tables; in fact, too much factual explanation was written without working through the numbers.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    All listed criteria of the reproducibility checklist have been fulfilled. The authors fully complied and provided evidence.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The authors provided a well-structured layout for the paper.
    2. The explanation focuses more on the theoretical side of the experiment, which is moderately good.
    3. It would be good if the authors used the results of Tables 1 and 2 in the text instead of numbers that do not seem to exist in the paper, for example in the “Results of data augmentation” section.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The paper has good research quality in the data presented, and the explanation given was comprehensive enough to make it understandable.
  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    A label-preserving data augmentation method that is suitable for diabetic retinopathy classification is proposed.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Strong study design
    • Claims of the title are realized
    • High quality figures
    • Good evaluation, including ablation study
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • Seems reproducible since publicly available dataset and model are used
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Recommend the authors compare the performance of this data augmentation method with DDPMs
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Well organized and technically sound

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper proposed a data augmentation approach that is different from the standard data augmentation approaches such as rotation, translation, noise addition, and applied this approach for the task of diabetic retinopathy (DR) classification and severity level grading task. The proposed data augmentation approach does not change the label for DR images during the augmentation process by performing a set of operations in the latent space. In the latent space, it calculates the contribution score of each latent code corresponding to the exudate lesions in DR images through the backpropagation step and then selects the latent code with the highest contribution score and manipulates the lesions in DR images corresponding to this latent code with highest contribution. This way, the authors augment the DR images without changing the label of the image.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The data augmentation strategy adopted by this paper is novel: it calculates the contribution of every latent code to a particular type of lesion in DR images and then manipulates those lesions through the highest-contributing latent code in the latent space. Additionally, the incorporation of the LPIPS loss specifically for DR detection as an additional loss function on top of the main StyleGAN3 loss looks interesting, as it trains a task-specific model. This method looks promising because it was applied to an imbalanced dataset, and data augmentation was used to generate more samples of DR images while keeping their labels preserved. This data augmentation strategy improved the accuracy for both the DR classification and DR severity grading tasks. Additionally, the data augmentation method looks general and could be applied to other imbalanced medical image segmentation tasks by generating domain-specific masks.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper says that exudates are commonly observed in the early stages of DR. But it calculates the highest contributing latent code of exudate lesions only and manipulates those lesions to generate the augmented images. This would augment DR images with exudate lesions and the early stage (stage 2) DR severity levels could be accurately classified (as shown in Table 2) since exudates are mainly present in early stages of DR. However, for stage 3 or stage 4 DR severity level, there are many other contributing factors other than exudates. Additionally, only a small number of stage 3 and stage 4 DR images are present for training as shown in supplemental material. Hence, the data should be augmented based on those other factors more specific to stage 3 or stage 4. Otherwise, the improvement for stage 3 or 4 DR level shown in Table 2 is not justified as it looks like the performance gain is due to data augmentation based on exudates which are typically not seen or hard to detect in these late stages.
    • The results in Table 2 show that the proposed augmentation method achieves better performance than other augmentations for stages 2, 3, and 4 of DR. One of the contributing factors is manipulating exudate lesions. However, if exudate lesions are only present in the earlier stages of DR, how are the exudates manipulated for the later stages, say stage 4, where exudates are not present as stated in the introduction of the paper?
    • In Table 2, the accuracy for stage 1 is worse than when applying no augmentation. The accuracy for stage 1 is expected to be similar to that with no augmentation, as stage 1 is also not augmented due to the absence of exudates. This is unclear.
    • In Section 2.2, for part B, Equation 2 needs a little more explanation: how is the binary segmentation mask equal to the gradient of f? If the image (x) is the output of D, then f should be the function that converts x to masks and then sums to give y. Also, what is the pre-trained exudates segmentation network? Is it a U-Net trained specifically for exudates using ground-truth exudate masks? If so, how are these masks obtained, especially for stages 3 and 4 when other factors are present?
    • In Equation 4, the contribution score R is calculated as the derivative of function f w.r.t. latent space S, but in Figure 1, R is shown as the derivative of y w.r.t. x. This is confusing. Which is correct: Equation 4 or Figure 1? Also, function f is not indicated in Figure 1.
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • This paper can be easily replicated. Additionally, the authors will provide the code once the paper is accepted.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Please refer to the weaknesses section for additional comments. In addition to those, I have the following questions and comments:

    • Please expand LPIPS loss when it is used for the first time in the Introduction section.
    • A brief introduction to the components of StyleGAN3 should be provided.
    • In the Methods section, first paragraph, part B should refer to the trained generator D from part A, not the decoder D.
    • For equation 1, indicate the terms of the loss function. Does the L_gan include the min-max generator-discriminator loss? Please explain.
    • For the LPIPS loss, do you use the last 5 feature maps from the pre-trained VGG16 (trained on DR detection)? If so, what is the motivation behind using the last 5 feature maps, and how does it help if you select more feature maps?
    • Are part A and part B in the proposed approach performed in a two-stage fashion, i.e. first part A is trained and then part B is trained or part B just uses the pretrained D from part A and the pre-trained exudates segmentation network? Please explain.
    • In part C (Section 2.3), the StyleGAN3 generator is D (as Figure 1 says) and not G(.). Please be consistent with notation. Also, please update part C of Figure 1 according to Equation 6.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The weaknesses in this paper outweigh its strengths. Although this paper has some novelty, it is not clearly written and organized, with several confusing notations in the methods section as highlighted in weaknesses. It also has a lot of grammatical mistakes. Some of the results are questionable (refer to weaknesses section), specifically since this paper performs data augmentation based on exudate lesions which are observed in the early stages of DR, but these exudate lesions are used to augment the late stage DR images also.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper under review proposes a method for label-preserving data augmentation in latent space for diabetic retinopathy (DR) classification. This method involves computing the contribution score of each latent code to the lesions in DR images and manipulating these lesions based on the latent code with the highest contribution score.

    Strengths:

    The research contributes to the accuracy of data augmentation for the DR classification task and the severity levels of DR. The author provides a clear explanation of the method that is understandable even for those with limited scientific knowledge in diabetic research. The results presented in Tables 1 and 2 were considered good.

    Weaknesses and suggestions for improvements:

    The need for using two datasets in the experiment is not clarified. The explanation of results from Tables 1 and 2 is lacking. The paper provides extensive factual explanations but lacks numerical analysis. The paper focuses on exudates, which are common in early stages of DR, but may not be the most relevant in later stages (3 and 4). Reviewers suggest augmenting the data based on other factors more specific to these stages. It is unclear how exudate lesions, mainly present in earlier stages, are manipulated for later stages where they are typically not present. The accuracy for stage 1 DR with the proposed augmentation is worse than when no augmentation is applied, which is unclear and needs clarification. Equation 2, which describes how the binary segmentation mask is equal to the gradient of f, needs more explanation. The paper needs to clarify what the pre-trained exudate segmentation network is and how the masks are obtained, especially for stages 3 and 4 where other factors are present. There is a discrepancy between Equation 4 and Figure 1 in the calculation of the contribution score R. This needs to be clarified and corrected.

    In summary, while the paper presents a promising approach for data augmentation in DR classification, there are several areas that the reviewers suggest for improvement, particularly in the explanation of results and technical details.
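
The procedure summarized in this meta-review (score each latent code by its contribution to the lesion region, pick the highest-scoring code, and manipulate only that code) can be illustrated with a small toy sketch. This is not the authors' implementation: the linear `decode` function stands in for the StyleGAN3 generator, the fixed `MASK` stands in for the output of the exudate segmentation network, finite differences stand in for backpropagation, and all names and numbers are hypothetical.

```python
# Toy sketch of contribution-score-based latent editing.
# Assumptions (not from the paper): a linear decoder, a fixed binary
# lesion mask, and finite differences in place of backpropagation.

def decode(latent, weights):
    """Toy linear decoder: pixel_i = sum_k weights[i][k] * latent[k]."""
    return [sum(w * s for w, s in zip(row, latent)) for row in weights]

def lesion_response(latent, weights, mask):
    """y = sum of decoded pixel values inside the binary lesion mask."""
    return sum(m * p for m, p in zip(mask, decode(latent, weights)))

def contribution_scores(latent, weights, mask, eps=1e-4):
    """Central finite-difference estimate of |dy/ds_k| for each latent code."""
    scores = []
    for k in range(len(latent)):
        up = list(latent)
        up[k] += eps
        down = list(latent)
        down[k] -= eps
        grad = (lesion_response(up, weights, mask)
                - lesion_response(down, weights, mask)) / (2 * eps)
        scores.append(abs(grad))
    return scores

def augment(latent, weights, mask, step=0.5):
    """Shift only the latent code with the highest contribution score."""
    scores = contribution_scores(latent, weights, mask)
    top = max(range(len(scores)), key=scores.__getitem__)
    edited = list(latent)
    edited[top] += step
    return top, edited

# Hypothetical setup: 4 pixels, 3 latent codes; pixels 1 and 2 are "lesion".
WEIGHTS = [
    [1.0, 0.0, 0.2],
    [0.0, 2.0, 0.1],
    [0.0, 3.0, 0.0],
    [0.5, 0.0, 0.0],
]
MASK = [0, 1, 1, 0]
LATENT = [0.3, -0.1, 0.8]

SCORES = contribution_scores(LATENT, WEIGHTS, MASK)
TOP_CODE, EDITED_LATENT = augment(LATENT, WEIGHTS, MASK)
```

In this toy setup the second latent code only influences the two masked pixels, so shifting it changes the lesion region while leaving the other pixels untouched. A real generator is nonlinear and entangled, so that clean separation should not be expected in general.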




Author Feedback

We sincerely thank the reviewers for their constructive feedback.

#1-Q3: In the final version of the manuscript, we will pay more attention to explaining the reasons for the numerical results that appear in the tables.

#2-Q: DDPMs have indeed performed very well in image generation tasks recently, but to implement a label-preserving generation task with them, we would need to generate the corresponding data for each label separately.

#3-Q1: The main concern of Review 3 pertains to stages 3 and 4 of diabetic retinopathy (DR). The evaluation criteria for DR severity include not only the presence of hard exudates but also other factors such as the appearance of neovascularization. This issue is crucial, and we also focused on it during our experiments. In our study, the experiments were designed to augment only one factor, namely hard exudates. Although stages 3 and 4 involve more factors than just the presence of hard exudates, we examined the datasets for both stages, and hard exudates were present in many images. Our experimental results demonstrate that data augmentation is still effective even with only one factor. Furthermore, our method is general, requiring only the mask of the pathology to identify the corresponding latent code. Although our study only augments hard exudates, our method is capable of augmenting different pathologies separately without the need to train the model for each pathology separately.

Q2: (How are the exudates manipulated for the later stages, say stage 4, if exudates are not present in this stage as stated in the introduction of the paper?) If no hard exudates are present in a particular image, the segmentation network will not produce a hard exudate mask, and thus the image will not be edited. However, we checked the dataset, and in stages 3 and 4 a considerable portion of the data still includes hard exudates as a significant factor.

Q3: (In Table 2, the accuracy for stage 1 is worse than when applying no augmentation.) We conjecture that this outcome may be a result of data augmentation for the other stages, which led to a lower proportion of stage 1 data than before.

Q4: (How is the binary segmentation mask equal to the gradient of f?) In Section 2.2 we define the gradient map as the contribution matrix. Because the binary segmentation mask only takes the values 1 and 0, the value 1 can be read as a contribution and 0 as no contribution. We can therefore regard the binary semantic segmentation mask as a contribution matrix. Under these prerequisites, the binary semantic mask has the same meaning as the gradient map. (What is the pre-trained exudates segmentation network?) We use U-Net as the exudate segmentation network. We will make this clearer in the final version of the manuscript.

Q5: (y, f, S, x: which is correct, Equation 4 or Figure 1?) Sorry for the confusion; we will fix and standardize these symbols in the final version. In the paper, “f” means the function that computes the importance score for “x”, and “y” is the importance score. Here “f”, “x”, and “y” are general symbols used to define functions. “S” is a specific symbol; “S == x” in our paper.
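
One way to read the answer to Q4, following Reviewer #3's suggestion that f masks the generated image and sums the result, is sketched below. This is an interpretation rather than the paper's Equation 2 or Equation 4: the symbols M (binary exudate mask), x (generated image), s (latent codes), and D (generator) are assumptions made for illustration, and the chain-rule line assumes x = D(s), whereas the rebuttal states that S and x coincide in the paper's notation.

```latex
% Assumed reading: y is the sum of image intensities inside the binary mask M.
y = f(x) = \sum_{i,j} M_{ij}\, x_{ij}, \qquad M_{ij} \in \{0, 1\}

% With M held fixed, the gradient of y with respect to the image is the mask itself.
\frac{\partial y}{\partial x_{ij}} = M_{ij}

% If x = D(s), the chain rule gives one score per latent code s_k.
R_k = \frac{\partial y}{\partial s_k} = \sum_{i,j} M_{ij}\, \frac{\partial x_{ij}}{\partial s_k}
```

Under this reading, a mask entry of 1 marks a pixel that contributes to y and an entry of 0 marks one that does not, which matches the rebuttal's description of the mask as a contribution matrix.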


