
Authors

Robbie Holland, Oliver Leingang, Christopher Holmes, Philipp Anders, Rebecca Kaye, Sophie Riedl, Johannes C. Paetzold, Ivan Ezhov, Hrvoje Bogunović, Ursula Schmidt-Erfurth, Hendrik P. N. Scholl, Sobha Sivaprasad, Andrew J. Lotery, Daniel Rueckert, Martin J. Menten

Abstract

Age-related macular degeneration (AMD) is the leading cause of blindness in the elderly. Current grading systems based on imaging biomarkers only coarsely group disease stages into broad categories that lack prognostic value for future disease progression. It is widely believed that this is due to their focus on a single point in time, disregarding the dynamic nature of the disease. In this work, we present the first method to automatically propose biomarkers that capture temporal dynamics of disease progression. Our method represents patient time series as trajectories in a latent feature space built with contrastive learning. Then, individual trajectories are partitioned into atomic sub-sequences that encode transitions between disease states. These are clustered using a newly introduced distance metric. In quantitative experiments we found our method yields temporal biomarkers that are predictive of conversion to late AMD. Furthermore, these clusters were highly interpretable to ophthalmologists who confirmed that many of the clusters represent dynamics that have previously been linked to the progression of AMD, even though they are currently not included in any clinical grading system.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43990-2_68

SharedIt: https://rdcu.be/dnwMs

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    A novel method using feature-space trajectory for unsupervised biomarker discovery for AMD.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is well-written and pretty readable.
    • The idea of using feature point trajectories to indicate disease progression sounds novel and interesting, although the adopted algorithm for trajectory clustering, TRACLUS, is somewhat outdated (proposed in 2007).
    • The motivation is clear, and the summary of previous work is quite good.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Some details of the methodology are missing. For example, what’s the loss function of contrastive learning (sec 3.2)? How to use the trajectory clusters to predict the conversion to late AMD as a new biomarker (sec 4.2)?
    • Lack of necessary experiments: The authors proposed three distance functions but only showed the final one’s results. A quantitative baseline comparison should be provided. Also, a comparison with other possible methods should be considered. For example, the most straightforward one which trains a classification DNN using the OCT images (for demonstrating the performance gap of trajectory clustering to supervised learning algorithms), and an arbitrary OCT-based unsupervised clustering algorithm as the unsupervised baseline.
    • Lack of quantitative results. For sec 4.1, there are only qualitative results. The authors should select appropriate metrics for numerical evaluation.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    good

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    please find the comments above.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A novel idea of using feature trajectory as an AMD progression biomarker. The experiments should be enriched, and the method details should be provided.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    most of my concerns have been clarified. I’d like to raise the score to 6.



Review #2

  • Please describe the contribution of the paper

    The authors proposed a technique to cluster images based on encoding of disease progression to facilitate discovery of temporal biomarkers not yet incorporated in the current grading system.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength is the clustering of sub-trajectories of longitudinal data.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Lack of rigorous evaluation for mapping clusters to temporal biomarkers by specialists.
    2. The proposed model does not “automatically” discover biomarkers, but features that facilitate clustering of images, and the interpretation of the biomarkers is done by experts.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility of the work is low because both the code and the datasets will not be released. In addition, the qualitative assessment cannot be repeated.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The contributions should be summarized at the end of introduction, which should not be a summary of the entire technique. The authors should also pay closer attention to proper indentation of paragraphs (MICCAI template) to improve the readability.
    2. The mention of phi and lambda in Figure 2 can be very confusing. They are not just binary values. However, it is important to point out that one similarity measure leads to another, not 3 independent measures.
    3. Assuming the technique for biomarker discovery is the main contribution of this paper, the evaluation for mapping clusters to temporal biomarkers by specialists should be more rigorous, because similar or better outcomes might be easily achieved by other deep learning models using OCT images directly.
    4. When presenting the clustering outcome to the specialists for summarizing consistent temporal biomarkers, how is a cluster presented? A set of images? How many experts are involved in making the interpretation and summary of the clusters? Any inter-subject variability?
    5. What is the rationale that supports the choice of K? Is K clinically determined? Does each cluster exhibit a unique set of temporal biomarkers?
    6. Based on my understanding, the outcome of the system is a set of clusters, not temporal features/biomarkers. Are the biomarkers determined by the experts based on examination of the clustered data? If yes, the identification of potential biomarkers is manual and subjective, not automatic and consistent, but the authors claim the process to be automatic in the abstract “…we present the first method to automatically discover biomarkers…”. Features learned by a ML model don’t always equate to biomarkers.
    7. The authors should be more careful about making the claim “Current grading systems … are unable to predict future disease progression.” There exist DL systems making prognosis for AMD [Schmidt-Erfurth,2018]. [Schmidt-Erfurth,2018] Schmidt-Erfurth et al. Artificial intelligence in retina. Progress in Retinal and Eye Research (2018)
    8. For the quantitative analysis in section 3.5, what is the reason for repeating the entire process (the end of the 2nd paragraph)? The two datasets are not explicitly mentioned. What is being used in the current grading system (the static biomarkers) for comparison?
    9. How can features be extracted from the clusters for making prediction using a linear regression model? Are the features the membership to K clusters?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I recognize the novelty in clustering images by encoding the progression using longitudinal data, but the study is considered premature with insufficient evaluation. The claim of being automatic biomarker discovery is also an over-statement.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors addressed my main concern, among many others, about mapping of clusters to potential temporal biomarkers. Such information should be included in the manuscript for the proposed method to be clinically applicable. The expert agreement on the mapping of the clusters to clinically known biomarkers is only 60%. It is important that they publish such information for the readers to decide the credibility of the work.



Review #3

  • Please describe the contribution of the paper

    This paper describes an approach to discover novel, AI-motivated features of interest for detection of AMD. The features are arrived at by using contrastive loss in feature space over time; the trajectories in feature space are clustered and analyzed by clinicians and compared to existing ‘static’ biomarkers in order to discover novel biomarkers suggested by this ‘self-supervised’ AI approach.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper uses state-of-art techniques (such as contrastive learning) as well as clear visuals to describe the temporal as well as feature-based approach to learn similar (close-distance) elements in a given class and varying (long-distance) elements from opposing classes. The method used to learn trajectory paths over time is explained rigorously via transition equations. The qualitative and quantitative impact of novel features found is described.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper does not clearly explain whether the clustered regions found by the AI differ from the regions of importance found by experts and, if so, whether those differences might shed light on potential new biomarkers. The finding of new biomarkers for determining AMD progression is mentioned, but more details on what those are clinically and what they are called (i.e., new biomarker names) would help the reader appreciate the findings.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducibility is addressed.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The explainability of this technique and potential to shed light on new clinical features is its greatest strength; the use of contrastive learning features over time and space is a novel use of SOTA techniques. The meaning of the findings (specifically names of novel biomarkers found) could be further emphasized to improve the paper’s quality and enhance the reader’s understanding of results.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper uses state-of-art techniques (such as contrastive learning) as well as clear visuals to describe the temporal as well as feature-based approach to learn similar (close-distance) elements in a given class and varying (long-distance) elements from opposing classes. Applying this technique over both time and space is novel and seems to suggest novel features for AMD progression. More elaboration on the identity of those novel features in terms of name and image position side-by-side would help to strengthen the overall reader’s understanding of the paper.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    7

  • [Post rebuttal] Please justify your decision

    The authors have addressed my main concern by naming the discovered biomarkers as follows: rapid growth of drusen PED, regression of drusen PED, development of subretinal fluid and stable disease state (no progression). They have now explicitly reported this naming scheme in the discussion of their revised paper, so my accept decision remains confirmed.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper proposed a novel technique to cluster images based on encoding of disease progression to facilitate discovery of temporal biomarkers not yet incorporated in the current AMD grading system. Reviewers appreciate the innovation of the work in this paper, but there are still many problems that must be solved. For example, both reviewers 1 and 2 raised questions about the details of the method, experimental design and performance presentation. In particular, reviewer 2 questioned whether the proposed method is sufficiently evaluated. I suggest that the authors provide a more comprehensive rebuttal by addressing the reviewers’ questions in greater detail.




Author Feedback

We thank the reviewers for their time and high-quality reviews. We appreciate that all see merit in our novel method that proposes temporal biomarkers of AMD by clustering disease trajectories in time series of OCT images. In this revision we have addressed their remaining concerns:

  1. Methodological details (R1, R2) We now add detail on the BYOL contrastive loss that is used during pretraining (R1). Next, we highlight that phi and lambda are real numbers (R2). These parameters control the weighting of the three sub-trajectory distance measures. We refer R1 to supplemental Figure 2 for a “quantitative baseline comparison”. Although these three measures are not completely orthogonal, we emphasise that DTW is a fundamentally different encoding of distance between time series than Euclidean distance [16] (R2).

  2. Evaluation details (R1, R2) R1 and R2 have requested more information on how we use our temporal clusters to predict late AMD. Each sub-trajectory is characterised by a vector of size K=30 that encodes proportional similarity to each cluster. This vector is then used by the Lasso linear regression model. By “repeating the entire process” five times we can measure the variance in performance (R2). Furthermore, we have increased the visibility of the two used datasets and baseline clinical grading system by restructuring section 3.1. Regarding mapping of clusters to biomarkers, we randomly drew ten patients from each cluster and presented one sub-trajectory from each to four ophthalmologists (R2). There was consensus between experts in 18 clusters while there was inter-rater disagreement in the remaining 12. The total number of clusters K=30 was empirically determined to balance cluster quality (homogeneity) and ophthalmologists’ resources. We have added these missing details to our evaluation and results sections.

  3. Additional baselines (R1, R2) Our work tackles the problem of proposing biomarkers for a new and better AMD grading system. As such, the most relevant baseline is the current grading system. Still, for reference we now provide a ResNet50 baseline to “demonstrate the performance gap” between our interpretable approach and supervised learning algorithms (achieving 0.71, 0.74, 0.60 and 0.20 MAE for predicting Late AMD, CNV, cRORA and VALogMAR, respectively). As suggested by R1, we have additionally computed an unsupervised clustering baseline that uses k-means (K=30) to group single-time-point images. This performed worse (0.77, 0.82, 0.70 and 0.26 MAE) than our method based on temporal clusters.

  4. Results presentation (R1) R1 asks that we include quantitative results in section 4.1. We would like to point out that our work already contains a dedicated section and results table (section 4.2, Table 1) reporting numerical evaluation, which we have further extended and emphasised in our revision.

  5. Clinical impact (R3) Each of our candidates for temporal biomarkers was given a name by the four ophthalmologists, for example rapid growth of drusen PED, regression of drusen PED, development of subretinal fluid and stable disease state (no progression). We now explicitly report this naming scheme in our discussion. Given the page limit we aim to explore the clinical role of these biomarkers in a follow-up study.

  6. Claims (R2) We agree with R2 that our clusters are not biomarkers before they are interpreted by ophthalmologists. While this distinction is already introduced in our introduction, we now provide a more nuanced description by exchanging the term ‘biomarker discovery’ for ‘biomarker proposal’ in the manuscript. We also adjust our wording that current grading systems ‘are unable to predict future disease progression’ to ‘lack prognostic value’. However, this problem is not solved by deep models which do not meet the interpretability requirements of clinical grading systems. In stark contrast, our method offers interpretable biomarker proposal that can expedite the existing biomarker discovery process.
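
The evaluation pipeline described in point 2 can be illustrated with a minimal sketch. It is not the authors' code: the inverse-distance definition of "proportional similarity", the synthetic data, the 80/20 split, the value of `alpha` and the helper names (`similarity_features`, `fit_lasso`, `run_once`) are all assumptions made for illustration. Only K=30, the use of a Lasso linear model, the MAE metric and the five repetitions come from the text above.

```python
import random
import statistics

K = 30  # number of clusters stated in the rebuttal


def similarity_features(distances):
    """Map distances to K cluster centres onto proportions that sum to 1.

    Assumption: inverse-distance weighting. The rebuttal only states that the
    vector encodes proportional similarity to each cluster.
    """
    inverse = [1.0 / (d + 1e-8) for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the Lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(X, y, alpha=1e-4, n_iter=100):
    """Cyclic coordinate descent for
    (1 / 2n) * sum((y - b - Xw)^2) + alpha * sum(|w|)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = sum(y) / n
    residual = [target - b for target in y]
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
        # Unpenalised intercept: absorb the mean residual.
        shift = sum(residual) / n
        b += shift
        residual = [r - shift for r in residual]
    return w, b


def predict(X, w, b):
    return [b + sum(x * wj for x, wj in zip(row, w)) for row in X]


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - q) for t, q in zip(y_true, y_pred)) / len(y_true)


def run_once(seed, n_samples=120):
    """One repetition on synthetic data (hypothetical, for illustration only)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        features = similarity_features([rng.uniform(0.1, 2.0) for _ in range(K)])
        X.append(features)
        # Made-up target that depends on similarity to clusters 0 and 1.
        y.append(5.0 * features[0] - 3.0 * features[1] + rng.gauss(0.0, 0.01))
    split = int(0.8 * n_samples)
    w, b = fit_lasso(X[:split], y[:split])
    return mean_absolute_error(y[split:], predict(X[split:], w, b))


# Repeat the whole process five times to estimate the variance in performance.
maes = [run_once(seed) for seed in range(5)]
print(f"MAE mean={statistics.mean(maes):.4f}, std={statistics.stdev(maes):.4f}")
```

The Lasso is hand-rolled here only to keep the sketch free of dependencies; the L1 penalty drives the weights of uninformative clusters to exactly zero, which is what keeps the fitted model readable cluster by cluster.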




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Pros:

    • The idea of using feature point trajectory to indicate the disease progression is novel and interesting.
    • The paper is well written and easy to follow.

    Cons:

    • Clarity: some details of the methodology are missing.
    • Lack of necessary experiments.
    • Lack of quantitative results.

    After Rebuttal:

    • reviews are more consistent and positive;
    • major concern from a low-scored reviewer was addressed, and two reviewers gave higher scores
    • major issues are well explained and acknowledged



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper puts forth an innovative approach to detect Age-Related Macular Degeneration (AMD) by discovering distinct, AI-informed features. Utilizing contrastive loss in feature space over time, the authors derive these features which are subsequently analyzed by clinicians and juxtaposed with existing static biomarkers. The primary objective of this self-supervised AI strategy is to propose new biomarkers.

    The paper is imbued with numerous strengths, which include the application of state-of-the-art techniques such as contrastive learning, accompanied by clear visual illustrations. Furthermore, the authors thoroughly elucidate the methodology employed to learn trajectory paths over time using transition equations. Of significant note is the articulation of the impact of the newly discovered features, both in qualitative and quantitative terms.

    Nevertheless, the paper falls short in adequately explicating whether the disparities between the AI-identified clustered regions and the expert-identified significant regions could possibly uncover new biomarkers. While the authors allude to the discovery of novel biomarkers for AMD progression, a comprehensive description of these, especially their clinical nomenclature, would greatly improve the reader’s understanding of the findings.

    Addressing the reviewers’ comments, the authors provide comprehensive clarifications on various issues:

    Methodological details: They supplement information on the BYOL contrastive loss used during pretraining and provide clarity on parameters phi and lambda. Moreover, they guide reviewers to supplemental Figure 2 for a quantitative baseline comparison and underscore the uniqueness of the distance measures used.

    Evaluation details: The authors shed light on their strategy for predicting late AMD utilizing temporal clusters and explain the methodology for measuring variance in performance. They further illuminate their mapping of clusters to biomarkers, detailing the consensus attained among experts.

    Additional baselines: The authors juxtapose their work with the current AMD grading system and introduce a ResNet50 baseline. They also include an unsupervised clustering baseline that employs k-means for single-time-point images.

    Results presentation: They affirm that their work already houses a section documenting numerical evaluation, which has been enhanced and underscored in their revision.

    Clinical impact: The authors disclose that the proposed temporal biomarkers have been christened by the four ophthalmologists. They unveil these names and signify a more in-depth exploration of the clinical role of these biomarkers in future research.

    Claims: The authors concur with the reviewers on the prerequisite for ophthalmologist interpretation of their clusters before they can be acknowledged as biomarkers. They also fine-tune their description of the current grading system’s limitations and reassert the interpretability of their method and its potential to hasten biomarker discovery.

    Considering the authors’ thoughtful clarifications and their commitment to address all concerns in their final manuscript, I am supportive of accepting this paper. The innovative approach of the authors promises potential in the discovery of novel biomarkers for AMD, which could contribute substantially to the field.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presents a novel approach to unsupervised biomarker discovery for AMD using feature space trajectories. However, R2 considered that the paper lacked a rigorous expert assessment of the mapping of clusters to temporal biomarkers and also raised some issues with detailed descriptions. In the rebuttal, the authors state that they agree with R2’s view. They added these missing details to the assessment and results sections, modified some of the detailed descriptions and addressed R2’s concerns. Finally, R2 decided to upgrade the score and agreed to accept the paper. Given the consistent scores of the reviewers and the authors’ affirmation that they addressed the issues raised by several reviewers, I support the acceptance of this paper.


