
Authors

An Zhao, Ahmed H. Shahin, Yukun Zhou, Eyjolfur Gudmundsson, Adam Szmul, Nesrin Mogulkoc, Frouke van Beek, Christopher J. Brereton, Hendrik W. van Es, Katarina Pontoppidan, Recep Savas, Timothy Wallis, Omer Unat, Marcel Veltkamp, Mark G. Jones, Coline H.M. van Moorsel, David Barber, Joseph Jacob, Daniel C. Alexander

Abstract

Imaging biomarkers derived from medical images play an important role in diagnosis, prognosis, and therapy response assessment. Developing prognostic imaging biomarkers which can achieve reliable survival prediction is essential for prognostication across various diseases and imaging modalities. In this work, we propose a method for discovering patch-level imaging patterns which we then use to predict mortality risk and identify prognostic biomarkers. Specifically, a contrastive learning model is first trained on patches to learn patch representations, followed by a clustering method to group similar underlying imaging patterns. The entire medical image can be thus represented by a long sequence of patch representations and their cluster assignments. Then a memory-efficient clustering Vision Transformer is proposed to aggregate all the patches to predict mortality risk of patients and identify high-risk patterns. To demonstrate the effectiveness and generalizability of our model, we test the survival prediction performance of our method on two sets of patients with idiopathic pulmonary fibrosis (IPF), a chronic, progressive, and life-threatening interstitial pneumonia of unknown etiology. Moreover, by comparing the high-risk imaging patterns extracted by our model with existing imaging patterns utilised in clinical practice, we can identify a novel biomarker that may help clinicians improve risk stratification of IPF patients.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16449-1_22

SharedIt: https://rdcu.be/cVRU2

Link to the code repository

https://github.com/anzhao920/PrognosticBiomarkerDiscovery

Link to the dataset(s)

https://medgift.hevs.ch/wordpress/databases/ild-database/


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposed a framework for prognostic imaging biomarker discovery and survival analysis based on contrastive learning and ViT, and exemplified its application in IPF.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed framework could detect a novel biomarker, which is very useful for guiding radiologists.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    None

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    very good

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The only minor point is that the physiological meaning of the novel C36 biomarker should be explained in more detail.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is a good paper in every aspect, and it can potentially be extended to broader applications across different diseases and imaging modalities.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper aims to derive a method for survival prediction from lung CT scans. Patch representations are learnt by a modified contrastive learning method. Next, these patch representations are clustered using spherical K-Means. The final survival prediction is made by a clustering Vision Transformer (ViT) using the patch representations and their cluster assignments.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    I believe this paper to be novel not only in the proposed method (that combines contrastive learning, spherical k-Means clustering, ViT clustering, etc), but also by the fact that using the proposed method a novel biomarker was found.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weakness of this work is the comparison with rather old methods (3D ResNet-18 and 3D ResNet-34).

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors claim that the code will be released with publication. I thus believe the work to be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    I would like to congratulate the authors for their impressive work. My only comment is that comparison should be performed both with standard methods and with other recent techniques.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is a novel work with a clear and interesting application. Methodology is sound and statistical significance has been computed. I thus recommend this work for acceptance.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    The authors propose a two-stage approach to predict survival from CT images of patients with idiopathic pulmonary fibrosis. In the first stage, the authors learn descriptors of image patches via self-supervised learning. In the second stage, the authors group patch-descriptors via K-means and pass that information to a ViT to predict survival risk scores. Evaluation is performed based on internal cross-validation and a separate hold-out dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed approach is interesting, because it offers some degree of explainability (by relying on image patches) and lowers the complexity of ViT by assigning image patches to clusters (via K-means). Evaluation on hold-out data indicates a strong improvement of the proposed approach over 3D ResNet approaches.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The proposed approach is only compared against two 3D ResNet models. Other baselines would be helpful. A regular ViT trained end-to-end for survival prediction would help to judge the benefit of the proposed clustering scheme. A shallow model (e.g. Random Survival Forest) based on texture features (e.g. radiomics) or features extracted by ResNet from step 1 would help to justify the two-stage approach.

    There are many hyper-parameters (patch size, number of patches, number of clusters, size of latent representation, …), but it is unclear how sensitive the proposed framework is to their choice. In particular, the number of clusters K seems to be critical, as it seems to offer a trade-off between expressiveness and complexity of the ViT.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducibility seems fair. The datasets used in the study do not seem to be publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The proposed framework follows a two-stage procedure. In stage 1, self-supervised learning is used to train a ResNet to extract features from image patches. In stage 2, patch-descriptors are clustered and a ViT is trained to predict risk scores of survival. Overall, the proposed approach is interesting and combines ideas from unsupervised deep learning and transformers to predict survival of patients with idiopathic pulmonary fibrosis. However, there are some issues with the paper that should be addressed.

    Major issues:

    1. Additional baseline should be included in the experiments (see above).
    2. How sensitive is the proposed approach to the various hyper-parameters, in particular the number of clusters K?
    3. CT images are 3D images, yet, image patches seem to be 2D. In which plane are the patches, and how is the third dimension treated?
    4. In section 2.2, the authors mention sequence length N, but it is unclear how it relates to the patch-descriptors computed in section 2.1. Does N correspond to the number of extracted patches? The experiments seem to indicate otherwise, which does raise the question how exactly this sequence is defined.
    5. In section 2.2, the authors write that “queries within the same cluster can be represented by a prototype”. Please clarify what query and prototype are? Are queries patch-descriptors (from step 1) and the prototype the cluster centroid (from K-means)?
    6. In section 2.3, the authors discuss how novel biomarkers can be discovered; however, important details are missing. First, how are “existing biomarkers” defined? What is the measure of correlation? What does “relatively far” exactly mean? How is the p-value to measure “predictive of mortality” computed, and has multiple testing been considered?
    7. The ablation study in table 2 mentions two entries, which are not sufficiently explained. “w/o contrastive learning”: How is the ResNet from step 1 trained in this setting? “w/o attention pooling”: How are per-patient predictions formed in this model?
    8. In table 1, how can the proposed model (ResNet-18 and ViT) have less parameters than the ResNet-18 model?

    Minor issues:

    1. Please provide more details about the datasets, in particular about the follow-up period and the amount of censoring. A Kaplan-Meier curve would be helpful.
    2. At which time points was the IBS evaluated?
    3. Please add the Kaplan-Meier curve as a lower-bound of the IBS to table 1.
    4. The Cox-loss has only been re-discovered by ref. 18, but was originally proposed by Faraggi D, Simon R. A neural network model for survival data. Stat Med 1995, which should be the preferred citation.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposed an interesting framework and the empirical evaluation suggests that it is effective, however additional baselines in the experiments and a justification of the selected hyper-parameters would strengthen the paper.

  • Number of papers in your stack

    7

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes a two-stage approach to predict survival from CT images of patients with idiopathic pulmonary fibrosis. In the first stage, it learns descriptors of image patches via contrastive learning. In the second stage, it groups patch-descriptors via K-means and passes that information to a ViT to predict survival risk scores. The proposed method is novel in that it combines contrastive learning, spherical k-Means clustering, and ViT clustering; it also finds novel biomarkers and offers some degree of explainability. The experimental results are convincing. However, it is desirable to compare with more recent methods.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4




Author Feedback

We thank reviewers for their comments. Below is our response to these comments.

R2 and R3

The major concern of R2 and R3 is that this work is only compared with rather old models (3D ResNet). R3.1: “Comparison should … with other recent techniques”. R2.1: “it is desirable … recent methods.” R4 also suggests including more methods for comparison, such as a regular ViT and a shallow model based on texture features. R4.1.1: “Additional baseline should be included …”.

The “w/o contrastive learning” entry in the ablation study is a regular ViT trained end-to-end, without using ResNet from step 1 to extract patch embeddings first. This regular ViT provides a comparison baseline which justifies the two-step framework. ViT is also a recent model. This also addresses the question in R4.1.7: “w/o contrastive learning: How is the ResNet from step 1 trained …”. We will add more explanations in Sec. 3.3 to clarify this point. In Supplementary Table 2, we provide the survival prediction result of a Cox model based on visual scores. Visual scores are interstitial lung disease extent (the sum of honeycombing, reticulation, and ground glass opacity) and emphysema extent visually measured by radiologists, which are the most common texture features used in the clinical practice of IPF. This can be seen as the baseline of a shallow model.

R4

Besides additional baselines, the other main point raised by R4 concerns sensitivity to hyper-parameters. R4.1.2: “How sensitive … the number of clusters K?” The patch size is decided based on the experience of radiologists for better interpretation. We also considered a patch size of 32, but it results in a much longer sequence that proves intractable for the clustering ViT; we will explore other patch sizes in future work. We set the latent representation size D to 256 in order to use the pre-trained model weights. We will add additional results for different values of K in Supplementary Table 1.

R4.1.3: “CT images are 3D images … how is the third dimension treated?” The patches come from the axial plane (the x- and y-axis). We use 3D positional encoding to include positional information of z-axis as shown in Supplementary Table 1, but it doesn’t significantly boost the performance on top of 2D patches. We’ll further explore other planes rather than simply adding positional information of z-axis in future work. We will clarify this in Sec. 4.

R4.1.4: “In section 2.2 … how N relates to the patch-descriptors …” Different CT scans can be divided into different numbers of patches, which results in various input lengths for ViT. We pad all input sequences to be a fixed length N, and N is larger than the number of extracted patches of any patient in the dataset. We will clarify this in Sec. 2.2.
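The padding step described in this answer can be sketched as follows. This is a minimal illustration, not the authors' code: the function name pad_patch_sequence, the zero padding value, and the boolean validity mask are all assumptions made for the example (the rebuttal only states that sequences are padded to a fixed length N).

```python
import numpy as np

def pad_patch_sequence(patches, n_max):
    """Pad a (num_patches, dim) array to (n_max, dim).

    Returns the padded array and a boolean mask that is True for real
    patches and False for padding (the mask is an assumption of this sketch).
    """
    num_patches, dim = patches.shape
    if num_patches > n_max:
        raise ValueError("n_max must be at least the number of patches")
    padded = np.zeros((n_max, dim), dtype=patches.dtype)
    padded[:num_patches] = patches
    mask = np.zeros(n_max, dtype=bool)
    mask[:num_patches] = True
    return padded, mask

# Hypothetical usage: a scan with 3 patch embeddings padded to N = 5.
padded, mask = pad_patch_sequence(np.ones((3, 4)), n_max=5)
```

A mask of this kind is one common way to keep padded positions from contributing to attention or pooling downstream.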

R4.1.5: “In section 2.2 … what query and prototype are? …” Before multi-head attention, input vectors are projected to query vectors through a linear projection layer. Queries can be seen as representations of input patch-descriptors, so patches within the same cluster should have similar queries. The prototype of one cluster is the centroid of queries within this cluster. We will clarify this in Sec. 2.2.
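To make the query and prototype relationship concrete, here is a minimal NumPy sketch under stated assumptions. Computing each prototype as the centroid of the queries in its cluster follows the answer above; reusing one attention result per cluster and broadcasting it back to the member patches is an assumption about how the saving is obtained, and all function names are hypothetical.

```python
import numpy as np

def cluster_prototypes(queries, assignments, num_clusters):
    """Return the centroid of the query vectors in each cluster, shape (K, d)."""
    sums = np.zeros((num_clusters, queries.shape[1]))
    np.add.at(sums, assignments, queries)  # accumulate queries per cluster
    counts = np.bincount(assignments, minlength=num_clusters)
    return sums / np.maximum(counts, 1)[:, None]  # empty clusters stay at zero

def clustered_attention(queries, keys, values, assignments, num_clusters):
    """Attend once per prototype, then copy the result to each member patch."""
    prototypes = cluster_prototypes(queries, assignments, num_clusters)
    scores = prototypes @ keys.T / np.sqrt(queries.shape[1])  # (K, N), not (N, N)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    per_cluster = weights @ values  # (K, d_v)
    return per_cluster[assignments]  # (N, d_v)
```

With K clusters and N patches, the score matrix in this sketch has K x N entries rather than N x N, which is the kind of saving that makes long patch sequences tractable.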

R4.1.6: “In section 2.3 … important details are missing …” As shown in Sec. 3.1, we use a dataset that contains established lung tissue patterns defined by clinicians. As for “relatively far”, we choose the cluster that is farthest from existing patterns. Correlation coefficient r is used. P-value of a Cox model is computed to show if a variable is significantly predictive of mortality. More details can be found in Supplementary Fig.1.
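The rebuttal does not spell out the Cox computation. As background only, the quantity a Cox model optimises (and that the Cox loss raised in R4.2.4 refers to) is the negative log partial likelihood, sketched below in NumPy. The function name is hypothetical, tied event times are handled with the simple Breslow convention, and the loss is averaged over observed events; none of this is taken from the authors' code.

```python
import numpy as np

def cox_neg_log_partial_likelihood(risk, time, event):
    """Negative log partial likelihood, averaged over observed events.

    risk:  predicted log-risk score per patient
    time:  follow-up time per patient
    event: 1 if death was observed, 0 if censored
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    # at_risk[i, j] is True when patient j is still under observation at time[i]
    at_risk = time[None, :] >= time[:, None]
    shift = risk.max()  # subtract the maximum for numerical stability
    exp_risk = np.exp(risk - shift)
    log_denominator = np.log((at_risk * exp_risk[None, :]).sum(axis=1)) + shift
    per_patient = risk - log_denominator
    return float(-per_patient[event].sum() / max(event.sum(), 1))
```

The loss is lower when patients who die earlier receive higher risk scores, which is the ordering a survival model is trained to recover.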

R4.1.7: “w/o attention pooling”: How are per-patient predictions formed …? We simply apply average pooling, which takes the average of patch risks. We will clarify this in Sec. 3.3.
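The difference between the two pooling variants can be illustrated with a short sketch. Average pooling is as described in the answer above; the attention pooling shown here (a softmax over one scalar score per patch) is only an assumed form, since the exact formulation is not given in this discussion. Function names and the padding mask are hypothetical.

```python
import numpy as np

def average_pool_risk(patch_risks, mask):
    """Patient risk as the plain mean of the risks of real (non-padded) patches."""
    return float(patch_risks[mask].mean())

def attention_pool_risk(patch_risks, attention_logits, mask):
    """Patient risk as a softmax-weighted sum of patch risks (assumed form)."""
    logits = np.where(mask, attention_logits, -np.inf)  # padding gets zero weight
    logits = logits - logits.max()
    weights = np.exp(logits)
    weights /= weights.sum()
    return float((weights * patch_risks).sum())
```

With equal logits the two functions agree; unequal logits let a few patches dominate the patient-level risk, which is also what allows high-risk patterns to be read off the weights.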

R4.1.8: “In table 1, how can … have less parameters …” The proposed model uses 2D ResNet to extract patch features, while the comparison method is 3D ResNet.

For minor points R1.1 and R4.2.1-R4.2.4, we will address them in the camera-ready version.


