
Authors

Afsah Saleem, Zaid Ilyas, David Suter, Ghulam Mubashar Hassan, Siobhan Reid, John T. Schousboe, Richard Prince, William D. Leslie, Joshua R. Lewis, Syed Zulqarnain Gilani

Abstract

Abdominal Aortic Calcification (AAC) is a known marker of asymptomatic Atherosclerotic Cardiovascular Diseases (ASCVDs). AAC can be observed on Vertebral Fracture Assessment (VFA) scans acquired using Dual-Energy X-ray Absorptiometry (DXA) machines. Thus, the automatic quantification of AAC on VFA DXA scans may be used to screen for CVD risks, allowing early interventions. In this research, we formulate the quantification of AAC as an ordinal regression problem. We propose a novel Supervised Contrastive Ordinal Loss (SCOL) by incorporating a label-dependent distance metric with existing supervised contrastive loss to leverage the ordinal information inherent in discrete AAC regression labels. We develop a Dual-encoder Contrastive Ordinal Learning (DCOL) framework that learns the contrastive ordinal representation at global and local levels to improve the feature separability and class diversity in latent space among the AAC-24 genera. We evaluate the performance of the proposed framework using two clinical VFA DXA scan datasets and compare our work with state-of-the-art methods. Furthermore, for predicted AAC scores, we provide a clinical analysis to predict the future risk of a Major Acute Cardiovascular Event (MACE). Our results demonstrate that this learning enhances inter-class separability and strengthens intra-class consistency, which results in predicting the high-risk AAC classes with high sensitivity and high accuracy.
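The core idea in the abstract, a supervised contrastive loss weighted by a label-dependent distance, can be sketched as follows. This is an illustrative numpy sketch only: the soft-positive weighting w = 1 - |yi - yj| / range is an assumption made here for clarity, not the authors' exact SCOL formulation.

```python
import numpy as np

def scol_loss(embeddings, labels, temperature=0.1):
    """Illustrative supervised contrastive loss with an ordinal,
    label-distance-dependent weighting. The weighting scheme is an
    assumption of this sketch, not the paper's exact formulation."""
    # L2-normalize embeddings so the dot product is cosine similarity
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)
    label_range = max(labels.max() - labels.min(), 1)
    total = 0.0
    for i in range(n):
        others = np.arange(n) != i              # exclude the anchor itself
        logits = sim[i, others]
        logits = logits - logits.max()          # numerical stability
        log_prob = logits - np.log(np.exp(logits).sum())
        # ordinally closer labels contribute more strongly as positives
        w = 1.0 - np.abs(labels[others] - labels[i]) / label_range
        total += -(w * log_prob).sum() / max(w.sum(), 1e-8)
    return total / n
```

Under this weighting, embeddings clustered by ordinal label incur a lower loss than embeddings whose labels are interleaved, which is the inter-class separability / intra-class consistency effect the abstract describes.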

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_27

SharedIt: https://rdcu.be/dnwJJ

Link to the code repository

https://github.com/AfsahS/Supervised-Contrastive-Ordinal-Loss-for-Ordinal-Regression

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper presents an end-to-end system that can extract both local and global features with supervised contrastive ordinal loss. Contrastive learning strategies are used to optimize feature distances. The results are promising and very effective for abdominal aortic calcification scoring.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The overall background for Abdominal Aortic Calcification scoring on Vertebral Fracture Assessment scans is very well explained, in a way that is easy to follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. There are some typos in the paper, including “genera” in the abstract. 2. The LCOL and GCOL components are novel ideas, but the paper lacks solid novelty in terms of model and training strategies. 3. The details of GCOL do not offer enough intuition for why and how it relates to feature similarities.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Given the fact that the paper mostly applies existing methods, it won’t be too difficult to reproduce the work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The most interesting part of this work, stage 1, should utilize more self-/un-supervised learning techniques. Besides, since separability is an important goal of the system, it would be better to see some discussion of the t-SNE plots in the discussion part.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is a solid work with some novelty in terms of the overall system to tackle a difficult medical imaging problem. The results are promising, along with supporting materials in the supplementary. The major weakness lies in the writing and organization and the limited novelty of the model and method.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors carefully explain some concerns I laid out in my initial review. I appreciate this effort, but nothing seems substantial enough to change my rating. As illustrated in the rebuttal, the novelty of this paper sits at the framework level rather than the model level. I think weak acceptance is suitable for this work.



Review #2

  • Please describe the contribution of the paper

    This paper explores a novel loss function, an ordinal supervised contrastive loss, for abdominal aortic calcification (AAC) scoring in abdominal DXA scans which are commonly done for vertebral fracture assessment (VFA). As such, this paper presents their model as a strong candidate for opportunistic AAC screening. The results demonstrated by the authors show that the model indeed identifies high-risk AAC cases with high sensitivity and accuracy.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The loss presented is intuitive, well-motivated, and a natural extension of the existing literature on supervised contrastive methods. This is a good application of this method and the application of opportunistic screening is promising. Showing that the method works on multiple different DXA machines is also a strong motivation for further exploring this. Another strength of the paper is the ablation study and relatively extensive comparison to other methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors use cross-validation to evaluate their method and draw comparisons with other methods; however, they neglect to specify how hyperparameters were chosen. Hyperparameters have to be tuned on a separate validation set to avoid overfitting to the validation data; otherwise, comparisons to other methods are moot and generalization beyond this dataset often suffers. This is a major problem and calls the otherwise strong results into question.

    The authors propose and motivate an ordinal loss for risk classification, yet in Stage-II of the training they use RMSE to predict risk classes. Why is that? Why not also use the ordinal loss here?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No issues besides the lack of a testing set and details on hyperparameter tuning.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Figure 1 is hard to parse, in large part due to the small font size and colors not being utilized effectively.

    Page 5 “we assimilate the features” -> this is a vague description with no clear definition. Were features pooled? Concatenated?

    Details on demographics and patient counts should be given in a table.

    Figure 2 should state the AUROC scores explicitly.

    Figures from the supplement should be referenced in text or removed.

    How are cross-validation splits done? Entirely randomly or on patients?

    The clinical analysis on page 7 is extremely hard to read and could be summarized into a figure of hazard ratios.

    The authors provide no detail on how hazard ratios were calculated.

    AAC scores should be defined in the introduction, doing it in the experiments section is too late.

    A figure displaying examples of different AAC categories would be helpful for non-clinicians.

    How are the assessors trained? Are they clinicians/ techs/…?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The only reason for a weak reject is the lack of a distinct test set. If the authors can address this or give details on hyperparameter choices this is a weak accept to accept.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors addressed the concerns about hyperparameter tuning, mentioning that performance of the final model was evaluated on a test set distinct from the validation sets used during CV; this is sufficient for an accept.

    Describing the assessors as “world-leading clinicians” is still a weak point; it makes no distinction between a physician who may not encounter this kind of work during their day-to-day activities and a radiologist specialized in this modality. However, this is a relatively minor concern.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a novel framework (Dual-encoder Contrastive Ordinal Learning (DCOL)) using contrastive loss (Supervised contrastive ordinal loss) to quantify Abdominal Aortic Calcification (AAC). They validated DCOL with SCOL using two VFA DXA datasets, and showed that the model outperformed the state-of-the-art method for the AAC quantification using VFA scans. They also showed the clinical applicability of DCOL by predicting the future risk of a Major Acute Cardiovascular Event (MACE).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The novelty of using SCOL for AAC quantification is clear: it outperformed SupCon and AdaCon, which used contrastive loss functions to push the embeddings apart from each other. They validated using two clinical datasets to show that SCOL performs better when the datasets are highly skewed and of limited size.
    2. Experiments show clearly that DCOL is effective for extracting local and global features from the images, which resembles how humans quantify AAC using VFA scans.
    3. Authors show the clinical applicability of the proposed method by predicting MACE events within the datasets.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Similar to previous works (stated in the introduction), the score for the moderate group was still lower than for the other groups. Although the overall performance was better than the baseline method, it is not clear why the moderate-group scores remain lower than those of the other groups, as in previous studies.
    2. I assume that the ROI was manually extracted from a full view of the thoracolumbar spine. This may not be reproducible for future usage and may significantly impact the model performance if the ROI is not properly cropped.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    1. Hyper-parameters, computational equipment, and code are (or will be made) available for the reproducibility of this study. Also, the methods are clearly explained and easy to follow.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. It is not clear whether the attention blocks in the local encoder are effective in capturing relevant features. Although the attention-based feature map is shown in figure 1, I do not see the relevance of the attention blocks to capturing the AAC features in VFA scans.
    2. The ablation study is not clear about using LCL and GCL separately. Did you use only a single encoder or keep both encoders?
    3. Although the AUC scores for MACE events are stated in the paragraphs, it would have been clearer if the AUC scores were also shown in figure 2.
    4. In Table 2, what do the ‘bold’ numbers mean? If they represent the better result, then Hologic-Moderate-PPV should be changed; otherwise, it should be clearly stated what the bold numbers refer to. (Same in Table 1)
    5. Do you have results of RMSE+Attn loss function and using two decoders for local and global features? Is this the baseline model?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well-organized and clearly written, showing technical novelty and clinical applicability of the proposed method. However, it is still not clear whether SCOL is effective for classifying moderate groups.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper received three reviews: two accepts and one reject. From the detailed comments, this is a paper with mixed opinions, so we invite the authors for rebuttal.




Author Feedback

Thank you to the reviewers and ACs for the valuable feedback. Corrections for typos and minor errors and the suggestions by the reviewers will be incorporated in the camera-ready version.

Though R1 appreciates the novelty of the ideas, they feel that LCOL and GCOL lack novelty in terms of model and training strategies. Our novelty in these areas comes in terms of the DCOL framework, which has a unique training strategy: we first train the LCOL and GCOL using the proposed loss (SCOL) and then exploit the well-separated features in a regression model to predict AAC-24 scores. DCOL demonstrates effectiveness in handling noisy or low-quality medical images. R1 also feels that GCOL does not provide enough intuition. In the GCOL module, our proposed loss function operates on the global embeddings extracted from the encoder Eg (ref. fig 1) to learn the feature embeddings at a global level. However, we agree with the reviewer that using only GCOL is not enough to obtain high performance, for which a combination of GCOL and LCOL (the local module) should be used (Table-1). We have added some discussion of the t-SNE plots in the supplementary material, with a reference in the main manuscript.

R2 has pointed out the lack of a proper explanation of our cross-validation process and the hyperparameter tuning. We apologize for this confusion. We employ stratified 10-fold cross-validation (same as SOTA [7,18]), where we train 10 distinct networks for the 10 different folds. Each network is trained on 1,722 scans (1,550 training and 172 validation scans) and then tested on 192 unseen test images. The hyperparameters were tuned on the validation set (172 scans). R2 has also suggested the use of the ordinal loss in Stage-II of the training process. The ordinal loss proposed in Stage-I operates in latent space and cannot predict the regression label, whereas a regression loss like RMSE operates in label space and predicts the final label. This combination works well for our problem. #Were features pooled or concatenated? They were concatenated to combine the features extracted from LCOL and GCOL in Stage-I. #Calculation of hazard ratios: We use a Cox proportional hazards model to calculate HRs. #About the assessors: They are world-leading trained clinicians.
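The stratified cross-validation protocol described here, in which each fold preserves the overall class proportions, can be sketched as follows. The function name and details are illustrative assumptions, not taken from the authors' code.

```python
import numpy as np

def stratified_folds(labels, n_folds=10, seed=0):
    """Assign each sample to one of n_folds so that every class is
    spread evenly across folds (a sketch of stratified k-fold
    assignment, not the authors' implementation)."""
    rng = np.random.default_rng(seed)
    fold = np.empty(len(labels), dtype=int)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)   # samples of this class
        rng.shuffle(idx)                    # random order within class
        fold[idx] = np.arange(len(idx)) % n_folds
    return fold
```

For a skewed label distribution like AAC severity classes, stratification matters: a plain random split can leave a fold with almost no high-risk samples, which would distort per-fold sensitivity estimates.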

R3 wonders why the score for the moderate group is still lower than the others. This is because of the high similarity between AAC scores near the class boundaries: the moderate group shares boundaries with both the low and severe categories. Though our model separates the classes in latent space, there is still room for improvement. R3 assumes that the ROI is manually extracted. This is not correct: to extract the ROI, we automatically remove (crop) the upper half (50% from the top) of the thoracolumbar spine in the pre-processing step.

  • The attention-based feature map shown in figure 1: We draw R3’s attention to Table 1 (ablation study), which demonstrates the effectiveness of the LCL block. The attention map shown in Fig. 1 is a Grad-CAM map. It highlights the most discriminative features of the image that contribute to the final regression results; it does not provide the exact pixel-wise location of aortic calcification.
  • The ablation study: single encoder or both encoders? A single encoder is employed when we use either the LCL or GCL module alone; two encoders are used when we use both modules.
  • ‘Bold’ numbers in Table 2: They represent the best results. Thanks for pointing out this mistake in lines 1, 3, and 5 of Table 2. The PPV values were swapped with the NPV values in the baseline results of the Hologic data only. We have corrected this as follows (DCOL values remain unchanged, BL = Baseline):

    AAC Class | Method | NPV   | PPV
    Low       | BL     | 78.89 | 71.11
    Moderate  | BL     | 80.78 | 62.16
    High      | BL     | 96.46 | 98.45
  • Results of RMSE+Attn: Please find below:

    Pearson | Accuracy | F1-Score | Sensitivity | Specificity
    88.12   | 83.52    | 77.91    | 77.00       | 86.64




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    After integrating all information and reading the submission, the AC has the following impression: this is a solid work with a strong application and good clinical indications on a challenging problem. This may help detect multiple diseases from a single scan.

    “This is a solid work with some novelty in terms of the overall system to tackle a difficult medical imaging problem. The results are promising, along with supporting materials in the supplementary. The major weakness lies in the writing and organization and the limited novelty of the model and method.

    The paper is well-organized and clearly written, showing technical novelty and clinical applicability of the proposed method. However, it is still not clear whether SCOL is effective for classifying moderate groups.”



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    After the rebuttal, all three reviewers support acceptance. Together with my reading of the paper and rebuttal, a decision of accept is recommended based on the overall quality of the paper. However, I encourage the authors to revise the paper per the reviewers’ suggestions in the official version, which will help enhance the impact of the paper on the MICCAI society.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This study describes a novel approach for detecting the risk of abdominal aortic calcifications, which is a very challenging task with subtle features on x-ray images. While I recommend the acceptance of this study on account of its clinical utility and novelty, I strongly encourage the authors to reconsider their extensive use of abbreviations in future studies. A few (1-2) abbreviations are fine, but so many impede the readability of the paper.


