
Authors

Christoph Fürböck, Matthias Perkonigg, Thomas Helbich, Katja Pinker, Valeria Romeo, Georg Langs

Abstract

Molecular breast cancer sub-types derived from core-biopsy are central for individual outcome prediction and treatment decisions. Determining sub-types by non-invasive imaging procedures would benefit early assessment. Furthermore, identifying phenotypic traits of sub-types may inform our understanding of disease processes as we become able to monitor them longitudinally. We propose a model to learn phenotypic appearance concepts of four molecular sub-types of breast cancer. A deep neural network classification model predicts sub-types from multi-modal, multi-parametric imaging data. Intermediate representations of the visual information are clustered, and clusters are scored based on testing with concept activation vectors to assess their contribution to correctly discriminating sub-types. The proposed model can predict sub-types with competitive accuracy from simultaneous ${}^{18}$F-FDG PET/MRI, and identifies visual traits in the form of shared and discriminative phenotypic concepts associated with the sub-types.
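
The following minimal sketch (not the authors' code) illustrates the kind of pipeline the abstract describes, under stated assumptions: a ResNet-18 backbone with a four-class head, intermediate activations collected via a forward hook and average-pooled to vectors, and candidate phenotypic concepts formed by k-means clustering of those vectors. The chosen layer, pooling, cluster count, and placeholder inputs are illustrative assumptions; the TCAV-based scoring of clusters is only indicated in a comment.

    import torch
    import torch.nn as nn
    from sklearn.cluster import KMeans
    from torchvision.models import resnet18

    # Pre-trained backbone with a 4-class head (one output per molecular sub-type).
    model = resnet18(weights="IMAGENET1K_V1")
    model.fc = nn.Linear(model.fc.in_features, 4)
    model.eval()

    pooled_features = []

    def collect(_module, _inputs, output):
        # Global-average-pool the feature maps to one vector per image.
        pooled_features.append(output.mean(dim=(2, 3)).detach())

    # Hook an intermediate layer (the layer choice is an assumption for illustration).
    handle = model.layer3.register_forward_hook(collect)

    # Placeholder batches standing in for preprocessed PET/MRI-derived image inputs.
    with torch.no_grad():
        for batch in torch.randn(8, 4, 3, 224, 224):
            model(batch)
    handle.remove()

    features = torch.cat(pooled_features).numpy()

    # Each k-means cluster is a candidate phenotypic concept; a TCAV-style test would
    # then score how strongly each cluster's direction influences each sub-type logit.
    concepts = KMeans(n_clusters=10, n_init=10, random_state=0).fit(features)
    print(concepts.labels_)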

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16449-1_27

SharedIt: https://rdcu.be/cVRU7

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a method to predict the molecular sub-types of breast cancer, and to identify the associated shared and discriminative visual traits from simultaneously acquired MRI and PET data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors apply the TCAV method to a multi-modal training image dataset. This application is an important contribution towards the interpretable identification of breast cancer sub-types. The classification accuracy is competitive with recent state-of-the-art approaches while using a smaller dataset, and the method takes advantage of the multi-parametric, multi-modal data.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The related-work section makes limited reference to other literature on latent-space clustering and on TCAV and DTCAV implementations in medical imaging (e.g., Gamble et al.; Clough et al., 2019; Janik et al., 2021). It is not clear how the proposed method differs from this existing work.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Yes, the methods are reproducible. The authors reference the ResNet-18 classification model, which they train with a modified classification head for the 4-class task. Details of the private dataset are provided, although there is no reference to ethical approval.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The authors apply the TCAV method to a multi-modal training image dataset. This application is an important contribution towards the interpretable identification of breast cancer sub-types. Figure 3 is a particularly good illustration; however, the font size on the radial diagrams is too small.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The key contribution is the extraction of phenotypic concepts shared across multiple examples. This enables the investigation of associations between top-scoring concepts and shared or discriminating features among molecular sub-types. Therefore, the proposed method can identify clinically relevant features in imaging data.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper performs molecular sub-type classification for breast cancer patients from imaging data. The proposed pipeline performs classification using a ResNet-18 pre-trained on ImageNet; latent features are then extracted from intermediate layers and clustered using k-means. Phenotypic concepts are formed from these clusters, and a weighting of these concepts is calculated for each sub-type category.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strength of this paper lies in the pioneering application of concept calculation to medical images of breast cancer patients, which is one step closer to finding the correlation between phenotypic traits and microscopic categories.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The weaknesses of this paper are really minor, and some typos can be corrected during proofreading.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The data from the 102 patients has not been made public, nor has the code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    This paper performs molecular sub-type classification for breast cancer patients from imaging data. The proposed pipeline performs classification using a ResNet-18 pre-trained on ImageNet; latent features are then extracted from intermediate layers and clustered using k-means. Phenotypic concepts are formed from these clusters, and a weighting of these concepts is calculated for each sub-type category.

    The strength of this paper lies in the pioneering application of concept calculation to medical images of breast cancer patients, which is one step closer to finding the correlation between phenotypic traits and microscopic categories.

    The weaknesses of this paper are really minor, and some typos can be corrected during proofreading.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The major factor behind my decision to accept is the effort to establish a connection between the phenotypic and the molecular, both visually and heuristically, which might be shown to agree with human experts in future work.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    This paper proposes an application of testing with Concept Activation Vectors (CAVs), following Ghorbani et al., to breast cancer imaging, showing that the proposed method provides interesting insights into the visual features characterising breast cancer sub-types. The main contribution added to the state-of-the-art method is the identification of patient-specific concepts in the latent space, rather than concepts that generalise to the entire pool of data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper poses a relevant question in the field: can we identify phenotypes with non-invasive imaging, and which features in the image may be associated with each phenotype? This is a relevant and timely question. The paper is well written and clearly presented, and the results offer interesting insights. The formulation of patient-specific concepts is rather interesting, as it allows each patient to be described by a set of measurable imaging features and their respective values.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There are a few limitations:

    • I fear the novelty of the paper is only incremental, with little innovation proposed to advance the work done by Ghorbani et al.
    • I think the related work is not sufficiently up to date. The authors missed important works that are relevant to their analyses, and mentioning these works would have made the paper more complete.
    • More attention and insight should have been given to the concept formulation, particularly the contribution proposed in Eq. 1. This is the most important contribution of this work, and I think it was not sufficiently discussed. How do the concept footprints compare across patients? Are there some concepts that are shared among patients?
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Very good.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Below, I comment on the paper in more detail:

    • I think important references to previous works are missing to justify the contributions and methods. For example, the paper [a] (details below) is an extremely relevant reference that should not be missing from concept-based interpretability works. In that paper, the authors explain the limitations of learning concepts directly in a model’s latent space without applying any transformation to make the space centered and orthogonal with respect to concepts. They demonstrate that an important limitation of CAVs is that two very different concepts may have very similar vectors (in terms of their cosine distance) simply because of how the latent space is organised (e.g. skewed and far from the center).

    [a] Chen, Z., Bei, Y. & Rudin, C. Concept whitening for interpretable image recognition. Nat Mach Intell 2, 772–782 (2020). https://doi.org/10.1038/s42256-020-00265-z

    Similarly, a discussion of related works on concept attribution would have made the related-work section richer and more interesting. The work in [b], for example, shows how the binary concepts in CAV can be quantified and transformed into continuous-valued concepts. Knowing this work may have further helped the authors to improve the current state-of-the-art methods and produce more insights about their work. The bidirectional scores proposed in [b] can indicate whether a given concept is responsible for an increase or a decrease in the probability assigned to each class; hence, the authors would have been able to clarify not only which concepts contribute to the identification of phenotypes, but also in what way.

    [b] Graziani M, Andrearczyk V, Marchand-Maillet S, Müller H. Concept attribution: Explaining CNN decisions to physicians. Comput Biol Med. 2020 Aug;123:103865. doi: 10.1016/j.compbiomed.2020.103865. Epub 2020 Jun 17. PMID: 32658785.

    In addition to [a] and [b], the work in [c] proposed alternative scores to the TCAV score to overcome some of its limitations.

    [c] Yeche, Hugo, Justin Harrison, and Tess Berthier. “UBS: A dimension-agnostic metric for concept vector interpretability applied to radiomics.” Interpretability of Machine Intelligence in Medical Image Computing and Multimodal Learning for Clinical Decision Support. Springer, Cham, 2019. 12-20.

    • As a cascade effect, I think that by not including the reference in [a] the authors missed addressing an important limitation of CAVs, namely that distant concepts may be pointed at by similar vectors. The authors could have addressed this limitation by adding, for example, an analysis of the cosine similarity between the vectors for each concept (a minimal sketch of such a check is given after this list).
    • I think that a more in-depth discussion of the concept footprints would have been useful. What do the vectors look like for patients with similar footprints? How far apart or close together are they in the latent space? Can it be that similar patients share similar footprints? Is it possible to identify concepts that generalise to the entire set of patients?
    • The quality of the figures should be improved; the text is too small to read.
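
    As a purely illustrative aid for the cosine-similarity analysis suggested above (not code from the paper; the concept activation vectors here are random stand-ins), a minimal sketch could look as follows:

        import numpy as np

        def cav_cosine_similarity(cavs: np.ndarray) -> np.ndarray:
            """cavs: (n_concepts, latent_dim) array of concept activation vectors."""
            unit = cavs / np.linalg.norm(cavs, axis=1, keepdims=True)
            # Entry (i, j) is the cosine similarity between concept vectors i and j;
            # values close to 1 for nominally distinct concepts flag the limitation
            # discussed in [a].
            return unit @ unit.T

        # Three hypothetical concept vectors in a 256-dimensional latent space.
        rng = np.random.default_rng(0)
        print(cav_cosine_similarity(rng.normal(size=(3, 256))))
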
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I thank the authors for this work, as it is interesting and well presented. What prevents me from accepting is the limited novelty, which seems mostly incremental.

  • Number of papers in your stack

    7

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Major strengths include the development of a network classification model to predict sub-types in breast cancer from PET/MRI data, as well as the use of concept activation vectors to assess interpretability. The overall approach and its differentiation are reasonably well motivated, the experiments are well described, and CAVs are used to identify phenotypes shared between sub-types. Minor weaknesses are the incremental differentiation from the work by Ghorbani et al. and the cross-validated evaluation of the model.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2




Author Feedback

N/A


