
Authors

Linde S. Hesse, Ana I. L. Namburete

Abstract

Convolutional neural networks (CNNs) have shown exceptional performance for a range of medical imaging tasks. However, conventional CNNs are not able to explain their reasoning process, therefore limiting their adoption in clinical practice. In this work, we propose an inherently interpretable CNN for regression using similarity-based comparisons (INSightR-Net) and demonstrate our methods on the task of diabetic retinopathy grading. A prototype layer incorporated into the architecture enables visualization of the areas in the image that are most similar to learned prototypes. The final prediction is then intuitively modeled as a mean of prototype labels, weighted by the similarities. We achieved competitive prediction performance with our INSightR-Net compared to a ResNet baseline, showing that it is not necessary to compromise performance for interpretability. Furthermore, we quantified the quality of our explanations using sparsity and diversity, two concepts considered important for a good explanation, and demonstrated the effect of several parameters on the latent space embeddings.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16437-8_48

SharedIt: https://rdcu.be/cVRuz

Link to the code repository

https://github.com/lindehesse/INSightR-Net

Link to the dataset(s)

https://www.kaggle.com/c/diabetic-retinopathy-detection/data


Reviews

Review #1

  • Please describe the contribution of the paper
    • In this work, the authors propose an inherently interpretable CNN for regression using similarity-based comparisons (INSightR-Net).
    • Experiments were performed using a dataset of diabetic retinopathy grading.
    • The proposed network is able to achieve performances in line with the baseline while being inherently interpretable.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is clear and well-written.
    • The work is novel. The novelty of the work mainly arises from the extension of ProtoPNet (Chen et al. [https://arxiv.org/pdf/1806.10574.pdf]) to regression tasks.
    • New similarity function and additional loss components led to better explanations (measured in terms of sparsity and diversity).
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The choice of the clinical problem and dataset is dubious, as it is originally an ordinal classification problem and not a regression problem. As the authors did not consider the problem as an ordinal one, they also do not use any ordinal metric.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The work is reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • The authors could provide an additional dataset, where regression is indeed the task to solve. Additionally, it could make sense to adapt their approach to ordinal classification. Both suggestions would make sense for an extended journal version of the work.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • The work presented has considerable novelty, and is worth discussing at MICCAI, as it addresses one of the major challenges for the adoption of computer-aided diagnosis in the clinics (i.e., interpretability).
  • Number of papers in your stack

    3

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    This work proposes an inherently interpretable CNN for regression tasks in medical imaging that relies on similarity-based comparisons. Specifically, the authors incorporate a prototype layer into the model architecture to visualize the areas in the image that are most similar to learned prototypes. The final prediction is then modeled as a mean of prototype labels. Extensive experiments on the task of diabetic retinopathy grading demonstrate the effectiveness of the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Interpreting a CNN for regression through similarity-based comparisons is interesting and intuitive.
    • The studied problem is clear and well formalized.
    • The presentation throughout the paper is clear and even reader-friendly.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Inadequate comparative experiments for the main claim.
    • Detailed ablation experiments were done in this work; however, the authors only compared the method with ResNet-based baselines and did not compare it with other interpretable approaches.
    • Unreasonable choice of dataset, which makes the main claim less convincing.
    • Considering that the method proposed by the authors is for the regression problem, experiments on the dataset of the regression task will make the method more convincing compared to transforming the discrete labels into a continuous distribution.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No code available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • Compare the proposed method with appropriate baselines.
    • Choose a reasonable data set that can demonstrate the validity of the method.
    • The authors should also clarify how the hyperparameters were chosen.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The topic and idea are interesting. However, the choice of baselines and datasets needs more consideration.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors propose INSightR-Net, a deep neural network for retinopathy grading. It relies on a CNN architecture with a prototype layer. These prototypes provide better explanations while achieving good accuracy.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors propose a nice validation scheme with out of sample evaluation and prototype analysis that corroborates retinopathy gradation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The contribution does not bring much novelty with respect to the original paper (Chen 2019), apart from the medical application. Also, the contribution lacks a comparison with baseline methods for regression.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Implementation details are given; the code will be available on GitHub once anonymity is lifted.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    It would be interesting to include ‘simple’ methods such as linear regression in order to better highlight the benefits of the proposed approach.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Prototypes are an interesting concept that can be extended to other medical applications, although a deeper investigation could be conducted to better highlight the benefits of such a method.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
    • The paper proposes an approach to create an interpretable model based on the idea of ProtoPNet
    • The paper is well written
    • The choice of the clinical problem and dataset is not convincing for the reviewers.
    • You must address this issue, it is raised by many: “experiments on the dataset of the regression task will make the method more convincing compared to transforming the discrete labels into a continuous distribution.”
    • You must also clearly outline your contribution compared to (Chen 2019)
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    10




Author Feedback

We would like to thank the reviewers for their constructive comments. We have grouped our responses by topic below.

  1. Dataset

Dataset choice [R1, R2, MR]: We aimed to make this study easily reproducible and therefore wanted to use a publicly available dataset. For this reason, we chose to work with the diabetic retinopathy dataset as opposed to using a dataset with a true regression task (e.g. brain age prediction). We hope that by using a public dataset this study can provide a benchmark that can be used when developing this method for further applications.

Label transformation [R1, R2, MR]: We transformed our labels to a continuous distribution in our paper to show that our method works for real-valued labels. However, our method also works with the original (discrete) labels of the dataset, showing very similar results (0.59, 0.52, 14.6, 35.8 (discrete) vs 0.59, 0.53, 14.0, 33.4 (continuous) for the MAE, accuracy, sparsity, and diversity respectively). As both reviewers raised this issue, we agree that it will make the paper stronger to present the results on the original dataset. We can therefore substitute this in our paper in the camera-ready manuscript, and provide the results on the continuous labels in the supplementary material.
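For readers who want to see how a regression output can be scored with both an MAE and an accuracy, the following is a minimal sketch and not the authors' evaluation code. It assumes that accuracy is obtained by rounding each continuous prediction to the nearest integer grade and clipping it to the range 0 to 4; the function names and the example values are hypothetical.

```python
import math
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between labels and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rounded_accuracy(
    y_true: Sequence[int],
    y_pred: Sequence[float],
    min_grade: int = 0,  # assumed lowest grade
    max_grade: int = 4,  # assumed highest grade
) -> float:
    """Fraction of predictions that match the label after rounding.

    Assumption: predictions are rounded half up to the nearest grade
    and clipped to [min_grade, max_grade].
    """
    hits = 0
    for t, p in zip(y_true, y_pred):
        grade = math.floor(p + 0.5)
        grade = min(max_grade, max(min_grade, grade))
        hits += int(grade == t)
    return hits / len(y_true)


# Hypothetical labels and continuous predictions, for illustration only.
labels = [0, 2, 4, 1]
preds = [0.25, 2.5, 3.25, 1.0]
mae = mean_absolute_error(labels, preds)
acc = rounded_accuracy(labels, preds)
```

Because the rounding step is applied only at evaluation time, the same two functions can in principle be used whether the model was trained on the discrete or on the continuous labels.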

  2. Related work

Comparison to (Chen 2019) [R3, MR]: The original ProtoPNet method (Chen 2019) was only developed for a classification problem and was therefore not suitable to apply to a regression task as is. Our main contribution lies in the adaptation of this approach to a regression task by modeling the prediction as a weighted mean of prototype labels. Our approach provides a single explanation for the predictions as opposed to an explanation for each of the classes present in the dataset (Chen 2019). Furthermore, we also adapted the similarity function and the loss components to make the prediction sparser and more diverse. We will add a sentence to the camera-ready manuscript highlighting these differences.
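The idea of modeling the prediction as a weighted mean of prototype labels can be illustrated with a short sketch. This is not the authors' implementation: it assumes non-negative similarity scores that are already computed, one per prototype, and the function name and example numbers are hypothetical.

```python
from typing import Sequence


def predict_from_prototypes(
    similarities: Sequence[float],
    prototype_labels: Sequence[float],
) -> float:
    """Return the similarity-weighted mean of the prototype labels.

    Assumption: similarities are non-negative and at least one is positive.
    """
    if len(similarities) != len(prototype_labels):
        raise ValueError("one similarity is required per prototype label")
    total = sum(similarities)
    if total <= 0:
        raise ValueError("at least one similarity must be positive")
    weighted = sum(s * y for s, y in zip(similarities, prototype_labels))
    return weighted / total


# Hypothetical similarities of one image to four prototypes, and the
# labels attached to those prototypes.
sims = [0.5, 0.25, 0.25, 0.0]
labels = [1.0, 2.0, 3.0, 4.0]
prediction = predict_from_prototypes(sims, labels)
```

In this sketch a prototype with zero similarity contributes nothing to the output, which is why a sparse set of similarities leads to an explanation that involves only a few prototypes.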

Other interpretable work [R2]: When comparing our approach to other interpretability methods there are some key differences to consider. Our method is inherently interpretable, as opposed to most methods that are limited to providing post-hoc explanations [1]. Furthermore, our method provides local attention as well as similar examples from the training set (prototypes) whereas other methods such as feature attribution or saliency methods only provide (local) attention. Lastly, many methods rely on having an activation per class (Chen 2019, sharp-LIME [2]) and can therefore not directly be applied to a regression problem without adjustments to the method. To address R2’s point, we will add a discussion of our results in the context of other related methods in the camera-ready submission. It is challenging to quantitatively compare our results directly to other methods as the metrics we proposed (diversity and sparsity) to assess explanation quality rely on the prototypes. For this reason, it was not possible to obtain the exact same metrics for different interpretability methods. However, our main claim is that we can obtain a detailed explanation for regression without compromising on prediction performance. We believe that this is adequately supported by the comparison with the ResNet as a non-interpretable baseline.

  3. Code availability [R2]

We were not able to provide the GitHub link in the submitted version to preserve anonymity. Upon acceptance of the paper, the link will be added to the camera-ready manuscript. In combination with the public dataset, we believe that our study will be very reproducible.

[1] Markus, Aniek, et al. “The role of explainability in creating trustworthy artificial intelligence for health care: a comprehensive survey of the terminology, design choices, and evaluation strategies.”

[2] Graziani, Mara, et al. “Sharpening Local Interpretable Model-Agnostic Explanations for Histopathology: Improved Understandability and Reliability.”




Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
    • The authors’ rebuttal about the availability of datasets is not that convincing. Many organs have public datasets, some with a regression task. I am not using this reason for my decision, though.
    • The authors performed regression and obtained similar results, which is good.
    • I am inclined to accept. If it becomes borderline at the meta-review stage, I am OK with my decision being overruled.
  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    na



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper introduces an interesting concept with prototypes. Although there were some concerns about the selection of the dataset and the evaluation, the authors addressed them in the rebuttal. I think ordinal classification and regression are highly similar, so this is not a critical issue that would justify rejecting the paper. I would recommend acceptance of this paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Although the authors clarify the choice of dataset, the proposed method is still not tested on a true regression task. Therefore, the effectiveness cannot be justified based on the results demonstrated in the paper. The reason for the data choice didn’t convince me.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8


