
Authors

Shantanu Ghosh, Ke Yu, Kayhan Batmanghelich

Abstract

Building generalizable AI models is one of the primary challenges in the healthcare domain. While radiologists rely on generalizable descriptive rules of abnormality, Neural Network (NN) models suffer even with a slight shift in input distribution (e.g., scanner type). Fine-tuning a model to transfer knowledge from one domain to another requires a significant amount of labeled data in the target domain. In this paper, we develop an interpretable model that can be efficiently fine-tuned to an unseen target domain with minimal computational cost. We assume the interpretable component of NN to be approximately domain-invariant. However, interpretable models typically underperform compared to their Blackbox (BB) variants. We start with a BB in the source domain and distill it into a mixture of shallow interpretable models using human-understandable concepts. As each interpretable model covers a subset of data, a mixture of interpretable models achieves performance comparable to the BB. Further, we use the pseudo-labeling technique from semi-supervised learning (SSL) to learn the concept classifier in the target domain, followed by fine-tuning the interpretable models in the target domain. We evaluate our model using a real-life large-scale chest-X-ray (CXR) classification dataset. The code can be found at: https://github.com/annonymous-vision/miccai.
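
As a rough illustration of the distillation loop the abstract describes, the sketch below uses logistic models as stand-ins for the shallow interpretable experts and a confidence threshold as a stand-in for the routing/selector step; all names and modeling choices here are assumptions, not the authors' implementation.

```python
# Minimal sketch: distill a Blackbox into a mixture of shallow interpretable
# models over concepts. Logistic experts and the confidence-based router are
# illustrative stand-ins only.
import numpy as np
from sklearn.linear_model import LogisticRegression

def distill_to_mixture(concepts, bb_probs, n_experts=3, route_conf=0.7):
    """concepts: (N, C) concept matrix; bb_probs: (N,) Blackbox positive-class
    probabilities used as the teacher signal."""
    remaining = np.arange(len(concepts))
    experts, covered_idx = [], []
    for _ in range(n_experts):
        X = concepts[remaining]
        y = (bb_probs[remaining] > 0.5).astype(int)   # distill BB decisions
        expert = LogisticRegression(max_iter=1000).fit(X, y)
        conf = expert.predict_proba(X).max(axis=1)
        routed = conf >= route_conf                    # selector: keep confident samples
        experts.append(expert)
        covered_idx.append(remaining[routed])
        remaining = remaining[~routed]                 # residual for the next iteration
        if remaining.size == 0:
            break
    # Samples never routed stay with the (fine-tuned) residual Blackbox.
    return experts, covered_idx, remaining
```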

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43895-0_59

SharedIt: https://rdcu.be/dnwzr

Link to the code repository

https://github.com/batmanlab/MICCAI-2023-Route-interpret-repeat-CXRs

Link to the dataset(s)

https://github.com/batmanlab/MICCAI-2023-Route-interpret-repeat-CXRs#downloading-data


Reviews

Review #2

  • Please describe the contribution of the paper

    In this paper, the authors propose a novel iterative interpretable method that identifies instance-specific concepts without losing the performance of the BB and is effectively fine-tuned in an unseen target domain with no concept annotation, limited labeled data, and minimal computational cost. Overall, the idea is novel and well presented. Extensive experimental results are provided to substantiate the effectiveness of the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The idea is novel: the authors learn multiple “experts” using human-understandable concepts. Each expert predicts well on its subset of the data, and overall the mixture provides performance comparable to the BB without requiring much labeling, which is very helpful for practical use cases.
    2. The proposed method is well presented and the notations are clear.
    3. Extensive experimental results are provided.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Although the idea is novel and the motivation is good, some implementation details are missing. For example, how are the specific hyperparameters selected? Do users need to analyze their dataset to better fix the number of experts? What happens if an image whose concepts were not learned by the “experts” goes through the pipeline? Will it produce wrong diagnoses?
    2. It is a little confusing to express the residual as f - g, since the outputs are class labels, not class types. The authors may propose a better notation for this concept.
    3. Fig. 2 is hard to read; please make it clearer and easier to understand with better symbols.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provide the implementation code and the method is clearly described. The reproducibility of the paper is high.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. Please add more explanation of how the number of experts is fixed.
    2. Please fix Fig. 2 as suggested.
    3. Please pay more attention to the notation and try not to introduce confusion.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a novel idea that is beneficial for practical use, and the method is clearly presented. Extensive results are provided to substantiate the effectiveness of the proposed method.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors develop an interpretable model that can be efficiently fine-tuned to an unseen target domain with minimal computational cost. It is assumed that the interpretable component of the neural network is approximately domain-invariant.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    They start with a BB in the source domain and distill it into a mixture of shallow interpretable models using human-understandable concepts. As each interpretable model covers a subset of data, the mixture of interpretable models achieves performance comparable to the BB. Additionally, the pseudo-labeling technique from semi-supervised learning (SSL) is used to learn the concept classifier in the target domain, followed by fine-tuning the interpretable models in the target domain.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. More visualizations of domain transfer results would be beneficial in improving the quality of the manuscript.
    2. It is also recommended to include more baseline experiments in comparison experiments.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Experiments with supplementary material verified the contributions mentioned in the manuscript.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. It would be beneficial to have more visualizations of domain transfer results in order to enhance the quality of the manuscript.
    2. It is suggested to include more baseline experiments in comparison experiments for better evaluation.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This manuscript is well organized, and the experimental results contribute to the acceptance of the manuscript’s conclusions. The work is attractive and easy to follow.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    This work tries to distill a black-box model into an interpretable mixture-of-experts system. The proposed method falls into the explainable AI field.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed method extracts interpretable factors from the source domain using human-understandable concepts. Then, a pseudo-labeling algorithm is employed to learn the concept classifier for the target domain. Experimental results show its effectiveness.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors should further clarify how the proposed method enhances the efficiency of knowledge transfer. To me, this work does not appear to be an end-to-end training algorithm, so good performance may require substantial fine-tuning steps.

    They used an existing pseudo-labeling algorithm, which may directly constrain the performance of the transfer learning algorithm. If this algorithm is replaced with another one, how does the performance change? The reviewer would like to see this performance in the rebuttal.

    Moreover, some researchers have challenged the value of pseudo-labeling. Are there any other possible solutions to this task? Why and how?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors released their code; it may be reproducible, although this has not been fully confirmed.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    See weaknesses

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper is easy to read and the proposed method is good enough to be published.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes an interpretable model that can be efficiently fine-tuned to an unseen target domain with minimal computational cost. The experiments are sufficient. The paper is well written and easy to follow. However, there are some important comments from the reviews that need to be addressed; incorporating them will further enhance the quality of the paper.




Author Feedback

[R1] On the enhancement of the efficiency of knowledge transfer by the proposed method.

  1. Performance improvement: Unlike the Blackbox (BB), the interpretable models aim to find domain-invariant interpretable anatomical concepts for prediction. This is closer to how radiologists read abnormality from an image: they search for patterns of anatomical change and apply domain-invariant logical rules for specific diagnoses.
  2. Computational efficiency: The experts are computationally cheap. For the BB, all the weights of every layer (including the convolution $\Phi$ and the classifier $h$) are updated during transfer learning. In contrast, the experts are simple Entropy neural networks (Barbiero et al.) that accept low-dimensional concepts rather than images. Also, for the residual, we only fine-tune the classifier ($h$), keeping the conv layers ($\Phi$) fixed (see the sketch below).
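
A rough PyTorch illustration of this point (module names are hypothetical stand-ins): the convolutional backbone Φ is frozen during transfer, and only the classifier h, plus the lightweight concept-based experts (not shown), receives gradient updates.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins: `phi` plays the role of the conv backbone Phi,
# `h` the role of the final classifier.
phi = nn.Sequential(nn.Conv2d(1, 8, 3), nn.AdaptiveAvgPool2d(1), nn.Flatten())
h = nn.Linear(8, 2)

for p in phi.parameters():          # Phi stays fixed in the target domain
    p.requires_grad = False

# Only h is fine-tuned, so the update is cheap compared to the full BB.
optimizer = torch.optim.Adam(h.parameters(), lr=1e-4)
```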

[R1] On the challenges of the pseudo-labeling algorithm.

We agree with the limitations of pseudo-labeling. We pseudo-label to obtain the concepts for correctly classified samples in the target domain using the source BB. As the BB classifies the same disease in both domains, our intuition is that its representation also captures the domain-invariant anatomical concepts that the experts extract later. Our focus was to use this notion of invariant concepts for efficient transfer learning, so we did not investigate different pseudo-labeling techniques in depth, as that was not the paper’s central goal. In the next answer, we compare against other modern pseudo-labeling methods. Beyond this, active semi-supervised learning or co-training can be explored in future work.
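
A minimal sketch of this pseudo-labeling step, with all module and function names assumed for illustration: run the frozen source BB on labeled target samples, keep only the correctly classified ones, and use their predicted concepts as pseudo-labels for the target-domain concept classifier.

```python
import torch

@torch.no_grad()
def pseudo_label_concepts(phi, h, concept_head, images, labels):
    """phi/h: frozen source Blackbox backbone and classifier;
    concept_head: maps backbone features to concept logits.
    Returns pseudo concept labels for correctly classified target samples."""
    feats = phi(images)                            # assume pooled feature vectors
    correct = h(feats).argmax(dim=1) == labels     # trust only correct predictions
    concept_probs = torch.sigmoid(concept_head(feats[correct]))
    return images[correct], (concept_probs > 0.5).float()  # hard pseudo-labels, as in [1]
```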

[R1] On comparing with other pseudo-labeling algorithms.

Labeled data   MoIE [1]   MoIE+R [1]   MoIE [2]   MoIE+R [2]   MoIE [3]   MoIE+R [3]
5%             0.87       0.82         0.88       0.84         0.88       0.84
10%            0.90       0.91         0.90       0.91         0.91       0.93
15%            0.91       0.89         0.91       0.89         0.91       0.90

[1] Lee, D.-H.: Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. ICML Workshop, 2013.

[2] Sohn, K., et al.: FixMatch: Simplifying semi-supervised learning with consistency and confidence. NeurIPS, 2020.

[3] Zhang, B., et al.: FlexMatch: Boosting semi-supervised learning with curriculum pseudo labeling. NeurIPS, 2021.

We implemented [2, 3] for the rebuttal in a very short amount of time. Despite its simplicity, [1] (used in the paper) performs comparably to the other two for edema.

[R2] On the selection of the number of experts.

Overall, we stop adding experts once they cumulatively cover 90% of the data, or once the final residual has an AUROC < 0.7.
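
Expressed as code, this stopping rule might look like the following (the thresholds come from the answer above; the function name is ours):

```python
def stop_adding_experts(cumulative_coverage, residual_auroc):
    # Stop once the experts cover >= 90% of the data, or once the
    # final residual's AUROC drops below 0.7.
    return cumulative_coverage >= 0.90 or residual_auroc < 0.70
```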

[R2] On the scenario where an image whose concepts were not learned by the “experts” goes through the pipeline.

Our method assumes the concepts are given in the dataset, either explicitly or retrieved from the radiology reports. Also, the experts do not take the image as input; they take the concepts as input.

[R2] On the confusion in expressing the residual as f - g.

Sorry for the confusion. Here, r = f - g denotes the difference between the logits of the previous BB and the current expert. We will clarify this in the paper.
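
As a toy numerical illustration (the values are made up): if the previous BB emits logits f = [2.1, -0.3] and the current expert emits g = [1.5, 0.2], the residual target is r = f - g = [0.6, -0.5].

```python
import torch

f_logits = torch.tensor([2.1, -0.3])   # previous Blackbox logits
g_logits = torch.tensor([1.5,  0.2])   # current expert logits
r = f_logits - g_logits                # residual logits: tensor([0.6000, -0.5000])
```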

[R2] On clarifying Fig. 2.

We updated the figure with a better resolution.

[R3] On including more baseline experiments in the comparisons.

               [1] w/ sup   [1] w/o sup   [2] w/o AR   [1] w/ AR
Effusion       0.75         0.70          0.71         0.73
Pneumonia      0.63         0.62          0.60         0.62
Cardiomegaly   0.76         0.77          -            -

[1] Sarkar, A., et al.: A Framework for Learning Ante-hoc Explainable Models via Concepts. CVPR, 2022.

[2] Havasi, M., et al.: Addressing Leakage in Concept Bottleneck Models. NeurIPS, 2022.

Due to the shortage of time, we compare further only on Effusion, Pneumonia, and Cardiomegaly with the interpretable baselines [1, 2]. For post-hoc baselines, we have only PCBMs, which


