Authors

Samira Zare, Hien Van Nguyen

Abstract

While deep networks have demonstrated state-of-the-art performance in medical image analysis, they suffer from biases caused by undesirable confounding variables (e.g., sex, age, race). Traditional statistical methods for removing confounders are often incompatible with modern deep networks. To address this challenge, we introduce a novel learning framework, named ReConfirm, based on the invariant risk minimization (IRM) theory to eliminate the biases caused by confounding variables and make deep networks more robust. Our approach allows end-to-end model training while capturing causal features responsible for pathological findings instead of spurious correlations. We evaluate our approach on NIH chest X-ray classification tasks where sex and age are confounders.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_55

SharedIt: https://rdcu.be/cVVqb

Link to the code repository

https://github.com/samzare/ConfounderRemoval

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

In this work, the Authors propose a variant of the Invariant Risk Minimization (IRM) framework to reduce or remove the effect of confounders in an X-ray classification task with binary class labels. Specifically, the Authors create ad-hoc IRM environments, based on confounder values, to reduce their effect. The main contribution of this work is the use of the IRM framework to remove confounders; a second contribution is the application to a medical imaging task, X-ray classification. Unlike IRM, the proposed method (ReConfirm) introduces class-conditional penalties to improve the stability of features across classes and to promote feature diversity. Experiments are presented to support the claims.
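For readers unfamiliar with IRM, the per-environment objective that the reviewer refers to can be sketched as follows. This is a minimal NumPy illustration of the standard IRMv1 penalty (the squared gradient of the environment risk with respect to a dummy classifier scale w, evaluated at w = 1), assuming binary labels and a binary cross-entropy loss. It is not the Authors' ReConfirm code; the function names and the weight `lam` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_with_logits(logits, y):
    """Mean binary cross-entropy from raw logits: log(1 + e^z) - y*z."""
    return np.mean(np.logaddexp(0.0, logits) - y * logits)


def irm_penalty(logits, y):
    """IRMv1-style penalty for one environment.

    For the risk R(w) = BCE(w * logits, y), the derivative at w = 1 is
    mean((sigmoid(logits) - y) * logits); the penalty is its square.
    """
    grad_w = np.mean((sigmoid(logits) - y) * logits)
    return grad_w ** 2


def irm_objective(envs, lam=1.0):
    """Sum over environments of risk + lam * penalty.

    envs: list of (logits, labels) pairs, one pair per environment
    (assumption: environments are built from confounder values).
    """
    return sum(bce_with_logits(z, y) + lam * irm_penalty(z, y) for z, y in envs)


# Toy usage: two environments with made-up logits and labels.
rng = np.random.default_rng(0)
env_a = (rng.normal(size=8), rng.integers(0, 2, size=8).astype(float))
env_b = (rng.normal(size=8), rng.integers(0, 2, size=8).astype(float))
total = irm_objective([env_a, env_b], lam=10.0)
```

Setting `lam=0.0` reduces the objective to plain empirical risk minimization summed over environments, which is the baseline the reviews compare against.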
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper is very clearly written and tackles an interesting problem: removal of confounders in nonlinear models. The use of the IRM framework is very welcome in this community and the proposed variants have sound explanations and descriptions. The two experiments - sex as confounder, age as confounder - are compelling.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The numerical improvements of the proposed ReConfirm are sometimes marginal with respect to the traditional empirical risk minimization principle, so further testing could strengthen the experimental results, especially in view of a future submission to a journal.
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

In the manuscript, the Authors conduct experiments on a publicly available dataset and state that they will publish their code at a later stage. The procedures explained in the manuscript look sufficiently detailed to attempt a reproduction of the results presented in the article. Unfortunately, the reproducibility statement given by the Authors looks vastly incomplete.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

The manuscript is pretty good. I’d suggest that the Authors use more intuition and examples to explain their concepts and give the formal description only after that. Especially for a future extension to a journal article, I invite the Authors to also present the topic of removing confounders from a more historical perspective - the one related to linear models - to guide the audience through this very interesting topic.

Minor: it is not clear why the proposed method is specifically called “ReConfirm”.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

7
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The manuscript is very clear and the proposed solution is appealing. The experiments are interesting and the contributions are important. The manuscript has only very minor issues. The main drawback - which is understandable for a conference article - is the experiment and results section that could be more extensive. But again, it is a minor issue.
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

1
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

The paper presents a modified invariant risk minimization (IRM) framework, namely ReConfirm, to remove the effect of confounders. The proposed method was applied to NIH chest X-ray classification tasks where sex and age are confounders. In the experiments, the method outperforms baseline CNN models trained under the traditional empirical risk minimization framework.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This work proposes a modified IRM framework, namely ReConfirm, to accommodate class-conditional variants for NIH chest X-ray classification tasks, where the invariance learning penalty is conditioned on each class. It also designs a strategy for optimally splitting the dataset into different environments based on the maximum violation of the invariant learning principle.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
The comparative studies seem limited.
1. More datasets and backbones like transformer can be added to verify the generalization ability of ReConfirm.
2. More existing methods could be included for comparisons.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Good. The authors will open-source the code for research purposes.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
The paper presents a modified invariant risk minimization (IRM) framework to remove the effect of confounders. The proposed method was applied to NIH chest X-ray classification tasks where sex and age are confounders. In the experiments, the method outperforms baseline CNN models trained under the traditional empirical risk minimization framework. This work proposes a modified IRM framework to accommodate class-conditional variants for NIH chest X-ray classification tasks, where the invariance learning penalty is conditioned on each class. It also designs a strategy for optimally splitting the dataset into different environments based on the maximum violation of the invariant learning principle. However, the comparative studies seem very limited. I would like to suggest:
1. More datasets and backbones like transformer can be added to verify the generalization ability of ReConfirm.
2. More existing methods could be included for comparisons.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Removing the effect of confounder variables is an interesting research area.
Number of papers in your stack

4
What is the ranking of this paper in your review stack?

2
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper
1. A learning strategy based on the invariant risk minimization framework [4] is proposed for medical image classification, such that the classification can be done without reliance on confounding variables such as age or sex.
2. The main idea from [7] is used to define training environments, based on agreement between the known confounding variable and the class label.
3. The original loss function of [4] is extended to include class-conditional penalties. This potentially allows the model to learn different environment-invariant representations for each class.
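Points 2 and 3 above can be illustrated with a small sketch. It assumes a binary confounder coded 0/1 and reads "agreement between the known confounding variable and the class label" as a simple equality test, which is only one possible interpretation of the reviewer's summary and not necessarily the Authors' procedure. The class-conditional penalty below is the IRMv1-style squared gradient restricted to samples of a single class. All function names are hypothetical.

```python
import numpy as np


def split_by_agreement(confounder, label):
    """Return boolean masks for two environments.

    Environment 0: confounder value equals the class label.
    Environment 1: confounder value differs from the class label.
    (Assumption: both arrays are coded 0/1.)
    """
    agree = confounder == label
    return agree, ~agree


def class_conditional_penalty(logits, label, cls):
    """IRMv1-style squared gradient computed only on samples with label == cls."""
    mask = label == cls
    if not mask.any():
        return 0.0  # no samples of this class in the environment
    z = logits[mask]
    p = 1.0 / (1.0 + np.exp(-z))
    # d/dw of BCE(w * z, cls) at w = 1, averaged over the class subset
    grad_w = np.mean((p - cls) * z)
    return float(grad_w ** 2)


# Toy usage with made-up values: penalize only the control class (y = 0)
# inside each environment, mirroring the "y=0" variant the reviews mention.
sex = np.array([0, 1, 1, 0, 1, 0])
y = np.array([0, 1, 0, 1, 1, 0])
logits = np.array([-0.8, 1.5, 0.2, -0.4, 2.1, -1.3])
env_agree, env_disagree = split_by_agreement(sex, y)
penalties = [
    class_conditional_penalty(logits[m], y[m], cls=0)
    for m in (env_agree, env_disagree)
]
```

Passing `cls=1` instead gives the penalty on the positive class; summing both would condition the penalty on each class in turn.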
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. An important problem is considered, and a viable solution is presented for the same. I have not seen application of the invariant risk minimization framework to medical image analysis before. For situations where the exact confounding variables are known, the proposed method seems to be promising.
2. Writing is fairly clear.
3. Experiments with two confounding variables (age and sex) show that the proposed method improves performance over empirical risk minimization.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The evaluation of the method is weak, in my opinion. In particular, comparison with adversarial learning based invariant representation learning would have been useful. I suspect that with the used environment definition strategy, such invariant representation learning methods would also work quite well.
2. The effect of the class conditional penalties is unclear. Although I understand the intuitive motivation of allowing learning of class conditional invariant representations, I do not understand why one would apply such a penalty to only one of the classes in question. Further, I could not find a satisfactory explanation as to why the setting with the penalty on only the control class (cReConfirm y=0) leads to the best performance most of the time.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors have agreed to make code publicly available after the review period. The data used in the experiments is from publicly available datasets.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
1. A big limitation of the paper, in my opinion, is the lack of comparison to invariant-feature learning methods from the domain adaptation literature (e.g. [22]). This is especially the case as the experiments done in the paper are in cases where the confounding variables are known in advance.
2. A discussion on the limitations of the invariant risk minimization framework (e.g. Rosenfeld et al. The Risks of Invariant Risk Minimization, ICLR 2021) would have been useful. In particular, it is unclear why the proposed method should work if all confounding variables are not taken into consideration during the training.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper’s treatment of confounding variables in the medical image analysis literature is timely, and the usage of the invariant risk minimization framework is interesting. Although I have concerns regarding the class-conditional penalty’s applicability and the comparison with respect to invariant feature representation learning methods, I will argue that the paper is strong enough to be presented at the conference.
Number of papers in your stack

6
What is the ranking of this paper in your review stack?

3
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
- The paper introduces invariant risk minimization to correct for the effect of confounders.
- All reviewers agreed that the paper is well written and that it addresses an important topic.
- Some references are missing and some additional experiments would strengthen the paper. Comparison with adversarial-learning-based invariant representation learning would have been useful.
- Also, all reviewers emphasized that the source code should be provided for reproducibility.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

1

Author Feedback

N/A

Removal of Confounders via Invariant Risk Minimization for Medical Diagnosis