
Authors

Anthony Vento, Qingyu Zhao, Robert Paul, Kilian M. Pohl, Ehsan Adeli

Abstract

Translating machine learning algorithms into clinical applications requires addressing challenges related to interpretability, such as accounting for the effect of confounding variables (or metadata). Confounding variables affect the relationship between input training data and target outputs. When we train a model on such data, confounding variables will bias the distribution of the learned features. A recent promising solution, MetaData Normalization (MDN), estimates the linear relationship between the metadata and each feature based on a non-trainable closed-form solution. However, this estimation is confined by the sample size of a mini-batch and thereby may cause the approach to be unstable during training. In this paper, we extend the MDN method by applying a Penalty approach (referred to as PMDN). We cast the problem into a bi-level nested optimization problem. We then approximate this optimization problem using a penalty method so that the linear parameters within the MDN layer are trainable and learned on all samples. This enables PMDN to be plugged into any architecture, even those unfit to run batch-level operations, such as transformers and recurrent models. We show improvement in model accuracy and greater independence from confounders using PMDN over MDN in a synthetic experiment and a multi-label, multi-site dataset of magnetic resonance images (MRIs).
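The batch-size limitation described in the abstract can be illustrated with a small numpy sketch (the toy data and variable names below are ours, not the authors' code): the closed-form least-squares estimate of β is reliable on the full dataset but noisy when re-estimated on a small mini-batch.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, d = 1000, 2, 1                       # samples, confounders, feature dim

M = rng.normal(size=(N, K))                # metadata / confounding variables
beta_true = np.array([[2.0], [-1.0]])
f = M @ beta_true + 0.5 * rng.normal(size=(N, d))   # features biased by M

def mdn_beta(M, f):
    """Closed-form least-squares fit: beta = (M^T M)^{-1} M^T f."""
    return np.linalg.lstsq(M, f, rcond=None)[0]

beta_full = mdn_beta(M, f)                 # estimated on all samples
beta_mini = mdn_beta(M[:8], f[:8])         # estimated on one tiny mini-batch

err_full = np.abs(beta_full - beta_true).max()
err_mini = np.abs(beta_mini - beta_true).max()
# err_mini typically far exceeds err_full, which is what motivates learning
# beta across all samples instead of re-solving it per batch
```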

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16437-8_37

SharedIt: https://rdcu.be/cVRtn

Link to the code repository

https://github.com/vento99

Link to the dataset(s)

https://github.com/mlu355/MetadataNorm/blob/main/synthetic_dataset.py

https://adni.loni.usc.edu/


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose an alternative optimisation scheme to impose the orthogonality constraint proposed in the MetaData Normalization paper.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Learning representations that are free of confounding factors is an important aspect for a wide range of tasks in medicine and beyond. The evaluation is extensive and illustrates the proposed scheme is less affected by smaller batch sizes than the original approach.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Related work on learning confounding/bias-free representations is very limited. The authors only refer to adversarial learning schemes (refs. 11,18), but a large amount of literature exists beyond that. Most importantly, the authors should refer to other works that are based on orthogonalisation too: (i) Tartaglione et al. “EnD: Entangling and Disentangling Deep Representations for Bias Correction”. CVPR 2021. (ii) Neto. “Causality-aware counterfactual confounding adjustment for feature representations learned by deep models”. (iii) Liu et al. “Projection-wise Disentangling for Fair and Interpretable Representation Learning: Application to 3D Facial Shape Analysis”. MICCAI 2021.

    The smallest batch size that has been studied is 80, however for many tasks, in particular involving multiple 3D volumes, a batch size of 80 is infeasible. Studying very small batch sizes (<20) would be helpful to understand whether the proposed approach would still be effective in this setting.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducibility is fair, except for the choice of lambda, which has not been described.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

The proposed work is heavily inspired by the previous work entitled “MetaData Normalization” (MDN). The MDN approach removes bias from the learned feature representation by using metadata, which capture confounding variables, to estimate the residuals of the latent representation with respect to the metadata, such that the confounder-free representation is guaranteed to be orthogonal to the space spanned by the confounding variables (metadata). While the residuals can be obtained in closed form, such an approach becomes unstable when using batch-based learning. The authors propose to address this problem by replacing the hard constraint (orthogonality) with a soft constraint and performing alternating optimisation.
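The soft-constraint idea can be sketched in a few lines of numpy (a toy illustration under our own assumptions, not the authors' implementation): instead of solving for β in closed form per batch, β is kept as a trainable parameter and updated by SGD on the penalty term, so its estimate effectively aggregates information across all samples.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, d = 1000, 2, 1
M = rng.normal(size=(N, K))                 # metadata / confounders
beta_true = np.array([[1.5], [-0.5]])
f = M @ beta_true + 0.1 * rng.normal(size=(N, d))   # features biased by M

beta = np.zeros((K, d))                     # trainable MDN parameters
lr, batch = 0.05, 32
for _ in range(2000):                       # SGD over small mini-batches
    idx = rng.integers(0, N, size=batch)
    Mb, fb = M[idx], f[idx]
    grad = -2.0 * Mb.T @ (fb - Mb @ beta) / batch   # grad of ||f - M beta||^2
    beta -= lr * grad

residual = f - M @ beta                     # confounder-removed features
```

Even though each update sees only 32 samples, β converges toward the full-data least-squares solution, and the residual features are nearly uncorrelated with the metadata.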

    The proposed approach seems to be effective in removing confounders and to be more stable than the original work. However, there are some issues that if addressed could improve this work.

    1. As mentioned above, the discussion on related work is insufficient and has to be extended to give a comprehensive overview on learning confounding/bias-free representations.
    2. The authors write that the benefit of MDN, and therefore the proposed PMDN, over adversarial approaches is that it can remove confounding effects from multiple layers. I am not sure whether this is a valid argument. If I remove the bias only from the latent representation prior to the network’s final prediction layer, I would guarantee that the network’s prediction is confounding-free. From a theoretical perspective, I do not see any benefit from removing bias from intermediate representations. In fact, “Causality-aware counterfactual confounding adjustment for feature representations learned by deep models” by Neto follows this approach.
    3. Several assumptions implied in MDN are neither made explicit nor discussed. Three crucial assumptions are that (i) the confounding variables (metadata) must block all backdoor paths between the image/latent representation and the outcome to be predicted, (ii) the confounding variables influence the latent representation linearly, and (iii) the confounding variables must be linearly independent. It would be best to be upfront about these assumptions and discuss their rationale.
    4. Since the hard constraint of orthogonality is replaced with a soft constraint (via regularisation), lambda in eq. (7) effectively determines to which degree this constraint is enforced. A study on the effect of lambda on learning a confounder-free representation would be helpful to understand whether there is a downside to going from a hard to a soft constraint.
    5. The authors claim that the original MDN suffers from instability, yet the experiments on synthetic data in section 3.1 have not been carried out repeatedly to demonstrate the variance of the proposed PMDN is indeed lower than that of MDN.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed approach seems to be effective, however, more related work and a discussion on the assumptions implied in MDN are required.

  • Number of papers in your stack

    7

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors addressed my concerns in their rebuttal.



Review #2

  • Please describe the contribution of the paper

    This paper proposed a penalty term for MetaData Normalization. Unlike the original linear-regression-based method, PMDN learns the projection \beta using a neural network. Experiments show improvement over the baseline method MDN on several datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper slightly improves MDN into a learnable, trainable version that better performs the regression and removes the need for large batch sizes.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The theoretical analysis of PMDN is not enough. In Section 2, PMDN seems to replace the linear regression with a neural network simply by adding a loss term during training. The contribution of this method is not sufficient unless further analysis, including convergence rate, theoretical bounds on the error, etc., is developed.

    2. The experimental baselines are weak. There are many other methods (from fairness-aware deep learning) that could be compared with the proposed method, for example, Canonical Correlation Analysis (CCA) and Rényi Fair Inference. This aspect is not considered in the current paper, nor do the authors discuss its pros and cons.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This paper should be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. The iterative updating of W and \beta can be performed as Block Stochastic Gradient Descent, and thus can have better theoretical convergence.

    2. Another important baseline is Canonical Correlation Analysis (CCA), which maps the metadata and the learned features into the same domain and computes their correlation. If we minimize the correlation, we can guarantee that the data are independent of the metadata, which provides the same benefit as the method proposed in this paper.
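A correlation-based penalty of the kind the reviewer suggests can be sketched as follows (our own toy numpy illustration; the authors' rebuttal notes that such losses are batch-level operations, since a correlation can only be computed over multiple samples at once):

```python
import numpy as np

def pearson_corr(a, b):
    # Correlation is estimated over a batch of samples, so any loss built
    # from it is inherently a batch-level operation (like the original MDN).
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + 1e-8))

rng = np.random.default_rng(2)
c = rng.normal(size=256)                           # confounder for one batch
feat = 0.8 * c + 0.2 * rng.normal(size=256)        # feature biased by c

loss_big_batch = abs(pearson_corr(feat, c))        # stable estimate
loss_tiny_batch = abs(pearson_corr(feat[:4], c[:4]))  # 4 samples: very noisy
```

Minimizing `loss_big_batch` would penalize the linear dependence between features and confounder, but the estimate degrades as the batch shrinks, which is the same limitation PMDN is designed to avoid.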

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is only slightly beyond the MDN baseline and lacks analysis of the proposed approach. The experiments require further comparisons with other methods.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    This paper proposed a Penalty MetaData Normalization (PMDN) method. PMDN extends the conventional MDN, whose performance is confined by the sample size of a mini-batch, by using a penalty method so that the linear parameters within the MDN layer are trainable and learned on all samples. The results show that the proposed PMDN method improves classification performance compared to four other methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Compared to conventional MDN, the \beta in the proposed PMDN is learnable and is not confined by the batch size.
    2. Figure 1 is very clear and helpful for understanding the overview of this paper.
    3. The writing and organization of this paper are good and easy to follow.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. According to f − Mβ, the metadata should have the same dimension as f. But the representations of the different metadata have different dimensions: acquisition site is one-hot encoded, age is a z-score, and sex is a binary number. It is not clear how the metadata are fed into the model.

    2. As beta is trained based on both the classification loss and the penalty loss, it is not clear if Mβ contains all and only metadata related information.

    3. Figure 3 only compares the results under a small batch size equal to 80, which is unfavorable for MDN. For a fair comparison it would be better to also compare the t-SNE results under other batch sizes.

    4. There are some typos, such as “… the matadata of each group separately and report …”

    5. This paper has no keywords.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code and data can be released according to the reproducibility checklist.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Q5

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The writing and organization of this paper are clear, but some parts of the paper do not follow the template (no keywords). The evaluation of this paper is not sufficient; Figure 3 only shows the t-SNE results under one specific batch size.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper addresses the important problem of confounders in disease prediction models. The presented method shows improved performance over previous work, especially MDN.

    However, reviewers are concerned about

    • missing related work
    • lack of in-depth analysis (theoretical and experimental) and assessment of the method in the regime of very small batches < 80
    • the effect of the regularization parameterized by lambda

    I would encourage the authors to address the above points in the rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8




Author Feedback

We thank all reviewers and the AC for the valuable comments. They recognized the merits of our method and its importance in reducing confounding effects. Here, we clarify the concerns and will address them in the final paper.

(MR,R1,R2) Discussion on related work We will cite them. However, we should mention that methods based on counterfactuals (R1) require reliable counterfactual generative models wrt arbitrary variables, which are still under study for MRIs. That is why such methods for fairness or explainability are often only applied to face image datasets (where generative models work well). Methods based on disentanglement or orthogonality (R1) are similar to ours but they have so far only been tested on one single bias variable (multiple bias entanglement/disentanglement may create contrastive objectives). Our PMDN models all biases/confounders together and regresses them out at once. The CCA and Rényi correlation methods (R2), if embedded into end-to-end deep learning (DL), are meant to reduce correlation with the confounders. We already have a baseline method in the paper that does this using an adversarial Pearson correlation loss [18]. Note that methods based on correlation are also batch-level operations (to enable calculating correlation), and therefore have the limitations of the original MDN. Our PMDN is a simple extension of a widely studied traditional/seminal statistical method of removing the effect of confounders without any batch operation (usable in all DL models).

(MR,R2) In-depth theoretical analysis Our method adds a regularization (penalty) to already established DL loss functions (eg, BCE), building on a method that already has a closed-form solution. That closed-form solution requires large batch sizes, and therefore, like other DL methods, we turned it into a trainable operation solved by SGD. The theoretical convergence and bounds of these types of operations were widely studied in the early years of the DL/convex optimization fields. We also agree with R2 that BSGD could possibly result in faster convergence, but this has also been explored widely and was not the focus of our paper.

(MR,R1) assessment in very small batches < 80 We also generated results for batch size of 20 obtaining an accuracy of 51.3% and Site Corr of 0.155, which are comparable to the results with larger batch sizes (see Table 2). Baseline MDN results on small batches are significantly worse.

(MR,R1) Effect of lambda Although Eq (7) introduces the hyperparameter lambda, in the paragraph after the equation, we explained how we optimize the objective (7) using “an alternating optimization schema.” Alg. 1 summarizes the two steps for training alternating between the two objectives each having its own learning rate. They are then consolidated into the Adam optimizer similar to all standard DL models. Hence, our implementation is independent of lambda.
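The alternating schema can be sketched as follows (a toy illustration under our own assumptions: a single-weight "network", MSE loss, synthetic data; not the authors' Alg. 1 verbatim). Each iteration takes one gradient step on the task loss w.r.t. the network weight and one on the penalty w.r.t. β, each with its own learning rate, so no explicit lambda appears:

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 512, 1
M = rng.normal(size=(N, K))                        # confounder
y = (rng.normal(size=N) > 0).astype(float)         # target, independent of M
x = 2.0 * y + M[:, 0] + 0.1 * rng.normal(size=N)   # input feature biased by M

w = np.zeros(1)                                    # "network" weight
beta = np.zeros((K, 1))                            # PMDN parameters
lr_w, lr_beta = 0.1, 0.1                           # one learning rate per step
for _ in range(3000):
    idx = rng.integers(0, N, size=32)
    fb = x[idx, None]                              # batch features, 32 x 1
    Mb, yb = M[idx], y[idx]
    res = fb - Mb @ beta                           # PMDN residual
    # Step 1: update the task weight on the residual features (MSE loss).
    grad_w = -2.0 * ((yb - res[:, 0] * w[0]) * res[:, 0]).mean()
    w -= lr_w * grad_w
    # Step 2: update beta on the penalty ||f - M beta||^2.
    grad_b = -2.0 * Mb.T @ (fb - Mb @ beta) / 32
    beta -= lr_beta * grad_b
```

After training, β absorbs the confounder's linear contribution to the feature, and the task weight is fit on the residualized feature rather than the biased one.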

(R1) remove confounding effects from multiple layers, a valid argument? The extent to which confounding effects can be removed by a model at one layer (adversarial or MDN or all prior work) is confined by the model’s capacity/architecture, so in practice, there will always be some residual biased effects passing the layers. To minimize these effects using multiple PMDN layers showed increased benefits in our experiments (another benefit of PMDN).

(R1) Assumptions implied in MDN Correct. All MDN assumptions are also needed for PMDN.

(R1) original MDN suffers from instability The MDN instability we refer to is wrt batch size (not due to randomness) [12]. In small batches, the correlations are not properly removed; see Fig 2 & Table 2.

(R3) According to f−Mβ, metadata should have the same dimension as f This is incorrect. Defining N=# of samples, K=# of metadata or confounders, and d=# of features; f is Nxd, M would be NxK, and β will be Kxd.
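The dimensions in this reply can be verified with a short numpy sketch (toy sizes of our own choosing):

```python
import numpy as np

N, K, d = 6, 3, 4                  # samples, metadata/confounders, features
f = np.zeros((N, d))               # feature matrix, N x d
M = np.zeros((N, K))               # metadata matrix, N x K (e.g., one-hot
                                   # site, z-scored age, binary sex columns)
beta = np.zeros((K, d))            # MDN parameters, K x d

residual = f - M @ beta            # (N x K) @ (K x d) -> N x d, matches f
```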

(R3) Fig.3 compares small batch sizes Exactly what we intended to do, to show that MDN (required a batch operation) operates inconsistently in small batch sizes.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Rebuttal addressed many concerns of reviewers like discussion on related work and results for small batch sizes. Overall, I would vote in favor of acceptance for this paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper tackles a very challenging problem. Though it borrows heavily from the idea of metadata normalization, the addition of the penalty term, together with the experimental results on brain imaging data, makes it a good contribution to MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work proposes a novel modeling framework implementing Meta Data Normalization as a regularization strategy. The paper was found relevant and interesting by the reviewers, and the rebuttal was positive in addressing the questions, especially concerning the methodological rationale and comparison with respect to the state-of-the-art. Overall, the paper provides a good contribution to the conference.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3


