
Authors

Weizhi Nie, Chen Zhang, Dan Song, Yunpeng Bai, Keliang Xie, An-An Liu

Abstract

The chest X-ray (CXR) is a widely used and easily accessible medical test for diagnosing common chest diseases. Recently, there have been numerous advancements in deep learning-based methods capable of effectively classifying CXR. However, assessing whether these algorithms truly capture the cause-and-effect relationship between diseases and their underlying causes, or merely learn to map labels to images, remains a challenge. In this paper, we propose a causal approach to address the CXR classification problem, which involves constructing a structural causal model (SCM) and utilizing backdoor adjustment to select relevant visual information for CXR classification. Specifically, we design various probability optimization functions to eliminate the influence of confounding factors on the learning of genuine causality. Experimental results on two open-source datasets demonstrate that our proposed method outperforms the compared baselines in classification performance. To access the source code for our approach, please visit: https://github.com/zc2024/Causal_CXR.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_3

SharedIt: https://rdcu.be/dnwAA

Link to the code repository

https://github.com/zc2024/Causal_CXR

Link to the dataset(s)

https://stanfordmlgroup.github.io/competitions/chexpert/

https://nihcc.app.box.com/v/ChestXray-NIHCC/folder/37178474737


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors introduce a structural causal model to the problem of classification of disease from chest x-ray. Results show improvement over a set of baselines on a standard data set.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The work is well motivated, as there is substantial multimorbidity and confounder presence in chest X-ray analysis.
    • The formulation of the idea appears sound and contains some nice ideas, some novel and some from the literature.
    • Results are strong, showing consistent improvement over baselines.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Presentation is patchy: lots of language glitches and odd jargon/phrasing used without introduction of terms.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Seems fine.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    This is a well-motivated study. It seems very plausible that properly handling causal relationships in a model that classifies multiple potential morbidities from chest X-rays could improve performance. The authors demonstrate this idea and show strong results on a standard data set. Specifically, I haven’t seen before the idea of learning features and then having the transformer predict which of them are confounds and which are causal factors, but it seems like a nice approach.

    The whole document is a bit strangely written and could do with a strong edit from an experienced writer and a native English speaker.

    S is undefined in the first bullet point of Section 2.1.

    How many ‘confounding features’ do the experiments randomly add to the causal features each time? Do they try all of them? Some subset? The paper doesn’t say.

    How did the authors decide on the ranges of alpha1 and alpha2?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The ideas are pretty good, but the presentation isn’t great. Could really do with some tidying up.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors propose using backdoor adjustment to mitigate the effect of confounders on model predictions. Specifically, they use the NIH ChestX-ray14 dataset and try to minimize the confounding effects of artifacts such as letters or medical devices in X-ray images. They develop a model that disentangles the causal and confounding features using a Transformer-based module, and then apply causal intervention to the disentangled features.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of the paper is that the authors propose an approach for mitigating confounding effects based on causal inference, which is a promising direction. Additionally, the performance is compared against several baselines.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The idea of utilizing backdoor adjustment for bias removal is not novel and has been investigated before. Although the paper explains how to implement the idea, its main novelties are not clearly stated. The most closely related references are not cited, which makes it harder to compare different aspects of the proposed model with previously developed models.
    • The writing is not clear and it is hard for a reader to follow and understand the concepts.
    • The introduction and methodology sections do not provide sufficiently clear motivation for the way the authors developed their model. For instance, it is not clear how the different parts of Section 2.3 relate to Section 2.2.
    • The authors claim that they evaluated the performance of the model on two datasets (page 3); however, results are reported for only one dataset (NIH chest X-ray).
    • Related works are not well-studied and similar works are not cited.
    • There are a lot of grammatical and language usage issues in the paper.
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The dataset is publicly available and the code is provided in the supplementary materials.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • The writing is not clear and the idea is not explained well. Sometimes the same notation is used for different quantities. For example, in Section 2.1 “C” denotes the confounder, but in Section 2.3 “C” is the number of categories. Similarly, “D” is the input CXR image, but in Section 2.3 “D” is the training dataset. The authors could add a section that describes the problem and defines the notation, and then stick to it throughout the paper.
    • Another thing that makes the paper harder to understand is the large number of grammatical issues.
    • The captions of Fig. 2 and Fig. 3 can be improved. A caption should explain the different parts of the figure. For example, the caption of Fig. 2 does not explain how or why the path between D and X is removed. Also, while the text mentions Fig. 2 (a) and (b), there is no such annotation in the figure. Additionally, Fig. 3 contains different blocks (channel attention, position attention, etc.) in the feature learning part. Neither the caption nor the main text explains these blocks. How do they help the model and improve the performance?
    • In Section 2.1, it is not clear why the authors chose to use P(Y | do(X)) to adjust the backdoor path, nor why the link between D and X is cut: “Therefore, we choose to apply the causal intervention to cut off the backdoor path and use P(Y | do(X)) to replace P(Y | X), so the model has the ability to exploit causal features.”
    • In Section 2.2, the bullet points are confusing. For instance, in the second one: “Y’s response to X and C has no connection with the causal effect between X and C.” What is “the causal effect between X and C”?
    • The methodology and equations are not well developed or motivated. For instance, in Equation (2), it is unclear why there should be such a relationship between causal and confounding features (the confounding features use 1-softmax(.)). The connection between Section 2.2 (backdoor adjustment) and the developed deep learning model is not explained. Moreover, the authors do not provide any motivation for Equation (5). This equation does not make sense as presented, and the reader cannot understand the reasoning behind it. What is y_uniform? And why is it uniform?
    • What are models 1, 2, 3, and 4 in Table 2?
    • The baselines are not well defined or introduced. Do they share similar goals or structure?
    • Figure 4 needs more description and discussion. Why does the performance of the confounding classifier go up at first and then down?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The clarity of the paper is poor. There are many writing issues in the paper. The idea is not well developed or introduced. The equations are not well motivated and it is unclear how they are derived.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors provided detailed answers to the questions and clarifications for different parts. They also promised to improve the writing in the final version.



Review #4

  • Please describe the contribution of the paper

    The authors explore the relationship between chest X-ray classification and its pixel-level cause tracking. They also propose a cause learning module that learns this cause-exposure process through the backdoor loss function. The experiments show the effectiveness of the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The article is well written and easy to follow.
    2. Cause discovery is an interesting but less-explored research topic.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The cause discovery results are not verified by a standard evaluation process, which makes it hard to quantify their accuracy and stability.
    2. The method seems to be general and could be applied to many other medical cause discovery cases, for example the optic disc/cup region w.r.t. glaucoma [1] [3], or melanoma/nevus shape w.r.t. benign/malignant [2]. The authors may want to discuss its extension to other similar cases.
    3. As for the technology, the authors should discuss its differences and novelty compared with previous works [3], such as contrastive learning and self-supervised learning.
    4. The authors should also explore the relationship among classification, cause discovery results, and pixel-level segmentation [4], and in which cases they can be considered related and in which they cannot.

    [1] “Opinions Vary? Diagnosis First!” MICCAI 2022.
    [2] “Calibrate the inter-observer segmentation uncertainty via diagnosis-first principle.” arXiv preprint arXiv:2208.03016 (2022).
    [3] “Leveraging undiagnosed data for glaucoma classification with teacher-student learning.” MICCAI 2020.
    [4] “Seatrans: Learning segmentation-assisted diagnosis model via transformer.” MICCAI 2022.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    yes

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    see point 6

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall a good paper with a novel discovery and an interesting idea, but it would be better with more discussion and clarification.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    While reviewers appreciated some of the contributions in the work, there have also been significant concerns. I therefore invite the authors to address the concerns in a rebuttal. The authors should in particular address the shortcomings identified by Reviewer 3, who rated the paper as reject.




Author Feedback

Reply@R1-Writing and @R3-Organization: We’ve enlisted native English speakers to refine our paper’s writing.

Reply@R1-S and @R3-Notations: This was a clerical error; S should be X. We will differentiate the variable names by making the proper changes.

Reply@R1-Numbers of confounding features: 30~40% is appropriate.

Reply@R1-α1 & α2: α1: 0.4~0.7; α2: 0.4~0.5. Performance declines when either value is too large or too small.

Reply@R3-Novelty, similar works and @R4-Related works: We discuss the impact of inherent confounders in CXRs on the classification performance of deep models. Some recent works have explored causal reasoning in VQA and other fields, but we found no similar solutions for CXR images. We therefore compare our results with recently published deep learning-based works. We plan to compare our method with other causal approaches based on our team’s latest research. We thank R4 for the references, which greatly influenced us and will help us conduct more thorough research. We concur that this model, given its applicability to the various tasks covered in the references, has the potential to be a general approach. We compared various benchmark methods, including various model structures such as DenseNet, for a more precise prediction of thoracic diseases. With ResNet, we found that our model performed better.

Reply@R3-The relationship between Section 2.2 and 2.3: Section 2.2 derives the backdoor adjustment formula, and Section 2.3 then parameterizes it.

Reply@R3-Datasets: We also conducted experiments on the CheXpert dataset and achieved competitive results. We will add them to the final accepted paper if there is enough space.

Reply@R3-Figures and captions: Fig. 2: The left part is (a) and the right part is (b). In Section 2.1 we mention that the path between D and X is blocked via the backdoor criterion. Adjusting for the confounding factors blocks all backdoor paths between the causal variables, so after adjustment the path is blocked, as shown in the right part. Fig. 3: We provided the code for the model; the position and channel attention blocks are improvements on Ref. [12]. Fig. 4: The red and grey lines represent the C-classifier and the X-classifier, respectively. Regarding the performance of the C-classifier going up and then down: after visualization, we found that confounding factors can be “beneficial” for classification in some cases (e.g., certain diseases require patients to wear certain medical devices during X-rays), but this is the wrong shortcut, since we want the model to rely on causal features. These findings support the arguments we made in the introduction.

Reply@R3-Adjustment process: Previous models tend to learn the relationship between features and true results, P(Y|X), but we want to eliminate the influence of C by removing the backdoor path. Pearl’s causal theory lets us apply the do-calculus to X and remove the backdoor path via P_b(Y|X) = P(Y|do(X)). This requires stratifying over C, which is why we need the points in Section 2.2. As shown in Fig. 2, we consider that X and C are both derived from the input and have no link between them, which means that after applying the adjustment the conditional probability remains invariant, so we get P_b(Y|X, c) = P(Y|X, c).
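
The stratification described in this reply corresponds to the standard backdoor adjustment, P(Y|do(X)) = Σ_c P(Y|X, c) P(c). As a minimal sketch of why it differs from P(Y|X), the toy example below uses binary X, C, and Y with made-up probabilities; none of these numbers or function names come from the paper or its code.

```python
# Toy illustration of backdoor adjustment with binary X, C, Y.
# All probabilities are made-up numbers, not values from the paper.

# P(c): marginal distribution of the confounder C.
p_c = {0: 0.7, 1: 0.3}

# P(x=1 | c): the confounder influences whether feature X is present.
p_x1_given_c = {0: 0.2, 1: 0.9}

# P(y=1 | x, c): the outcome depends on both X and C.
p_y1_given_xc = {
    (0, 0): 0.10, (0, 1): 0.50,
    (1, 0): 0.30, (1, 1): 0.70,
}


def p_x_given_c(x, c):
    """P(x | c) for binary x."""
    return p_x1_given_c[c] if x == 1 else 1.0 - p_x1_given_c[c]


def observational(x):
    """P(y=1 | x) = sum_c P(y=1 | x, c) P(c | x)."""
    p_x = sum(p_x_given_c(x, c) * p_c[c] for c in p_c)
    return sum(
        p_y1_given_xc[(x, c)] * p_x_given_c(x, c) * p_c[c] / p_x
        for c in p_c
    )


def interventional(x):
    """P(y=1 | do(x)) = sum_c P(y=1 | x, c) P(c)  (backdoor adjustment)."""
    return sum(p_y1_given_xc[(x, c)] * p_c[c] for c in p_c)


naive_effect = observational(1) - observational(0)
causal_effect = interventional(1) - interventional(0)
print(f"naive: {naive_effect:.3f}, adjusted: {causal_effect:.3f}")
```

In this toy setting the adjusted effect of X on Y is 0.20, while the naive contrast P(y=1|x=1) − P(y=1|x=0) is about 0.44, because C makes both X and Y more likely. The gap is the kind of shortcut the adjustment is meant to remove.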

Reply@R3-Equations: In Section 2.3, we implement Eq. (1). We use a supervised method based on the Transformer’s built-in attention mechanism and Eq. (4) to obtain the coarse causal features (X) and, at the same time, the confounding features (C) as their complement. Because C is not necessary for classification, we want C to be less relevant to the expected results. So, using Eq. (5), we push its prediction to be spread equally across all categories. Under this loss function, C’s contribution to the results should tend toward a uniform distribution; in other words, if C contributes equally to each disease category, the results can be considered unaffected by confounding factors. Here y_uniform is a predefined uniform distribution.
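
Eq. (2) and Eq. (5) are not reproduced on this page, so the following is only a hedged sketch of the two ideas named in the reply: a complementary split of attention weights (confounding = 1 − softmax) and a loss that pulls the confounder branch’s prediction toward a uniform target. The function names and the choice of KL divergence are assumptions made for illustration; the paper’s multi-label formulation may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_attention(scores):
    """Return (causal, confounding) weights.

    Assumption: causal weights are softmax(scores) and confounding
    weights are their complement, 1 - softmax(scores).
    """
    causal = softmax(scores)
    confounding = [1.0 - w for w in causal]
    return causal, confounding


def uniform_kl_loss(logits):
    """KL(uniform || softmax(logits)).

    Zero when the prediction is exactly uniform over the categories,
    and positive otherwise. KL is an illustrative choice here, not
    necessarily the form of Eq. (5).
    """
    probs = softmax(logits)
    u = 1.0 / len(probs)
    return sum(u * math.log(u / p) for p in probs)


causal, confounding = split_attention([1.0, 2.0, 3.0])
flat_loss = uniform_kl_loss([0.3, 0.3, 0.3])    # already uniform
peaked_loss = uniform_kl_loss([5.0, 0.0, 0.0])  # confident prediction
```

Minimizing such a loss on the confounder branch penalizes any confident class prediction made from C alone, which matches the stated goal that C should contribute equally to every disease category.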

Reply@R3-Tab.2: Table 2 is the ablation study. Models 1, 2, and 3 are variants with different modules; Model 4 is the whole framework.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal clarified the concerns of reviewer 3, so that all reviewers are in favor of publishing the paper.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I don’t think the authors clearly addressed all issues raised by R3. The data used is perhaps also too poorly labelled and has too many confounders to draw valid causal conclusions.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    First, a disclaimer: I am no expert on causality. Now, this paper got one Reject vote from a reviewer and two Weak Accepts. The reviewer who rejected it (R3) seemed to be very well informed about this topic and raised a number of interesting questions.

    After the rebuttal, the rejecting reviewer (R3) appeared to be convinced and raised the score to Weak Accept, so this paper now has a consensus on weak acceptance. Considering that one of the main issues raised by the reviewers, the lack of clarity in the writing, has been addressed in the response letter (the authors promise to have their work revised by a native speaker), and after reading the paper and comparing it to other works in my batch, I think I am going to recommend (very weakly) acceptance. I see that the main AC recommended rejection, and I would not strongly oppose rejecting it, but I am not so convinced, and I personally believe (again, I am not an expert on causality, but I trust R3 very much) that this paper seems to be at MICCAI level, if there is room for it.


