
Authors

Zhusi Zhong, Jie Li, Shreyas Kulkarni, Yang Li, Fayez H. Fayad, Helen Zhang, Sun Ho Ahn, Harrison Bai, Xinbo Gao, Michael K. Atalay, Zhicheng Jiao

Abstract

Bias in healthcare negatively impacts marginalized populations with lower socioeconomic status and contributes to healthcare inequalities. Eliminating bias in AI models is crucial for fair and precise medical implementation. A holistic approach to reducing bias aggregation in multimodal medical data and promoting equity in healthcare is in high demand. Racial disparities exist in the presentation and development of algorithms for pulmonary embolism (PE), and deep survival prediction models can be de-biased with multimodal data. In this paper, we present a novel survival prediction (SP) framework with demographic bias disentanglement for PE. The CTPA images and clinical reports are encoded by the state-of-the-art backbones pretrained with large-scale medical-related tasks. The proposed de-biased SP modules effectively disentangle latent race-intrinsic attributes from the survival features, which provides a fair survival outcome through the survival prediction head. We evaluate our method using a multimodal PE dataset with time-to-event labels and race identifications. The comprehensive results show an effective de-biased performance of our framework on outcome predictions.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43904-9_50

SharedIt: https://rdcu.be/dnwHv

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #3

  • Please describe the contribution of the paper

    The authors developed a framework for predicting survival that includes a de-biased SP module. This framework aims to reduce the risk of racial bias in the prediction model, specifically in the case of pulmonary embolism. Race is considered a protected attribute in the dataset, which is collected from multiple centers.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The dataset used in this study is multimodal, consisting of multiple types of data, which is a realistic scenario in many real-world applications.
    2. The integration of multiple types of data is essential for improving prediction accuracy.
    3. Figure 3 uses a nice visualisation technique to represent the effect of debiasing on the dataset.
    4. The results of the study appear to be correct and reliable.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. The paper’s main assumption is that Black patients are at higher risk of adverse outcomes, but the reasons behind this are unclear. Therefore, it is uncertain whether the authors are evaluating the correct assumption.
    2. The authors did not compare their approach to state-of-the-art methods for bias correction, which could provide more insight into the effectiveness of their method.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Yes

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The authors should consider performing a transferability assessment across multiple centers to evaluate the generalizability of their framework. This will provide insight into how well their model performs in different settings and help identify any potential issues that may arise when applying the model to different datasets.
    2. The authors should try to understand the underlying causes of the problem. They could investigate whether the patients’ race correlates with comorbidities, socioeconomic status, or other factors that may affect their health outcomes. This will help identify potential confounding variables that should be controlled for in the model. Additionally, the authors could try to interpret the results of their model to understand which factors are driving the differences in survival rates between different racial groups.
    3. It may be worthwhile for the authors to compare their framework with state-of-the-art methods for bias correction to evaluate how well their approach performs compared to existing methods.
    4. The authors could consider collecting additional data on the patients to provide more comprehensive information on their health status. This could include data on comorbidities, socioeconomic status, and other factors that may affect their health outcomes.
    5. The authors could consider collaborating with experts in ethics and diversity to ensure that their framework is ethically sound and does not perpetuate any biases or discrimination against certain racial groups.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper has been thoroughly validated and the dataset used in the study is of high quality.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This study developed a PE survival prediction model based on multimodal data with racial de-biasing. The main contributions are: (1) identifying the diversity of bias across multimodal information with a survival prediction fusion framework; (2) proposing a de-biased survival prediction framework with demographic bias disentanglement; (3) showing that the multimodal CPH learning models improve fairness with unbiased features.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This study aimed to develop a racially de-biased framework and, on top of it, a PE survival prediction model based on multimodal data, for AI algorithms that are fairer to all races. The authors first evaluated the racial bias (i.e., between White and Black patients) of a multimodal survival prediction model without de-biasing. They then proposed the de-biased SP module, which uses a race-intrinsic attributes-decoupled branch and a race-conflicting attributes-decoupled branch; these are trained together with the survival prediction branch using CE loss, GCE loss and CoxPH loss, respectively. The proposed method is novel and clearly described, which supports reproducibility, and the experimental results demonstrate its clinical feasibility.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors do not summarize the proposed de-bias framework in the abstract, or the distribution in the introduction. Additionally, although the experimental results demonstrate the feasibility of the proposed methods, the paper does not explain why the CE loss and GCE loss can achieve racial de-biasing. The CE loss and GCE loss seem to add feature information about the race bias.
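
For readers following this point, a minimal sketch of the generalized cross-entropy (GCE) loss in its commonly used form, L_q = (1 - p_y^q) / q, is given below. It assumes the paper uses this standard form; the value q = 0.7, the toy probabilities and the function names are illustrative assumptions, not taken from the manuscript. The gradient of this loss equals the CE gradient scaled by p_y^q, so samples that the classifier already predicts confidently receive more weight, which is one possible reading of why such a branch would concentrate on easily learned bias attributes.

```python
import math

def cross_entropy(probs, labels):
    """Mean cross-entropy, -log(p_y), over a batch of predicted class probabilities."""
    return sum(-math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def generalized_cross_entropy(probs, labels, q=0.7):
    """Mean GCE loss, (1 - p_y**q) / q, for q in (0, 1]; q -> 0 recovers cross-entropy."""
    return sum((1.0 - p[y] ** q) / q for p, y in zip(probs, labels)) / len(labels)

# Hypothetical two-class predictions for two patients (illustrative values only).
probs = [[0.9, 0.1], [0.2, 0.8]]
labels = [0, 1]
print(cross_entropy(probs, labels))
print(generalized_cross_entropy(probs, labels, q=0.7))
```

With q = 1 the loss reduces to the mean of (1 - p_y), and as q approaches 0 it approaches the ordinary CE loss.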

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors clearly described the model framework, including the training and inference models, and gave the loss function and training details. Reproducibility is therefore good.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    1) Adding a summary of the proposed de-bias framework to the abstract, and the distribution to the introduction, would improve readability and attract readers’ attention. 2) Although the experimental results demonstrate the feasibility of the proposed methods, the paper does not explain why the CE loss and GCE loss can achieve racial de-biasing. The CE loss and GCE loss seem to add feature information about the race bias. It would be better if the authors explained the motivation.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed methods have better novelty, feasibility, reproducibility and clinical significance.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    This paper presents an approach to learn debiased representations for clinical variables, imaging and text/report data, and combine them to train a debiased multimodal model for survival prediction using Cox PH loss. The approach has been applied to survival prediction of PE patients, on a dataset of 918 patients.
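
As a point of reference for the Cox PH loss mentioned here, the negative Cox partial log-likelihood can be sketched as follows. This is a generic textbook form without tie correction, averaged over observed events; whether the paper normalizes or handles ties this way is an assumption, and the function name is illustrative.

```python
import math

def cox_ph_loss(risks, times, events):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risks:  predicted log-risk scores, one per patient
    times:  follow-up times
    events: 1 if the event was observed, 0 if censored
    """
    total = 0.0
    n_events = 0
    for i in range(len(risks)):
        if not events[i]:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under observation at time times[i].
        denominator = sum(
            math.exp(risks[j]) for j in range(len(risks)) if times[j] >= times[i]
        )
        total += risks[i] - math.log(denominator)
        n_events += 1
    return -total / max(n_events, 1)
```

The loss decreases when patients who experience the event earlier are assigned higher risk scores than those still at risk at that time.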

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Paper addresses an important problem of training survival models while mitigating bias that is present in existing datasets e.g. PE related severe outcomes.
    • Experiments show that better C-index values can be obtained using debiased representation learning, and the combination of the single modality models into a multimodal further improves the C-index.
    • Ablation studies have been conducted to show the importance of the feature swap [13] and balanced sampling during model training.
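
The C-index referred to above is typically Harrell's concordance index. A minimal sketch follows, assuming that a higher predicted risk should correspond to shorter survival and that ties in predicted risk count as half-concordant. The paper's evaluation code is not available, so this is illustrative only and the function name is an assumption.

```python
def concordance_index(risks, times, events):
    """Harrell's C-index: fraction of comparable pairs ranked in the right order.

    A pair (i, j) is comparable when patient i had an observed event
    and a strictly shorter follow-up time than patient j.
    """
    concordant = 0.0
    comparable = 0
    n = len(risks)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of comparable pairs.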
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • PESI scores have generally been validated for up to 1-year survival. It would be helpful if the authors could report the log-rank results for both the short-term (1 year) and long-term (5 year) periods, and consider adding a discussion of the results. The reported KM curves for PESI and the multimodal model seem to suggest that PESI might be more valuable in the short term. This analysis is important since the paper talks about the clinical relevance of this work, and it’s important to characterize the model performance for survival over different periods to better assess its utility. It’s possible that the proposed model is more valuable for long-term than for short-term assessment, which could still be of value.
    • Technical novelty of presented work is limited, and can primarily be attributed to the application of the debiasing approach presented in [13] with addition of survival prediction head trained using Cox proportional hazards loss.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Dataset used is proprietary so reproducibility may be limited. Values used for variables such as lambda_swap in the combined loss function would be valuable. The text only states that the values are set to balance the importance of feature aggregation, which may be an important detail for achieving the documented performance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    This work would benefit from an analysis of what enables the multimodal model to perform better than the individual ones. For instance, consider a couple of examples where the combined model improves on one or more individual models; this would demonstrate its effectiveness and also help with explainability.

    For PE outcomes, short term (6 months to 1 year) survival would potentially have high clinical value compared to 5 year survival, and paper would benefit from a separate subgroup analysis on utility of the models over short term.
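
One way to run the short-term analysis suggested here is to administratively censor follow-up at a chosen horizon before computing the C-index or a log-rank statistic. The sketch below assumes times in days and a 365-day horizon; the function name, the horizon and the toy data are illustrative choices, not taken from the paper.

```python
def censor_at_horizon(times, events, horizon):
    """Truncate follow-up at `horizon`: anything later becomes censored at the horizon."""
    new_times, new_events = [], []
    for t, e in zip(times, events):
        if t > horizon:
            # Event (or censoring) happens after the horizon: treat as censored there.
            new_times.append(horizon)
            new_events.append(0)
        else:
            new_times.append(t)
            new_events.append(e)
    return new_times, new_events

# Hypothetical follow-up data in days (illustrative values only).
times = [100, 400, 2000, 300]
events = [1, 1, 0, 0]
short_times, short_events = censor_at_horizon(times, events, horizon=365)
print(short_times, short_events)
```

Repeating the evaluation on the truncated labels and on the full follow-up would give the short- versus long-term comparison requested.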

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work has merit in its application of debiased learned representations and in demonstrating their effectiveness on an important issue: addressing bias in machine learning models trained on clinical data, which are often implicitly biased. Since the technical novelty, in light of [13] and the use of Cox PH loss, seems limited, it’s important to assess the potential impact in the application space (i.e. PE survival prediction), particularly on short-term vs long-term survival probability. It would be helpful in assessing the potential impact if the authors could report performance numbers for the short term, e.g. 6 months or 1 year, due to its higher relevance.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    This study addresses an important issue when working with biased data to develop AI systems for clinical decision support. Models are prone to learn biases present in the training data, as previous studies have shown. The authors present a method to de-bias models for survival prediction based on two key components: a) un-biased sampling during training, and b) feature swapping, where features from different samples are swapped to ensure the influence of the different bias classes is reduced. The method was tested on multi-center data for pulmonary embolism survival prediction.
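
The feature-swapping component described above can be pictured as follows. This sketch assumes each sample has an intrinsic (survival) feature vector and a bias (race) feature vector that are concatenated, and that swapping means pairing each intrinsic vector with the bias vector of a randomly chosen sample in the same batch. These mechanics follow the reviewer's description and are an assumption about the paper's implementation; plain Python lists stand in for tensors, and all names are hypothetical.

```python
import random

def swap_bias_features(intrinsic, bias, rng):
    """Pair each intrinsic feature vector with a bias vector from a shuffled batch index.

    Returns the concatenated features and the permutation that was applied,
    so the bias labels can be permuted in the same way.
    """
    perm = list(range(len(bias)))
    rng.shuffle(perm)
    swapped = [intrinsic[i] + bias[perm[i]] for i in range(len(intrinsic))]
    return swapped, perm

# Toy batch of three samples (illustrative values only).
intrinsic = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
bias = [[1.0], [2.0], [3.0]]
swapped, perm = swap_bias_features(intrinsic, bias, random.Random(0))
print(swapped, perm)
```

Because the intrinsic part of each vector is unchanged, a survival head trained on the swapped features is pushed to rely on the intrinsic part rather than on the attached bias vector.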

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Clear motivation and embedding into prior research
    • Novel method
    • Clear experimental setup
    • Meaningful ablation studies
    • Clear demonstrated benefit
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • No comparison to existing de-biasing techniques
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The datasets are clearly described, the main text and supplementary information provide enough information about the networks used, how they were trained and evaluated, and contain information about parameter setting.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • The paper is well-written and the figures contribute to the understanding of the method and your experimental setup
    • I appreciate how you motivate your method and embed it into prior research
    • The results are presented in a very clear way, I like the direct comparison of the bias and prediction performance in the results.
    • The only thing I miss is a comparison to other de-biasing techniques (e.g., to a simple additional head to predict the bias class from a multi-modal embedding, used to un-learn the confounder, such as https://www.sciencedirect.com/science/article/pii/S1053811920311745). Consider adding such a comparison in case you extend this work to a journal paper version.

    Minor remarks:

    • Please introduce PESI and CTPA at the first occurrence
    • In Figure 1 inside the inset with the de-biasing model: should the red arrow pointing to L_{CE} start from the y~_{ID} box?
    • When you describe the image pre-processing, you write: “[…] and applied with zero-centered”. Does this mean you performed a z-score normalization?
    • Very minor: In Fig. 3, some sub-plots have parts of grey borders; consider cropping them slightly.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • Clear and meaningful experimental setup
    • Novel method with a demonstrated benefit
    • Paper is well-written
  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Strengths: address an important problem of mitigating race bias in existing datasets; good experiment results with meaningful ablation studies and demonstrated benefits; considerable technical novelty with demonstration for the survival prediction use case; use of clinically realistic data of multi-modal input

    Weaknesses: lack of analysis of short- vs long-term prediction; need further clarification on why some of the loss functions can achieve race de-bias; lack of comparisons to other SOTA bias-correction methods; potentially compromised technical novelty in light of references [13] and [17].




Author Feedback

Thank you for taking the time to review our manuscript and providing your insightful comments and suggestions. We sincerely appreciate the objective and insightful evaluations provided by the reviewers. Our work aims to address bias and unfairness in survival analysis based on deep models by decoupling latent bias information, such as race, while preserving the survival information in CTPA images, clinical reports, and PESI clinical diagnoses. Our proposed method leverages the integration of multimodal data to comprehensively analyze and mitigate the modalities’ bias with different distributions. We have carefully considered your comments and made the necessary revisions to improve the quality of the paper.

  1. In the experimental section, we compared the predicted risks from our model with the PESI scores and calculated their correlations with survival labels (up to 2664 days). This analysis reflects the advantage of our framework over PESI in long-term predictions. In the context of survival analysis for PE patients, it is meaningful to examine short- vs long-term prediction performance by setting different maximum truncation events. Firstly, in clinical practice, PESI and its clinical variables reflect the current physiological status of PE patients and hold clinical value for short-term survival prediction. Therefore, within our multimodal framework, the branch based on the PESI clinical variables incorporates short-term clinical information, enabling it to capture short-term effects. Secondly, our model incorporates CTPA and clinical reports, which implicitly capture the patient’s physical condition and long-term medical history and treatment background. Hence, the multimodal fusion of predictive risks can combine short-term physiological information with historical information, providing long-term survival risk estimation. By integrating these different modalities, our multimodal framework leverages the strengths of each modality to capture both short-term and long-term survival-related information, offering a comprehensive prediction of survival risks.

  2. It is necessary to compare our approach with other state-of-the-art de-biasing methods. While research on debiasing in deep learning has received considerable attention, de-biasing in survival analysis is a relatively new research problem. In our forthcoming work, we compare the survival prediction module of our framework with traditional survival analysis methods, such as Random Survival Forests (RSF) and Cox Proportional Hazards (CPH) models. Comparisons with other bias disentanglement methods, as the de-biased modules for multimodal features, will allow us to evaluate the fairness of our framework’s survival prediction under different debiasing performances. This component will be an important experimental aspect in our future work.

  3. In terms of the novelty of our model, we first address the diverse bias distributions in survival analysis and propose a multimodal fusion framework for survival analysis. The integration of multimodal fusion with survival debiasing is the primary innovation of our approach in this paper, while the disentanglement representation in the debiased module is a secondary innovation. The debiased module in the framework consists of a decoupling representation learning component and a survival risk prediction head based on debiased features. The combination of bias decoupling and multimodal fusion represents a meaningful innovation, which will be further explored in our future research.

  4. Further validation of the model’s generalizability is necessary. In our subsequent research, we have collected additional identification for patients, including ethnicity and gender, and conducted repeated grouping experiments. The results from multiple experiments demonstrate general effectiveness under various biased conditions.

  5. We will carefully proofread the camera-ready version to address any issues present in writing and figures.


