
Authors

Wenao Ma, Cheng Chen, Jill Abrigo, Calvin Hoi-Kwan Mak, Yuqi Gong, Nga Yan Chan, Chu Han, Zaiyi Liu, Qi Dou

Abstract

Intracerebral hemorrhage (ICH) is the second most common and deadliest form of stroke. Despite medical advances, predicting treatment outcomes for ICH remains a challenge. This paper proposes a novel prognostic model that utilizes both imaging and tabular data to predict treatment outcomes for ICH. Our model is designed to be trained on observational data collected from non-randomized controlled trials, providing reliable predictions of treatment success. Specifically, we propose to employ a variational autoencoder model to generate a low-dimensional prognostic score, which can effectively address the selection bias resulting from the non-randomized controlled trials. Importantly, we develop a variational distributions combination module that combines the information from imaging data, non-imaging clinical data, and treatment assignment to accurately generate the prognostic score. We conducted extensive experiments on a real-world clinical dataset of intracerebral hemorrhage. Our proposed method demonstrates a substantial improvement in treatment outcome prediction compared to existing state-of-the-art approaches. Code is released in the supplementary material.
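The abstract does not say how the variational distributions combination module fuses the modalities, and Reviewer #3 asks below whether a product of experts (PoE) is used. Purely as an illustration, the sketch below shows the standard PoE rule for diagonal Gaussians: the joint precision is the sum of the expert precisions, and the joint mean is the precision-weighted mean of the expert means. The function name `product_of_gaussian_experts` is hypothetical, and how treatment assignment enters the authors' module is not shown here.

```python
import numpy as np


def product_of_gaussian_experts(mus, logvars):
    """Combine diagonal-Gaussian experts N(mu_i, var_i) into one Gaussian.

    mus, logvars: array-likes of shape (n_experts, latent_dim).
    The (renormalized) product of Gaussian densities is Gaussian, with
    precision equal to the sum of the expert precisions and mean equal to
    the precision-weighted average of the expert means.
    Returns (joint_mu, joint_logvar), each of shape (latent_dim,).
    """
    mus = np.asarray(mus, dtype=float)
    precisions = np.exp(-np.asarray(logvars, dtype=float))  # 1 / var_i
    joint_precision = precisions.sum(axis=0)
    joint_var = 1.0 / joint_precision
    joint_mu = joint_var * (precisions * mus).sum(axis=0)
    return joint_mu, np.log(joint_var)


# Example: a standard-normal prior expert, an "imaging" expert and a
# "tabular" expert, all with unit variance (illustrative numbers only).
mu, logvar = product_of_gaussian_experts([[0.0], [1.0], [3.0]], [[0.0], [0.0], [0.0]])
```

Including a N(0, I) prior expert, as some multimodal VAEs do, keeps the joint distribution well defined even if one modality is missing; a more confident expert (smaller variance) pulls the joint mean toward itself.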

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43904-9_69

SharedIt: https://rdcu.be/dnwIh

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #3

  • Please describe the contribution of the paper

    This paper proposes a novel generative prognostic model for predicting ICH treatment outcomes using imaging and tabular data. To address the imbalance problem caused by using the training data collected from non-randomized controlled trials, the proposed model is built on a VAE and integrated with multi-modality information using a variational distribution combination module.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The task setting is very interesting and has potential applications in real life. The authors explore how to build a treatment recommendation model that predicts the effect of all possible treatment assignments based on imaging and tabular data.

    (2) The experimental setting is also convincing and comprehensive. For fair comparison, the authors extend some SOTA methods to multi-modality cases. To evaluate whether proposed methods can address selection bias, the authors change the degree of selection bias by varying the number of cases with IVH who underwent conservative treatment and show the effectiveness of the adopted VAE architecture. The authors also conducted an extensive ablation study to show the effectiveness of each component.
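Reviewer #3 notes that the degree of selection bias is varied through the number of IVH cases that underwent conservative treatment. A minimal sketch of such a subsampling step follows; it assumes boolean IVH flags and a 0 = conservative / 1 = surgery encoding, and the function name is hypothetical. It only thins the training split; the evaluation split would stay untouched.

```python
import numpy as np


def subsample_ivh_conservative(ivh, treatment, n_keep, seed=0):
    """Return sorted training indices keeping only n_keep IVH cases treated conservatively.

    ivh:       boolean array, True if the patient has intraventricular hemorrhage
    treatment: int array, 0 = conservative treatment, 1 = surgery (assumed encoding)
    Keeping fewer IVH/conservative cases means IVH patients are seen almost
    only under surgery, i.e. a stronger selection bias in the training data.
    """
    ivh = np.asarray(ivh, dtype=bool)
    treatment = np.asarray(treatment)
    is_target = ivh & (treatment == 0)
    target = np.flatnonzero(is_target)     # candidates that may be dropped
    others = np.flatnonzero(~is_target)    # always kept
    rng = np.random.default_rng(seed)
    n_keep = min(n_keep, target.size)
    kept = rng.choice(target, size=n_keep, replace=False)
    return np.sort(np.concatenate([others, kept]))
```

Sweeping `n_keep` from the full count down to zero produces a series of training sets with increasing bias against which a model's robustness can be compared.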

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) Why is the posterior distribution q also conditioned on the factual outcome Y? Since q is used for predicting z, and z is then used for predicting Y, using Y as a condition seems somewhat like using the answer to predict the answer. Could the authors explain the reasoning behind this choice?

    (2) Why do the authors separately use the prior distribution and posterior distribution in the training and inference process?

    (3) Could the authors give more explanation for the factual outcome Y? I think it should be the mRS score mentioned in the experimental part. Is it like multi-class labels? Maybe the authors could move the mRS score part to the methodology section to help with understanding.

    (4) The two challenges mentioned in the paper, “missing counterfactual outcome” and “selection bias,” seem highly related to each other and could be regarded as the same thing. I feel that the selection bias of the model is because of missing counterfactual outcomes. Could the authors clarify the differences between these two challenges?

    (5) When extending SOTA methods to multi-modality data, how do the authors combine the features from images and tabular data for the final prediction? Do they also use PoE?

    (6) I am interested in the treatment assignment distribution of the 504 cases: how many received conservative treatment and how many underwent surgery? Could the authors provide these details?
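Questions (1) and (2) concern an arrangement that is standard in conditional VAEs. The generic sketch below (not the authors' code; the function names are hypothetical) shows it: the posterior q(z | x, y, t) may see the factual outcome because it is used only during training, where a KL term pulls it toward the prior p(z | x, t); at inference, when y is unknown, z is drawn from the prior instead. The training loss would add this KL term to the outcome reconstruction term.

```python
import numpy as np


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)) for diagonal Gaussians, summed over dimensions."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)


def sample_latent(prior, posterior, training, rng):
    """Draw z with the reparameterization trick.

    prior:     (mu, logvar) of p(z | x, t); needs no outcome, so it is usable at test time.
    posterior: (mu, logvar) of q(z | x, y, t); sees the factual outcome y, so it is
               only available during training.
    """
    mu, logvar = posterior if training else prior
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * logvar) * eps
```

Because the KL term keeps q close to p(z | x, t), the prior learns to place z where the outcome decoder expects it, which is what lets the model predict outcomes for both treatment arms without knowing y.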

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have provided code in their supplementary material. The reproducibility of this work is promising.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    I recommend that the authors address the concerns outlined in the weakness section as thoroughly as possible, as this would strengthen the paper and improve its contribution to the field.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed task is not only interesting, but also has important real-world applications. The presented method is well-motivated and supported by a comprehensive set of experiments. The authors have carefully designed their experiments to address potential challenges and provide convincing evidence of the effectiveness of their approach. But some implementation details need more interpretation as mentioned in the weakness part.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper proposes a generative prognostic model for predicting ICH treatment outcomes from a fusion of imaging and non-imaging data. The authors evaluate this method on an in-house stroke dataset and compare it to several common treatment effect estimators. The model uses separate variational encoders for the imaging and tabular data, whose outputs are combined to predict a prognostic score for the patient with and without treatment. From that score, the model computes the probability of an adverse outcome.
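To make the last step concrete: if the model outputs a probability of an adverse outcome under each treatment, a recommendation can be read off by choosing the arm with the lower predicted risk. The sketch below assumes a 0 = conservative / 1 = surgery encoding and uses a hypothetical function name; the paper's exact decision rule is not stated in this review.

```python
import numpy as np


def recommend_treatment(p_adverse_conservative, p_adverse_surgery):
    """Pick, per patient, the treatment with the lower predicted risk of an adverse outcome.

    Returns (recommendation, estimated_effect):
      recommendation:   0 = conservative, 1 = surgery (assumed encoding); ties go to 0
      estimated_effect: p(adverse | surgery) - p(adverse | conservative);
                        a negative value means surgery is predicted to reduce risk
    """
    p0 = np.asarray(p_adverse_conservative, dtype=float)
    p1 = np.asarray(p_adverse_surgery, dtype=float)
    effect = p1 - p0
    recommendation = (effect < 0).astype(int)
    return recommendation, effect
```

Only the sign of `estimated_effect` drives the recommendation, which is the point Reviewer #2 raises below about metrics that ignore the magnitude of the predicted uplift.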

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Applicability: The model is evaluated on a large real-world clinical dataset for an important task. The authors acknowledge the ramifications of using an observational dataset and have taken measures to counteract the selection bias in the data.
    Comparisons: The paper provides several comparisons to alternative models and an ablation study of the method, which explore the model’s properties in great yet succinct detail.
    Clarity: The explanations of the model, dataset, and outcomes were clear.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Metrics: The reported metrics (AUC, ACC, and policy risk) depend only on factual estimates and on the sign of the estimated treatment effect. It would be nice to have a metric that depends on the value of the predicted uplift, e.g. https://www.uplift-modeling.com/en/latest/api/metrics/uplift_auc_score.html.
    Assumptions: The cited Intact-VAE specifies that the dimensionality of the prognostic score must be less than or equal to the dimension of the outcome, so only a latent dimension of 1 would sufficiently address the problem of lack of overlap. In the ablation study, the authors show results for a latent dimension of 1, but the AUCs for this model are much closer to the Intact-VAE values, which suggests that the fusion of clinical and imaging data may not be as significant for improving prediction results. If this is indeed the case and the increased latent dimension is a relaxed assumption, it should be noted somewhere. However, I may be misinterpreting the requirements of the prognostic score.
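The uplift-based metric suggested above ranks patients by predicted uplift and checks whether the observed treated-minus-control outcome difference is concentrated among the top-ranked patients. Below is a simplified, un-normalized area under the uplift curve written with NumPy only. It assumes y = 1 means a favorable outcome and that the predicted uplift is the predicted benefit of treatment; the function names are hypothetical, and the linked scikit-uplift metric may normalize the area differently, so values are not directly comparable.

```python
import numpy as np


def uplift_curve(y, treatment, uplift_pred):
    """Cumulative uplift curve.

    Patients are ranked by predicted uplift (highest first); for each top-k group
    the value is (favorable rate among treated - favorable rate among control) * k.
    y: 1 = favorable outcome; treatment: 1 = treated, 0 = control.
    Returns (k, uplift_at_k) for k = 1..n.
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(treatment, dtype=int)
    order = np.argsort(-np.asarray(uplift_pred, dtype=float), kind="stable")
    y, t = y[order], t[order]
    n_t = np.cumsum(t)            # treated patients among the top k
    n_c = np.cumsum(1 - t)        # control patients among the top k
    y_t = np.cumsum(y * t)        # favorable outcomes among treated in the top k
    y_c = np.cumsum(y * (1 - t))  # favorable outcomes among controls in the top k
    with np.errstate(divide="ignore", invalid="ignore"):
        rate_t = np.where(n_t > 0, y_t / n_t, 0.0)
        rate_c = np.where(n_c > 0, y_c / n_c, 0.0)
    k = np.arange(1, y.size + 1)
    return k, (rate_t - rate_c) * k


def area_under_uplift_curve(y, treatment, uplift_pred):
    """Trapezoidal area under the uplift curve, starting from the origin (0, 0)."""
    k, u = uplift_curve(y, treatment, uplift_pred)
    x = np.concatenate([[0], k])
    u = np.concatenate([[0.0], u])
    return float(np.sum((u[1:] + u[:-1]) * np.diff(x) / 2.0))
```

A model that ranks patients who benefit from treatment first yields a positive area, while a reversed ranking of the same patients yields a negative one, so the metric is sensitive to the predicted uplift values and not only to their sign.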

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code is currently available and the procedure is well described.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    If the prognostic score indeed needs to have a dimension less than or equal to that of Y, this should be noted. Showing a metric based on the predicted uplift would also assuage my concerns about the model learning the selection bias.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    8

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors performed many baseline and ablation experiments to evaluate their model for a clinically important problem. The model is well designed and produces stronger results for classification which translate to better patient average outcomes when used for treatment assignment.
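The "better patient average outcomes" presumably refers to the policy-risk metric mentioned in this review. A sketch under the commonly used definition from Shalit et al. (2017), R_pol(π) = 1 − (E[Y1 | π(x) = 1]·p(π = 1) + E[Y0 | π(x) = 0]·p(π = 0)), follows; whether the paper uses exactly this estimator is an assumption, as is the function name. It assumes y = 1 is a favorable outcome, so lower risk is better, and it only uses patients whose received treatment matches the recommendation, which means that on observational data the estimate can still inherit selection bias.

```python
import numpy as np


def policy_risk(y, treatment, policy):
    """Empirical policy risk (lower is better).

    y:         1 = favorable outcome, 0 = otherwise
    treatment: treatment actually received (0/1)
    policy:    treatment recommended by the model (0/1)
    For each arm, the mean observed outcome of patients who were recommended that
    arm AND actually received it is weighted by how often the policy recommends
    that arm; the policy value is the sum, and the risk is 1 - value.
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(treatment, dtype=int)
    pi = np.asarray(policy, dtype=int)
    value = 0.0
    for arm in (0, 1):
        agree = (pi == arm) & (t == arm)
        if agree.any():
            value += y[agree].mean() * np.mean(pi == arm)
    return 1.0 - value
```

A policy that recommends the arm under which patients actually did well drives the risk toward 0, while a policy that systematically recommends the worse arm pushes it toward 1.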

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    The paper proposes a VAE-based generative model to predict treatment outcomes for ICH. The key component of this technique is a variational distributions combination module that combines the information from imaging data, non-imaging clinical data, and treatment assignment. The proposed method looks interesting and novel, and the results look promising as well.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) Exploring the VAE idea on this particular application is interesting. 2) The way of combining imaging and non-imaging data looks novel.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) Although the results on single-center data look promising, the generalizability of the model is a concern. 2) The prediction improvement over other methods is marginal, so the scientific and clinical significance is questionable. 3) In Table 3, the exact number of correctly predicted patients should be reported, especially for the accuracy metric.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Can be reproduced to some extent, given the data and the public code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    please refer to the weakness.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Exploring VAE on this particular application could interest the readers in this area

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper presents a novel treatment outcome prediction model for intracerebral hemorrhage that is able to predict the effect of all/multiple possible treatment options. To do so, a VAE-based model is developed that combines imaging and additional clinical data (tabular data) while also taking into account the selection bias present in training data available from non-randomized trials.

    Treatment outcome prediction with respect to functional outcome is a topic of high clinical relevance, and the paper introduces a novel method, as acknowledged by all reviewers. The combination of imaging and non-imaging data with a VAE-based generative model appears to be sound, and the paper provides an extensive ablation study to justify the choice of a VAE and to investigate the need for multi-modal information fusion to achieve sufficient performance. Moreover, a comprehensive and convincing comparison to several state-of-the-art methods highlights the method's superiority. All reviewers see the paper above the acceptance threshold, and no major flaws have been identified. The weaknesses highlighted by the reviewers are rather minor (generalizability should be shown on additional data, the description of parts of the experiments could be improved, additional metrics for the evaluation, …). For me, the major weaknesses are related to the description/motivation of the method and the interpretation of the results. For example, and as also highlighted by R#3, why is posterior q conditioned on the factual outcome Y? That seems counterintuitive and should be clarified. Moreover, a better description of the multi-modal extensions performed for the state-of-the-art approaches should be incorporated, and a discussion of the potential clinical implications of the rather minor performance gains of the proposed approach would be helpful.

    For me, the novelty of the method and an interesting application scenario outweigh the weaknesses identified by the reviewers and I, therefore, see the paper slightly above the acceptance level.




Author Feedback

N/A


