
Authors

Hadrien Reynaud, Athanasios Vlontzos, Mischa Dombrowski, Ciarán Gilligan Lee, Arian Beqiri, Paul Leeson, Bernhard Kainz

Abstract

Causally-enabled machine learning frameworks could help clinicians to identify the best course of treatments by answering counterfactual questions. We explore this path for the case of echocardiograms by looking into the variation of the Left Ventricle Ejection Fraction, the most essential clinical metric gained from these examinations. We combine deep neural networks, twin causal networks and generative adversarial methods for the first time to build D’ARTAGNAN (Deep ARtificial Twin-Architecture GeNerAtive Networks), a novel causal generative model. We demonstrate the soundness of our approach on a synthetic dataset before applying it to cardiac ultrasound videos to answer the question: “What would this echocardiogram look like if the patient had a different ejection fraction?”. To do so, we generate new ultrasound videos, retaining the video style and anatomy of the original patient, while modifying the Ejection Fraction conditioned on a given input. We achieve an SSIM score of 0.79 and an R2 score of 0.51 on the counterfactual videos. Code and models are available at: https://github.com/HReynaud/dartagnan.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_57

SharedIt: https://rdcu.be/cVVqe

Link to the code repository

https://github.com/HReynaud/dartagnan

Link to the dataset(s)

https://echonet.github.io/dynamic/

https://github.com/dccastro/Morpho-MNIST


Reviews

Review #3

  • Please describe the contribution of the paper

The paper introduces an approach to generate counterfactual images. The approach builds upon Deep Twin networks and proposes a new architecture. The model has been assessed on two datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • well written
    • interesting problem of counterfactual image generation
    • evaluation on two datasets
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • no comparison with similar approach applied on general computer vision tasks, e.g. https://github.com/autonomousvision/counterfactual_generative_networks
    • not sure if we need all the theorems in Section 2. The authors might have been able to explain them in simpler language.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    They are going to release the code, so it should be reproducible; otherwise, the paper does not contain all the details needed to reproduce the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    please check section 5.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the authors could have compared against strong baselines by adapting methods proposed for general images and videos.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #4

  • Please describe the contribution of the paper

    D’ARTAGNAN (Deep ARtificial Twin-Architecture GeNerAtive Networks) answers the question: “What would this echocardiogram look like if the patient had a different ejection fraction?”

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Practical strategy for handling the absence of true labels for synthetic data: the authors define 3 rules that allow them to broadcast the true labels to the generated videos. The key point is that they want to make counterfactual videos that are visually indistinguishable. Two public datasets are used, MorphoMNIST and EchoNet. The authors are open to releasing their code. The proposed technique seems to be novel; the authors claim this is the first time this approach has been explored for medical image analysis.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    A novel technique is claimed with an SSIM score, but no comparative analysis against any baseline is presented. The reference to the EchoNet GitHub repository is missing. There are too many unnecessary abbreviations, such as Ultrasound to US. The related work discusses either simulators, which are physical and compute-intensive, or implementations of deep twin networks; there is no reference to cases where counterfactual queries are tackled, perhaps in domains other than medical imaging. There is too much mathematical detail for definitions that are difficult to comprehend without more context; overall, the paper is difficult to understand. More explanation of the Abduction-Action-Prediction solution (discussed in Preliminaries) would be useful. Variable names should be described in Fig. 1 for a quick overview, and Fig. 1 could be drawn with respect to the application at hand instead of a generic one. The minimal figurative representation of the models leads to a very lengthy and confusing description. The sectioning and sub-sectioning are poor, the writing is casual, and the English needs to be significantly improved.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    With only the details in the paper, it would be hard to reproduce the results, but the authors are planning to release the code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Please see the weaknesses listed above.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Work seems interesting with a nice application.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #5

  • Please describe the contribution of the paper

    This paper focuses on computing counterfactual queries for echocardiograms. The authors propose a method called D’ARTAGNAN that combines deep neural networks, twin causal networks, and generative adversarial learning. The model is tested on a synthetic dataset and a real-world echocardiogram dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors propose a novel model.

    The experimental results look promising.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The detailed descriptions of the model are not clear.

    The authors do not sufficiently describe how it is related to existing work, and no quantitative comparison is given.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I am not sure whether the MorphoMNIST dataset and Echonet-Dynamic dataset are publicly available. The given details are insufficient to reconstruct the proposed model. However, the authors promise to provide the code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The authors propose a novel model for counterfactual queries. However, it is not clear how it compares to existing models. Is there an existing video-generation model available for comparison? What is missing in existing models built for a similar purpose?

    I have difficulty in understanding the details of the model, and a better clarification might be necessary. In Section 2, what are the relationships between $U$ and $X$, and how $E$ and $Y$ are related to $V$? In Section 3, how is a twin network used to generate the counterfactual samples? What is the objective function? What is the random variable $U_y$ in Fig. 1? How is the DAG utilized in the framework? How is the propensity score defined and used in the model?

    The experimental results look promising. However, the authors might need to compare the proposed model with a baseline model, such as an existing model or an ablative version of the proposed model. Without comparison, it is not clear whether the metrics shown in Table 1 are good or not.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed model looks novel, and the experimental results look promising. However, the description of the model is not clear, and the proposed model is not compared with other models in the experiments. The paper might need major revision before it is published.

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    5

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
    • The paper proposes a method for counterfactual image generation.
    • References to and comparisons with similar methods are missing; to name just a few: (https://github.com/autonomousvision/counterfactual_generative_networks), (https://github.com/batmanlab/Explanation_by_Progressive_Exaggeration)
    • The authors must compare with a baseline.
    • There are concerns about the writing; please read the reviewers’ comments and address them all. For example, one complaint is that the minimal figurative representation of the models leads to a very lengthy and confusing description.
    • There are missing references in the paper, including for EchoNet.
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7




Author Feedback

We thank the reviewers and AC for their time and effort. We are glad that all recognize the novelty and impact our work could have.

Baseline comparisons [AC, R3, R4, R5] We studied many works that could be related to ours, but none of them matches the inputs (video and continuous value) and outputs (video) of our approach, nor the method we use (causal Deep Twin Networks). Causal approaches are rarely studied in the Computer Vision literature; to the best of our knowledge, we present the very first work that provides insights into counterfactual video generation with appropriate causal methods, and we additionally show that this can be useful for a highly relevant problem in medical diagnosis with ultrasound imaging. The following methods from the Computer Vision community use somewhat related settings:

  • conditional video generation from one image/video and a class [3, 1]
  • conditional image generation with a continuous value [2]
  • causal image generation from an image [5]
  • causal image generation from classes [4]

The approach we present is clearly distinct from all of these methods. First, it introduces the novel approach of causally enabled Deep Generative Twin Networks. Then, while other techniques perform video-to-video manipulation, we are the first to do so conditioned on continuous values, e.g., clinical parameters such as the Left Ventricular Ejection Fraction, and the first to use medical video data. The difficulty of coming up with a baseline originates not only from the novelty of our method, but also from the novelty of the task: we deal with extremely noisy data and aim to generate similar data while respecting a continuous conditioning input. To produce an insightful baseline comparison, we modified our approach to perform image-to-video generation, which makes parts of our approach comparable to [3]. We ran experiments to see how our method performs against [3] on this task: D’ARTAGNAN obtained a superior R2 score of -0.05 and a superior SSIM score of 0.72, while ImaGINator [3] obtained an R2 score of -0.20 and an SSIM score of 0.54. Both methods were evaluated in the same manner over the same data. These results, together with the discussion above, will be included in the camera-ready version, and additional visual examples will be added to the supplementary material.

Missing References [AC, R4] As per the MICCAI guidelines, we include peer-reviewed or pre-print citations that are necessary for correctly attributing the notions we use in our work. The public EchoNet-Dynamic dataset is cited via its corresponding paper [22] rather than its GitHub repository.

Clarity of explanations [AC, R3, R4, R5] We will simplify the mathematical preliminaries of Sect. 2 so that they are better understood by a wider audience; this will make space for the discussion above. We will adjust Fig. 1 to represent the application of D’ARTAGNAN to EchoNet rather than the general framework. We will also improve the overall structure of the paper and spell out abbreviations as much as the space constraints allow.

[R4] We do reference other causal works that are close to our domain or linked to our approach at the end of the section (see references 23, 20, 10, 32, and 4 in the submission).

[R5] Both datasets used in the paper are public; this will be made explicit at the beginning of Sect. 4.

[R4, R5] A first version of our code is provided here: https://anonymous.4open.science/r/dartagnan-miccai/

[1] Tzaban, R., et al.: Stitch it in Time: GAN-Based Facial Editing of Real Videos. arXiv:2201.08361.
[2] Ding, X., et al.: CcGAN: Continuous conditional generative adversarial networks for image generation. ICLR’20.
[3] Wang, Y., et al.: ImaGINator: Conditional spatio-temporal GAN for video generation. WACV’20.
[4] Sauer, A., Geiger, A.: Counterfactual generative networks. arXiv:2101.06046.
[5] Kocaoglu, M., et al.: CausalGAN: Learning causal implicit generative models with adversarial training. arXiv:1709.02023.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
    • The rebuttal did not convincingly address the comparison with a baseline method. The argument is that none of those methods addresses exactly the same problem and that the dataset is quite noisy. I did not find this convincing: one should apply or adapt a previously developed method; it is a minimum effort to create a baseline.

    • I have a hard time understanding the reply “… DARTAGNAN obtained a superior R2 score of -0.05 …”. An R2 of zero (or negative) means no predictive power. Yes, the others are worse, but this shows no relationship.

    • I think this paper is promising but not ready for publication yet.
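The R² point above can be made concrete with a small illustrative sketch (not taken from the paper or the reviews; all numbers are hypothetical): R² compares a model's squared error against the trivial baseline that always predicts the mean of the ground truth, so an R² at or below zero means the predictions do no better than that constant-mean baseline.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model error
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # mean-baseline error
    return 1.0 - ss_res / ss_tot

# Hypothetical ejection-fraction values (not from the paper).
y_true = [55.0, 60.0, 35.0, 70.0, 45.0]

# Always predicting the mean (53.0) gives R^2 = 0: the reference point.
print(r2_score(y_true, [53.0] * 5))                       # 0.0

# Predictions uncorrelated with the targets score below zero,
# i.e. worse than simply predicting the mean.
print(r2_score(y_true, [50.0, 52.0, 55.0, 51.0, 49.0]))   # negative
```

Under this reading, an R² of -0.05 sits just below the mean-baseline level, which is the meta-reviewer's concern even though it exceeds the -0.20 of the compared method.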

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    na



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    After going through the reviewer comments and the authors’ response, I feel that the main concerns were addressed in the rebuttal with respect to the quantitative comparison to other approaches (the authors provide additional results) and the discussion of novelty (the authors now differentiate their approach well from other existing approaches). Overall, with the additional promised changes to the manuscript, the work is advanced enough to be presented at MICCAI. The overall topic of counterfactuals is interesting to the community and well suited for MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes a method for generating counterfactual echocardiograms. The proposed method combines deep neural networks, twin causal networks, and generative adversarial learning. The problem formulation and approach have been qualified by the reviewers as very original, and I believe the problem and method might raise interest beyond the specific application domain. The rebuttal gives somewhat reasonable arguments explaining the lack of a baseline comparison and provides additional results in a more restrictive setup to comply with this request. Unfortunately, the paper also requires a revision in terms of related-work discussion, clarity, and structure. Following the principle that conferences are for sharing novel ideas, I support acceptance, while putting this paper towards the end of my ranked list for all the revisions it requires.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7


