
Authors

Yimu Pan, Alison D. Gernand, Jeffery A. Goldstein, Leena Mithal, Delia Mwinyelle, James Z. Wang

Abstract

The standard placental examination helps identify adverse pregnancy outcomes but is not scalable since it requires hospital-level equipment and expert knowledge. Although the current supervised learning approaches in automatic placenta analysis improved the scalability, those approaches fall short on robustness and generalizability due to the scarcity of labeled training images. In this paper, we propose to use the vision-language contrastive learning (VLC) approach to address the data scarcity problem by incorporating the abundant pathology reports into the training data. Moreover, we address the feature suppression problem in the current VLC approaches to improve generalizability and robustness. The improvements enable us to use a shared image encoder across tasks to boost efficiency. Overall, our approach outperforms the strong baselines for fetal/maternal inflammatory response (FIR/MIR), chorioamnionitis, and sepsis risk classification tasks using images from professional photography equipment at a large urban academic hospital; it also achieves the highest inference robustness to iPad images for MIR and chorioamnionitis risk classification tasks. It is the first approach to show robustness to placenta images from a mobile platform that is accessible to low-resource communities.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16437-8_68

SharedIt: https://rdcu.be/cVRuT

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    The authors improve the current ConVIRT with NegLogCosh (negative logarithmic hyperbolic cosine) similarity and sub-feature comparison to address the feature suppression problem. Experiments verify the generalizability and robustness of their method on their placenta dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The authors introduce contrastive learning to mitigate the data scarcity of placenta images.
    • The authors build placenta datasets with different modalities, which will benefit the community in this research area.
    • The authors address the feature suppression problem in VLC approaches to improve generalizability and robustness.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The compared baselines are neither sufficient nor strong, i.e., only the ResNet-50.
    • There are no experiments on public datasets, which weakens confidence in the reported improvements on placenta image analysis tasks.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • Reproducing the results is challenging without the authors’ dataset and training hyperparameters.
    • The authors are encouraged to release their dataset and code, which would positively impact placenta image analysis.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • Please use a table to clarify the description of the negative and positive samples in the Dataset section.
    • Please polish the writing and figures.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors provide pilot work on multimodal (image and medical report) contrastive learning for placenta image analysis.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The main contributions are two-fold: (1) the paper introduces a vision-language contrastive (VLC) framework for pretraining for placenta analysis, and (2) it introduces a new loss function for training the VLC model, which seems to give better results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. I think the VLC framework used here is reasonably novel, particularly as it is applied to placenta analysis.
    2. The new loss function introduced is also novel, particularly in the context of the VLC task.
    3. The results are strong – and the new loss function seems to give better results.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The new loss function is not adequately motivated. For example, why not simply use something like x instead of log(cosh(x))? (See the sketch after this list.)
    2. No statistical tests were performed to ascertain whether the proposed model is significantly better than the others, and no confidence intervals are presented.
    3. The writing could be improved. There are a lot of notational inconsistencies as well as logical gaps (see the detailed comments below).
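
    To make the comparison in point 1 concrete, below is a minimal numerical sketch contrasting a plain absolute-difference penalty with a log(cosh(·)) penalty. This is a reader's illustration with hypothetical function names, not the authors' code; note that log(cosh(x)) behaves like x²/2 near zero and like |x| − log 2 for large |x|.

```python
# Reader's sketch (not the authors' code): compare a plain absolute-difference
# penalty with log(cosh(.)) as element-wise similarity penalties.
import numpy as np

def neg_abs_sim(u, v):
    # "Something like x": penalize the raw element-wise difference.
    return -np.mean(np.abs(u - v))

def neg_log_cosh_sim(u, v):
    # log(cosh(x)) ~ x^2/2 near 0 and ~ |x| - log(2) for large |x|.
    return -np.mean(np.log(np.cosh(u - v)))

rng = np.random.default_rng(0)
u = rng.normal(size=128)
v_close = u + rng.normal(scale=0.1, size=128)  # well-aligned pair
v_far = u + rng.normal(scale=3.0, size=128)    # poorly aligned pair

for name, fn in [("neg_abs", neg_abs_sim), ("neg_log_cosh", neg_log_cosh_sim)]:
    print(name, round(fn(u, v_close), 3), round(fn(u, v_far), 3))
```
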
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    likely reproducible

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. The notation is not defined. For example, in equation 1, what is t_{-1}? The notation is also not consistent: in equation 1, x_i is a vector without bold font, but later in equation 3, bold font is used for vectors, and in equation 4, the normal-font v_i is a scalar. Please ensure consistency in notation.
    2. There are many statements that are not clear to me. Please rephrase these: a. “NegLogCosh have less emphasis when vi and ui are very different thus reducing the effect of dominant feature from either the text side or the image side.” – why is that? log(cosh(x)) looks like |x| when |x| is large, so I am not sure what you mean here. b. “Because the similarity metric (5) compares two feature vectors element-wise, the result of the comparison should not change much if some features are missing.” – why is that? Equation (5) ultimately takes the mean, which is easily affected by outliers.
    3. There are several places where the equations/logic are incorrect or contain typographical errors: a. In equation 9, what is c, and how do you get (9) from (5)? b. How does equation (10) imply (11)? It only holds if equation (10) is true for every possible realization of (i_1, i_2, …, i_k) (all 2^k of them!).
    4. Instead of heuristically assuming that a 1 or 2 percentage-point difference is significant, it may be better to perform statistical tests to verify significance (e.g., a paired bootstrap over test cases; see the sketch after this list).
    5. Instead of the computationally more expensive loss function log(cosh(x)), what would happen with a simpler loss function such as x?
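
    As one concrete version of point 4, here is a hedged sketch of a paired bootstrap confidence interval for the accuracy gap between two models evaluated on the same test set; the array names are hypothetical, and this is only one of several reasonable choices of test.

```python
# Hedged sketch of a paired bootstrap CI for the accuracy gap between two
# models evaluated on the same test cases (input arrays are hypothetical).
import numpy as np

def paired_bootstrap_ci(correct_a, correct_b, n_boot=10_000, alpha=0.05, seed=0):
    """CI for the per-case accuracy difference of models A and B."""
    rng = np.random.default_rng(seed)
    diff = correct_a.astype(float) - correct_b.astype(float)  # per-case gap
    n = len(diff)
    # Resample test cases with replacement and track the mean accuracy gap.
    boot = np.array([diff[rng.integers(0, n, n)].mean() for _ in range(n_boot)])
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    # Call the gap significant only if the interval excludes zero.
    return diff.mean(), (lo, hi)
```
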
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the method is interesting, particularly the joint image-language contrastive learning. The authors also introduce a new loss function that seems to do better than the standard loss function on their dataset.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #5

  • Please describe the contribution of the paper

    The authors present a vision-language contrastive learning approach to classify placenta images. The method pretrains a generic image encoder using pathology reports and images. The contrastive-learning loss uses a negative logarithmic hyperbolic cosine similarity to avoid shortcut solutions in the model. The pre-training dataset consists of 10K images and the fine-tuning dataset of 2.8K images. The results show that the model outperforms the vision-only ResNet baseline and the multimodal ConVIRT models.
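
    For readers unfamiliar with the setup, the following is a reader's sketch (not the authors' released code) of how an element-wise negative log-cosh similarity could slot into a ConVIRT-style symmetric contrastive objective; the tensor shapes, temperature, and reduction are assumptions.

```python
# Reader's sketch (PyTorch) of a ConVIRT-style contrastive step using an
# element-wise negative log-cosh similarity; shapes and hyperparameters
# are assumptions, not the authors' released implementation.
import torch
import torch.nn.functional as F

def neg_log_cosh_sim(img_emb, txt_emb):
    # img_emb: (N, d), txt_emb: (N, d); returns an (N, N) similarity matrix
    # where entry (i, j) compares image i with report j element-wise.
    diff = img_emb[:, None, :] - txt_emb[None, :, :]   # (N, N, d)
    return -torch.log(torch.cosh(diff)).mean(dim=-1)   # higher = more similar

def vlc_loss(img_emb, txt_emb, temperature=0.1):
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = neg_log_cosh_sim(img_emb, txt_emb) / temperature
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    # Symmetric InfoNCE: image i should match report i and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```
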

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is easy to follow.
    • The state of the art for the problem is well described and the proposed method is well justified.
    • The idea of stabilizing feature importance using the hyperbolic cosine is interesting and it deserves further exploration in various multimodal clinical scenarios.
    • The evaluation on placental pathologies highlights the model’s usefulness.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The results table only reports one absolute result for each method, rather than an average over several runs or a k-fold evaluation, making it difficult to assess the robustness of the method to initialization and to the augmentations used during training.

    • Details and discussion of qualitative prediction results are lacking.

    • There is a baseline missing: Clinical text reports only.

    • It would have been interesting to show which image areas are most important for the classification using a saliency method such as Grad-CAM (despite its well-known issues); a sketch follows this list.
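
    Below is a hedged Grad-CAM sketch on a generic torchvision ResNet-50 classifier; the actual fine-tuned placenta model, its layer names, and the two-class head are assumptions made purely for illustration.

```python
# Hedged Grad-CAM sketch on a generic torchvision ResNet-50; the fine-tuned
# placenta classifier and its layer names are assumptions for illustration.
import torch
from torchvision.models import resnet50

model = resnet50(num_classes=2).eval()   # stand-in for a fine-tuned classifier
feats, grads = {}, {}
layer = model.layer4                     # last convolutional block

layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

def grad_cam(image):                       # image: (1, 3, H, W) tensor
    logits = model(image)
    logits[0, logits.argmax()].backward()  # gradient of the predicted class
    weights = grads["a"].mean(dim=(2, 3), keepdim=True)  # channel weights
    cam = torch.relu((weights * feats["a"]).sum(dim=1))  # (1, h, w) map
    return cam / (cam.max() + 1e-8)        # normalize for overlay on the image
```
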

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors mention that the dataset comes from a large urban academic hospital but do not mention its availability; the dataset is possibly private, and it will therefore be difficult to reproduce the results using the same data.

    The authors do not mention the availability of their source code, making it difficult to even run the same evaluation on other datasets.

    The supplementary material contains extra information on the training hyperparameters, which might help in reimplementing the method.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    • On the right side of Figure 2, it is not clear what “Stop gradient” refers to: is it simply that the outputs for the non-relevant tasks are not used (Equation 2?), or did you use a multi-task objective and stop the gradient at each optimization step for the tasks that were not relevant at that step? This should be explained better in the text (a sketch of the two readings follows these comments).

    • For which images was the multimodal model better than the vision-only one? Can you devise a method to assign word importance to image regions?
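
    To make the “stop gradient” question concrete, here is a purely illustrative PyTorch sketch of the two readings, using a stand-in shared encoder and per-task heads; none of this is the paper's actual architecture, and the task names are taken from the abstract only for labeling.

```python
# Illustrative sketch (not the paper's code) of two readings of "stop gradient"
# with a shared encoder and per-task heads.
import torch.nn as nn

encoder = nn.Linear(512, 128)   # stand-in for the shared image encoder
heads = nn.ModuleDict({t: nn.Linear(128, 2)
                       for t in ["FIR", "MIR", "chorio", "sepsis"]})

def forward_reading_1(x, task):
    # Reading 1: only the head for the current task is evaluated, so the
    # other heads contribute no gradient simply because they are never used.
    return heads[task](encoder(x))

def forward_reading_2(x, task):
    # Reading 2: every head is evaluated, but the shared encoder output is
    # detached for all tasks except the current one, blocking their
    # gradients from reaching the encoder.
    z = encoder(x)
    return {t: heads[t](z if t == task else z.detach()) for t in heads}
```
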
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is interesting, but the evaluation and results could have been strengthened (more runs, qualitative results, better discussion).

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors introduce a vision-language contrastive framework for pretraining for placenta analysis. The method is novel and the results are strong. All three reviewers agree that the paper merits acceptance. To further improve the paper, the authors could take the reviewers’ comments into consideration and include some discussion of qualitative prediction results, which is currently lacking.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR




Author Feedback

N/A


