
Authors

Soumen Basu, Somanshu Singla, Mayank Gupta, Pratyaksha Rana, Pankaj Gupta, Chetan Arora

Abstract

Rich temporal information and variations in viewpoints make video data an attractive choice for learning image representations using unsupervised contrastive learning (UCL) techniques. State-of-the-art (SOTA) contrastive learning techniques consider frames within a video as positives in the embedding space, whereas the frames from other videos are considered negatives. We observe that unlike multiple views of an object in natural scene videos, an Ultrasound (US) video captures different 2D slices of an organ. Hence, there is almost no similarity between the temporally distant frames of even the same US video. In this paper, we propose to instead utilize such frames as hard negatives. We advocate mining both intra-video and cross-video negatives in a hardness-sensitive negative mining curriculum in a UCL framework to learn rich image representations. We deploy our framework to learn the representations of Gallbladder (GB) malignancy from US videos. We also construct the first large-scale US video dataset containing 64 videos and 15,800 frames for learning GB representations. We show that the standard ResNet50 backbone trained with our framework improves the accuracy of models pretrained with SOTA UCL techniques as well as supervised pretrained models on ImageNet for the GB malignancy detection task by 2-6%. We further validate the generalizability of our method on a publicly available lung US image dataset of COVID-19 pathologies and show an improvement of 1.5% compared to SOTA. Source code, dataset, and models are available at https://gbc-iitd.github.io/usucl.
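For readers unfamiliar with UCL objectives, the following is a minimal, hypothetical sketch (not the authors' released implementation) of an InfoNCE-style loss in which temporally distant frames of the same US video act as hard negatives alongside cross-video negatives drawn from a memory queue. All names, shapes, and the temperature value are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def us_contrastive_loss(anchor, positive, intra_negs, cross_negs, temperature=0.07):
        # anchor:     (D,)   embedding of the anchor frame
        # positive:   (D,)   embedding of a temporally nearby frame of the same video
        # intra_negs: (k, D) temporally distant frames of the SAME video (hard negatives)
        # cross_negs: (n, D) frames of OTHER videos, e.g. sampled from a memory queue
        anchor = F.normalize(anchor, dim=-1)
        positive = F.normalize(positive, dim=-1)
        negatives = F.normalize(torch.cat([intra_negs, cross_negs], dim=0), dim=-1)

        pos_logit = (anchor @ positive).unsqueeze(0) / temperature   # (1,)
        neg_logits = (negatives @ anchor) / temperature              # (k + n,)
        logits = torch.cat([pos_logit, neg_logits]).unsqueeze(0)     # (1, 1 + k + n)

        # InfoNCE: cross-entropy with the positive at index 0.
        target = torch.zeros(1, dtype=torch.long, device=anchor.device)
        return F.cross_entropy(logits, target)

The paper's contribution lies in which negatives enter this set and when; the hardness-sensitive curriculum scheduling is omitted from this sketch.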

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_41

SharedIt: https://rdcu.be/cVRws

Link to the code repository

https://gbc-iitd.github.io/usucl

Link to the dataset(s)

https://gbc-iitd.github.io/data/gbcu

https://gbc-iitd.github.io/data/gbusv

https://github.com/983632847/USCL


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a contrastive learning based method for USG videos, mining both intra-video and cross-video negatives in a hardness-sensitive curriculum. The effectiveness of the proposed strategy is evaluated on two downstream tasks: GB malignancy classification and COVID-19 detection. The paper also constructs a large-scale USG video dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is well-motivated and well-written. The authors propose a novel hard negative mining strategy derived from prior knowledge of USG videos, which is quite reasonable.

    • The paper releases a large-scale USG video dataset.

    • The proposed contrastive learning method surpasses both ImageNet pretraining and SOTA contrastive learning based methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The sampling of positive and negative frames involves many hyper-parameters, which may have a significant impact on the results. The paper should explain how the parameters are selected.

    • The proposed method resembles the classical ideas of CPC [1] and DPC [2], which should be added to the related work.

    • A formatting issue: the layout violates the template rules, as the authors have used many \vspace commands and side-by-side tables.

    [1] Contrastive predictive coding for video representation
    [2] Video Representation Learning by Dense Predictive Coding

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Easy to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Clarify how the hyper-parameters are chosen and their impact on pre-training.

    Add the related work mentioned above.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Quality-wise, the paper is well-motivated, proposes an effective method, and demonstrates good performance. The paper also releases a dataset.

    However, according to the guidelines, using \vspace in the paper is strictly prohibited, and accepting it would seem unfair to the other submissions. I’m happy to change the score if the ACs agree that the paper format is not a big deal.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    The paper includes two main contributions. The first is a USG video dataset of 64 videos and 15,800 frames; building on this dataset, an unsupervised representation learning method is proposed, with a simple but insightful hard example mining mechanism.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is clearly written, and easy to understand.

    The dataset is a contribution to the community.

    The intra-video hard example mining mechanism is interesting and looks effective.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    My major concern regards the generalizability of the algorithm. It is not clear to me whether the method can work in other medical modalities such as 3D CT images. I suggest referring to the paper [1] for more insights on this problem.

    [1] Contrastive learning of global and local features for medical image segmentation with limited annotations, NeurIPS 2020.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Code and pretrained models are provided, thus the work is of high reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The work is based on the observation that in USG videos there is almost no similarity between temporally distant frames. However, I feel that this property also holds for many other medical imaging modalities. Thus, I expect the authors to provide an in-depth discussion regarding the generalizability of the method.

    The investigation of the cross-video sampling is very limited. How is the ‘n’ determined? Is it better to combine the negatives weighted by their hardness ranking than to simply contrast all the negatives with equal weights? How is the memory bank maintained, e.g., as a queue? How does the memory size affect the performance?
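    One way to make this question concrete, sketched under assumed shapes (the queue of cross-video embeddings and the softmax re-weighting are hypothetical illustrations, not taken from the paper):

        import torch
        import torch.nn.functional as F

        def cross_video_negative_sum(anchor, queue, n=4, temperature=0.07, mode="top_n"):
            # anchor: (D,)   normalized anchor embedding
            # queue:  (N, D) normalized embeddings maintained in a memory bank
            sims = queue @ anchor                    # (N,) cosine similarities
            if mode == "top_n":
                # Select only the n hardest (most similar) negatives, equally weighted.
                hard, _ = sims.topk(n)
                return torch.exp(hard / temperature).sum()
            # Alternative the question alludes to: keep ALL negatives but re-weight
            # them by hardness (softmax over similarity) instead of discarding easy ones.
            weights = F.softmax(sims / temperature, dim=0) * sims.numel()
            return (weights * torch.exp(sims / temperature)).sum()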

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper introduces a novel dataset and method for unsupervised representation learning from USG videos. The code is provided, making it reproducible. Even though there are some missing empirical studies, they are not related to the main motivation and can be easily added. Thus, I recommend the paper for acceptance.

  • Number of papers in your stack

    3

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    7

  • [Post rebuttal] Please justify your decision

    After reading the rebuttal and other reviewers’ comments, I believe that the paper has clear merits and have changed the score to “strong accept”. However, as noted by R1, the format of the paper does not follow the guidelines, which state that “using \vspace is strictly prohibited”. I am not sure whether this will be an important issue at the current stage, but if it is, please feel free to down-weight my rating.



Review #3

  • Please describe the contribution of the paper

    The authors propose representation learning from Ultrasound Videos with contrastive learning. They propose a hardness-sensitive negative mining curriculum from both intra-video and cross-video negatives. The authors also contribute USG image datasets related to GB malignancy.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. I think the idea of a hardness-sensitive negative mining curriculum is novel.
    2. The authors perform an evaluation against a wide range of baseline methods; 10-fold cross-validation is reported.
    3. An ablation study is performed to demonstrate the contribution of the proposed components.
    4. The USG image datasets related to GB malignancy can be useful for future studies.
    5. Expert radiologists are involved in this study.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The Butterfly video dataset contains only 1,533 images, which is small for contrastive learning.
    2. Some details are missing, e.g., the size of the memory queue.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors release their code and dataset.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. The authors could use embedding visualization methods (e.g., PCA, t-SNE) to see whether malignant images are separated from normal images.
    2. They could try their method on a larger dataset.
    3. They could specify details such as the size of the memory queue.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the proposed method is novel. The experimental setting is sound. The released USG image datasets related to GB malignancy can be useful for future studies. Lastly, expert radiologists are involved in this study.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The reviewers are consistent in criticizing the experimental details. Your main contributions seem to be (1) offering a larger dataset and (2) a strategy for selecting negative samples. The reviewers have concerns about the generality of your method across datasets: Is the selection of negative samples ad hoc? How sensitive is it to the hyper-parameters? How does your work fundamentally contribute to the methodology of representation learning? You seem to plan to publicly release the dataset and associated code; without that public release, the value of your work would be significantly diminished.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4




Author Feedback

We thank the reviewers and the AC for their detailed feedback. We are encouraged that the reviewers found our idea of a hardness-aware intra-video and cross-video negative mining curriculum novel. We reaffirm that the dataset and the source code will be publicly released. A common question was raised regarding the choice of the sampling hyper-parameters. We clarify that the hyper-parameter selection was not ad hoc but based on rigorous experiments. We also address other comments below.

R1, R2, AC: How were sampling hyper-parameters selected?

  • We used a grid-search strategy to select the sampling hyper-parameters: the number of intra-video negatives (k), the number of top cross-video negatives (n), and the memory size N (implemented as a queue). Based on these experiments, we chose (k=3, n=4, N=96) for gallbladder cancer (GBC) and (k=3, n=2, N=66) for POCUS (lung Covid). The sensitivity experiments are summarized below, with a minimal sketch of the search loop after the tables. We report the cross-validation mean (accuracy, specificity, sensitivity) for GBC and the cross-validation mean accuracy (overall, Covid, pneumonia, regular) for POCUS.

1) Varying ‘n’

   GBC (accuracy, specificity, sensitivity):
     n=2: 0.887, 0.885, 0.904
     n=3: 0.882, 0.872, 0.916
     n=4: 0.921, 0.926, 0.900
     n=6: 0.895, 0.905, 0.872
     all negatives, equal weights: 0.900, 0.915, 0.852

   POCUS (overall, Covid, pneumonia, regular):
     n=2: 0.922, 0.892, 0.951, 0.931
     n=3: 0.906, 0.840, 0.948, 0.933
     n=4: 0.916, 0.872, 0.923, 0.939
     n=6: 0.911, 0.878, 0.903, 0.933
     all negatives, equal weights: 0.909, 0.827, 0.951, 0.944

2) Varying queue size ‘N’

   GBC (accuracy, specificity, sensitivity):
     N=32:  0.898, 0.904, 0.889
     N=64:  0.912, 0.918, 0.886
     N=96:  0.921, 0.926, 0.900
     N=128: 0.911, 0.913, 0.898

   POCUS (overall, Covid, pneumonia, regular):
     N=22:  0.907, 0.846, 0.917, 0.941
     N=44:  0.907, 0.841, 0.957, 0.931
     N=66:  0.916, 0.872, 0.923, 0.939
     N=110: 0.912, 0.861, 0.948, 0.931

3) Varying ‘k’

   GBC (accuracy, specificity, sensitivity):
     k=2: 0.891, 0.895, 0.877
     k=3: 0.921, 0.926, 0.900
     k=4: 0.861, 0.859, 0.872
     k=6: 0.909, 0.924, 0.859

   POCUS (overall, Covid, pneumonia, regular):
     k=2: 0.899, 0.856, 0.934, 0.913
     k=3: 0.916, 0.872, 0.923, 0.939
     k=4: 0.907, 0.850, 0.931, 0.933
     k=6: 0.908, 0.918, 0.897, 0.906

We excluded these experiments from the paper as we felt they might distract the reader from the main contributions. We agree with the reviewers that including these details will strengthen the paper, and we thank them for the suggestion. We will be happy to add these details to the supplementary material if the ACs and reviewers agree.
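For concreteness, a minimal sketch of such a grid search; pretrain and evaluate are hypothetical placeholders standing in for the UCL pretraining run and the cross-validated downstream evaluation, not functions from the released code:

    from itertools import product

    def grid_search(pretrain, evaluate):
        # Candidate grids; the chosen GBC setting was (k=3, n=4, N=96).
        k_values = [2, 3, 4, 6]          # intra-video hard negatives per anchor
        n_values = [2, 3, 4, 6]          # top cross-video negatives
        queue_sizes = [32, 64, 96, 128]  # memory queue size N (GBC grid)

        best_acc, best_cfg = float("-inf"), None
        for k, n, N in product(k_values, n_values, queue_sizes):
            model = pretrain(k=k, n=n, queue_size=N)   # UCL pretraining run
            acc = evaluate(model)                      # cross-val mean accuracy
            if acc > best_acc:
                best_acc, best_cfg = acc, (k, n, N)
        return best_cfg, best_acc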

R2, AC: Concerns regarding generality of the method?

  • We have shown our method’s efficacy over ImageNet pretraining and SOTA contrastive pretraining methods on two different ultrasound (US) datasets - (1) GBC from abdominal US and (2) Covid detection from lung US - which establishes the generality of our method on the US modality. Interestingly, R2 points out that our observation of temporally distant US frames being dissimilar could hold for 3D CT as well. We analyzed the performance of a ResNet50 classifier in detecting Covid from a public CT dataset [1], pretraining the model with our method on another 3D CT dataset [2]. The (accuracy, specificity, sensitivity) for different methods:

    ImageNet pretrain: 0.73, 0.72, 0.74
    USCL: 0.78, 0.81, 0.76
    Ours: 0.80, 0.81, 0.80

    Though we did not make any claim for modalities other than US, this experiment indicates the generality of our method across modalities.

[1] Yang et al.: Covid-CT-Dataset, arXiv:2003.13865 (2020)
[2] Afshar et al.: Covid-CT-MD, Nature Scientific Data (2021)

AC: Fundamental contribution?

  • Representation learning in medical imaging is difficult due to small datasets and large inter-patient variability. We use unlabeled videos to tackle these issues. Differing from the existing literature, our method uses both intra- and cross-video samples in a hardness-aware curriculum to learn effective representations for the downstream task.
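    One purely illustrative reading of “hardness-aware curriculum” (the concrete schedule below is an assumption, not the paper’s): rank candidate negatives from easy to hard and widen the admissible pool as training progresses, so the hardest negatives enter only later.

        def negatives_for_epoch(epoch, total_epochs, ranked_negatives):
            # ranked_negatives: candidate negatives sorted easy -> hard
            # (e.g. by temporal distance or embedding similarity).
            # Linearly widen the pool so harder negatives enter later.
            frac = min(1.0, (epoch + 1) / total_epochs)
            cutoff = max(1, int(frac * len(ranked_negatives)))
            return ranked_negatives[:cutoff]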

R1: Use of \vspace?

  • Thanks for pointing this out. We carefully examined the TeX source and found one \vspace before the introduction section, which is now removed. It was an inadvertent error, and we confirm that even after removing it, the paper fits within the stipulated 8 pages.

R3: Add t-SNE plots?

  • Thanks for the suggestion. We can add t-SNE plots in the supplementary material.
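    A minimal sketch of such a plot with scikit-learn’s t-SNE, assuming frozen-encoder features and a binary malignancy label (the array names and label coding are assumptions):

        import numpy as np
        import matplotlib.pyplot as plt
        from sklearn.manifold import TSNE

        def plot_tsne(embeddings: np.ndarray, labels: np.ndarray) -> None:
            # embeddings: (num_images, D) features from the frozen pretrained encoder
            # labels:     (num_images,) with 0 = non-malignant, 1 = malignant (assumed)
            coords = TSNE(n_components=2, perplexity=30, init="pca",
                          random_state=0).fit_transform(embeddings)
            for cls, name in [(0, "non-malignant"), (1, "malignant")]:
                mask = labels == cls
                plt.scatter(coords[mask, 0], coords[mask, 1], s=8, label=name)
            plt.legend()
            plt.title("t-SNE of pretrained GB ultrasound embeddings")
            plt.show()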




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors did well in the rebuttal.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The low score seems to be motivated primarily by the problem with spacing in the LaTeX template. The other main concerns, over hyper-parameter selection, appear to have been addressed in the rebuttal. The proposed hard-negative mining approach for US images appears to be simple but effective.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    As agreed by all the reviewers, this paper has clear merits, including its methodology, experimental performance improvements, and the promised release of the code as well as the ultrasound datasets; I am happy to vote for acceptance. In the final version, the sensitivity and selection strategy of the hyper-parameters and the generalizability of the proposed method should be discussed.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1


