
Authors

Soumen Basu, Ashish Papanai, Mayank Gupta, Pankaj Gupta, Chetan Arora

Abstract

Automated detection of Gallbladder Cancer (GBC) from Ultrasound (US) images is an important problem, which has drawn increased interest from researchers. However, most of these works use difficult-to-acquire information such as bounding box annotations or additional US videos. In this paper, we focus on GBC detection using only image-level labels. Such annotation is usually available from the diagnostic report of a patient and does not require additional annotation effort from the physicians. However, our analysis reveals that it is difficult to train a standard image classification model for GBC detection. This is due to the low inter-class variance (a malignant region usually occupies only a small portion of a US image), high intra-class variance (due to the US sensor capturing a 2D slice of a 3D object, leading to large viewpoint variations), and low training data availability. We posit that, even when we have only the image-level label, formulating the problem as object detection (with bounding box output) helps a deep neural network (DNN) model focus on the relevant region of interest. Since no bounding box annotations are available for training, we pose the problem as weakly supervised object detection (WSOD). Motivated by the recent success of transformer models in object detection, we train one such model, DETR, using multi-instance learning (MIL) with self-supervised instance selection to suit the WSOD task. Our proposed method demonstrates an improvement in AP and detection sensitivity over the SOTA transformer-based and CNN-based WSOD methods. The project page is at https://gbc-iitd.github.io/wsod-gbc.
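
To make the MIL formulation concrete, the following is a minimal sketch, assuming a WSDDN-style two-stream aggregation over DETR's object queries; the module names, dimensions, and aggregation details are illustrative assumptions, not the authors' released implementation. Each query embedding is scored by a classification stream and an instance-selection stream; their product, summed over queries, yields an image-level score supervised only by the image label.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MILHead(nn.Module):
        # Illustrative WSDDN-style MIL head over DETR object queries.
        def __init__(self, embed_dim=256, num_classes=1):
            super().__init__()
            self.cls_stream = nn.Linear(embed_dim, num_classes)
            self.det_stream = nn.Linear(embed_dim, num_classes)

        def forward(self, query_embeds):  # (B, Q, embed_dim) from the DETR decoder
            cls = torch.sigmoid(self.cls_stream(query_embeds))     # class evidence per query
            det = F.softmax(self.det_stream(query_embeds), dim=1)  # competition across queries
            instance_scores = cls * det                            # (B, Q, C) per-instance scores
            image_scores = instance_scores.sum(dim=1).clamp(0, 1)  # (B, C) image-level prediction
            return instance_scores, image_scores

    head = MILHead()
    queries = torch.randn(4, 100, 256)               # e.g., 100 object queries per image
    labels = torch.tensor([[1.], [0.], [1.], [0.]])  # image-level: malignant or not
    _, image_scores = head(queries)
    loss = F.binary_cross_entropy(image_scores, labels)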

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43907-0_20

SharedIt: https://rdcu.be/dnwci

Link to the code repository

https://gbc-iitd.github.io/wsod-gbc

Link to the dataset(s)

https://gbc-iitd.github.io/gbcu


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presented a new deep-learning method for gallbladder cancer (GBC) detection in ultrasound (US) images. Specifically, the problem was posed as a weakly supervised object detection (WSOD) problem (with only image-level labels), and a detection model, DETR [5], was trained using a multi-instance learning [23] approach. Experiments on a US dataset and also a colonoscopy dataset show the effectiveness of the proposed method. The main contribution of this paper is the addressed problem and how it was formulated as a WSOD problem.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The addressed problem is well-motivated and clinically useful.

    • The proposed method was shown to perform better than the compared previous methods under the same/similar settings.

    • The paper is generally well-written.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The claim that the image-level labels are “freely available” from the diagnostic report is not correct, as the report was written by doctors who are experts in this domain. This is still a kind of label that is not “free”.

    • Given the related work [21, 19, 7, 23], the detection model DETR used, and the weakly supervised object detection [26] pipeline, the technical novelty and contributions of this paper are a bit limited. The difference from these prior works, and hence the main novelty of the proposed method, is unclear.

    • Given the size/scale (1255 image samples) of the dataset used, it is unclear how the authors made sure the transformer model was not overfitting to the data. The same issue applies to the Polyp dataset.

    • The authors used a COCO pre-trained DETR model, but COCO consists of natural images, while the data processed in this paper is medical data. There is a (huge) domain gap. It is unclear how this was addressed in the proposed method, especially for the frozen DETR model.

    • Missing reference to DETR when it was first mentioned on the first 2 pages.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Given that the source code and models were provided with the submission, the reproducibility of the paper seems to be good.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • It would be better if the authors clearly stated the key technical contributions and novelty, especially in comparison to the related works and techniques used (in combination) in the paper.

    • The generalisation experiment of applying the proposed method to polyp detection on colonoscopy data is a bit confusing. Since the proposed method was designed for ultrasound data, it would be better to apply it to the same data type but a different task.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method achieves good results, but the technical design was not very clearly motivated, and the novelty and technical contributions are also unclear. When compared to the methods it builds on, it is hard to tell what the difference is and what new technique was introduced in the proposed method. There are also a few issues with the experiments, e.g., the dataset scale and the natural-image pre-trained model.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper proposed a solution to the Gall Bladder Cancer (GBC) detection problem. The proposed solution contains three main components: (1) DETR, (2) multiple instance learning, and (3) self-supervised learning.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper combined several established techniques (DETR, MIL, SSL) into a solution for GBC detection. The proposed method seems novel and could be useful.
    2. The evaluation is strong, with comparisons to various SOTA methods on multiple datasets.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. While the proposed method is interesting, its performance is not strictly better than that of the compared baselines.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    1. How did you fine-tune the class-aware DETR branch? It is not clear to me from the paper. It would be useful to include more details, such as what data, labels, and loss you used.

    2. How does the number of trainable parameters compare to the other baselines in the paper? It is also important to consider the size of the model when comparing different algorithms.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    see above

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposed a solution to the Gall Bladder Cancer detection problem. The proposed method contains three key components: (1) DETR, a previously proposed transformer-based architecture; (2) multiple instance learning; and (3) self-supervised learning. Although none of these components is new in itself, and each has already demonstrated its usefulness and applicability in medical imaging problems, combining these pieces to obtain a good solution for the Gall Bladder Cancer detection problem is a valuable contribution in my view. Further, the evaluation section is strong, comparing multiple methods on multiple datasets.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    I keep my score the same.



Review #3

  • Please describe the contribution of the paper

    The paper presents a solution to the problem of automated detection of Gallbladder Cancer (GBC) from ultrasound (US) images using only image-level labels. The authors propose to formulate the problem as weakly supervised object detection (WSOD) using a transformer-based model (DETR), trained with multi-instance learning (MIL) and combined with self-supervised instance localization refinement.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The experimental results confirm the superiority of the utilized framework in gallbladder cancer classification compared to alternatives when an equal amount of labeled/unlabeled data is available. Although alternative methods based on semi-supervised learning using additional unlabeled data have achieved better performance, this cannot be regarded as a weakness of WSOD. Indeed, this can open space for further investigation into combining this method with semi-supervised learning on unlabeled data.

    The paper is well-written and structured, and the content is easy to follow. The authors have also released the code anonymously, which is very much appreciated.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I have some major concerns regarding the novelty and experiments that I will list in the following.

    1) Novelty: Regardless of the results, the proposed method can be mainly seen as a combination of two previously proposed architectures: (1) detection transformer (DETR) trained with multiple instance learning (MIL) paradigms [19,21], and (2) self-supervised instance classifier refinement [23].

    2) Experiment: Since the instance classifier refinement is taken from [23], it cannot be regarded as a contribution of this paper. Accordingly, the experimental results in Table 3 and Figure 4 only reflect the gain that the self-supervised instance classifier refinement method [23] can achieve.

    3) Ablation Study: The proposed weakly supervised architecture consists of three components. However, the authors have not evaluated the contribution of these components independently using an ablation study setting (e.g., by removing SSL instance learning or class-agnostic DETR). Hence, it is difficult to judge if the improvement over some alternatives depends on all components.

    As a minor concern, there are a few typos that should be corrected (e.g., “a image classification” and equation (6)). Besides, in Section 1, the authors claim that: “The reliance of both SOTA techniques on additional annotations or data, limits their applicability.” However, this argument is not correct. Indeed, there are always abundant unlabeled data available for any problem, including GBC videos, which can be effectively utilized using semi-supervised learning frameworks.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The network architecture explanation lacks some technical details. However, since the datasets used in this study are public, and the codes have already been made available, the results should be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    I would suggest that the authors conduct an ablation study on the performance of their proposed architecture for both datasets. In addition, it would be great if the authors delineated the contributions of the three different components of their proposed method.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the paper lacks substantial technical novelty, to the best of my knowledge this is the first time such a method has been used for GBC detection. Besides, the experimental results confirm the superiority of this method compared to some alternatives when the same amount of labeled and unlabeled data is available.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors have justified the contribution of this paper, which is (1) combining a detection transformer (DETR) trained with multiple instance learning (MIL) and instance classifier refinement for object localization using image-level labels, and (2) applying this method to the automated detection of Gallbladder Cancer (GBC). I suggest that the authors transparently state in the paper that the experimental results in Table 3 and Figure 4 reflect the gain that the self-supervised instance classifier refinement method [23] can achieve for the GBC and Polyp datasets. Overall, the current study and the topic of weakly supervised learning are very relevant in the medical domain, where annotations require expertise and, accordingly, are time-intensive and expensive.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper presented a new deep-learning method for gallbladder cancer (GBC) detection in ultrasound (US) images in a weak-supervision manner. Experiments on a US dataset and also a colonoscopy dataset show the effectiveness of the proposed method. The three reviewers also affirmed that this is interesting work. The issues include adding a generalization experiment, clarifying the key technical contributions and novelty, and some other details mentioned by the reviewers. Please address these concerns in the final version.




Author Feedback

We thank the reviewers and AC for their detailed feedback. We are encouraged to know that they found our work interesting and clinically significant. We address the questions below.

R1, R3, AC: Clarification on novelty? While DETR, MIL, and SSL were each proposed in the literature individually, our design is the first to integrate them into a novel end-to-end trainable transformer-based WSOD pipeline. Such a model has not been proposed for medical imaging tasks. Transformer+MIL has been used in classification [21], but not in WSOD. Also, SSL-based instance refinement is used in CNNs [23,26], but not in transformer-based WSOD. We claim novelty in the overall design and its application to detecting GBC.

R1, AC: Generalization experiment? We experiment on breast cancer in US images using a public dataset, BUSI [A]. We show the (Accuracy, Specificity, Sensitivity, AP25) scores of our model vs. three SOTA WSOD baselines: TSCAM: 0.69, 0.76, 0.46, 0.03; WS-DETR: 0.62, 0.64, 0.57, 0.20; PBC: 0.61, 0.64, 0.49, 0.273; Ours: 0.71, 0.72, 0.66, 0.277. Our proposed WSOD model is not specific to US or GBC; thus, we initially showed the generality of our method on two modalities (US and colonoscopy) and two diseases (GBC and polyps). We will add the BUSI results in the final version if the ACs and reviewers agree. [A] Dataset of breast ultrasound images. Data in Brief, 2020.

R1: Pretrain on COCO? Frozen branch? Pretraining on natural images to leverage generic features for downstream medical imaging tasks is a known practice [A,B]. The frozen branch is used to (1) generate initial object queries by leveraging the generic object features, and (2) convert embedded object queries to boxes using the FFN head. Neither task requires domain-specific knowledge. The unfrozen (class-aware) branch learns domain-specific objects. [A] Towards a better understanding of transfer learning for medical imaging: a case study. Appl. Sci., 2020. [B] Transfer learning with convolutional neural networks for classification of abdominal ultrasound images. J. Digital Imaging, 2017.
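
As a rough sketch of the two-branch idea described above (written under our own assumptions about how the branches are handled, not the released code), the class-agnostic branch can be frozen while only the class-aware branch receives gradients; the torch hub entry below is the standard COCO-pretrained DETR checkpoint:

    import copy
    import torch

    # Assumed: a COCO-pretrained DETR loaded via the facebookresearch/detr hub entry.
    detr = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)

    # Frozen class-agnostic branch: keeps COCO's generic objectness and box skills.
    frozen_branch = detr
    for p in frozen_branch.parameters():
        p.requires_grad = False
    frozen_branch.eval()

    # Class-aware branch: a trainable copy that learns domain-specific objects.
    class_aware_branch = copy.deepcopy(detr)
    for p in class_aware_branch.parameters():
        p.requires_grad = True

    # Only the class-aware branch contributes trainable parameters.
    optimizer = torch.optim.AdamW(
        (p for p in class_aware_branch.parameters() if p.requires_grad),
        lr=1e-4, weight_decay=1e-4)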

R1: Overfitting on a small dataset? Training vision transformers (VTs) from scratch requires large amounts of data, but fine-tuning pretrained VTs on small datasets works on par with CNNs [A]. Also, adding low-level CNN features makes VT models usable on small data [B,C]. The CNN backbone of our model, together with regularization (data augmentation, weight decay), controls overfitting. Indeed, the mean (Acc, Spec, Sens) of (0.952, 0.957, 0.94) on the train splits of GBC is comparable to (0.894, 0.897, 0.879) on the val splits. [A] Is it Time to Replace CNNs with Transformers for Medical Images? ICCVW 2021. [B] Efficient training of visual transformers with small datasets. NeurIPS 2021. [C] Transformers meet small datasets. IEEE Access 2022.
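
A minimal sketch of the kind of regularization mentioned above, assuming standard torchvision transforms and AdamW's decoupled weight decay; the exact transforms and hyperparameters in the paper may differ:

    import torch
    import torchvision.transforms as T

    # Illustrative augmentation pipeline for a small ultrasound dataset.
    train_transforms = T.Compose([
        T.RandomHorizontalFlip(p=0.5),
        T.RandomResizedCrop(480, scale=(0.8, 1.0)),
        T.ColorJitter(brightness=0.2, contrast=0.2),
        T.ToTensor(),
    ])

    model = torch.nn.Linear(10, 2)  # stand-in for the detector, to keep the example runnable

    # AdamW's weight decay acts as the explicit regularizer on the model weights.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)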

R2: Training loss, params? We used the loss defined in Eq. 6. The image labels (disease vs. non-disease) are used for training the WSOD models. Our public source code will supplement the implementation details in Sec. 4. The number of parameters: Ours: 41.2M, TSCAM: 21.7M, SCM: 21.7M, OD-WSCL: 151.4M, WS-DETR: 41.2M, PBC: 41.3M. Our model's size is comparable to that of the other WSOD models (except TSCAM/SCM), and its performance is superior.

R3: SSL is not novel. We are not claiming novelty for SSL. The novelty is in the entire pipeline, and thus we hope the comparisons with other WSOD models in Tab. 3 and Fig. 4 are justified.

R3: Ablation results. Our model has three components: DETR, MIL, and SSL. Vanilla DETR works for fully supervised object detection, not for WSOD; thus, it is not shown in the ablation. DETR+MIL and DETR+MIL+SSL are WSOD configurations and are shown in Tab. 4. Ablation (AP25, Sens) on polyp data: D+M: 0.25, 0.88; D+M+S: 0.36, 0.96.
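
For readers unfamiliar with instance refinement, here is a minimal sketch of an OICR-style self-supervised refinement step in the spirit of [23]; the function names, IoU threshold, and pseudo-labeling rule are illustrative assumptions, not the paper's exact formulation. The top-scoring instance from the MIL branch pseudo-labels overlapping instances as positives for a refinement classifier:

    import torch
    import torch.nn.functional as F

    def iou(boxes, box):
        # IoU between N boxes (x1, y1, x2, y2) and one reference box.
        x1 = torch.max(boxes[:, 0], box[0]); y1 = torch.max(boxes[:, 1], box[1])
        x2 = torch.min(boxes[:, 2], box[2]); y2 = torch.min(boxes[:, 3], box[3])
        inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
        area_a = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        area_b = (box[2] - box[0]) * (box[3] - box[1])
        return inter / (area_a + area_b - inter + 1e-6)

    def refinement_loss(mil_scores, boxes, refine_logits, iou_thresh=0.5):
        # mil_scores:    (Q,) per-query malignancy scores from the MIL branch
        # boxes:         (Q, 4) predicted boxes for each query
        # refine_logits: (Q,) logits from the refinement classifier
        top = mil_scores.argmax()                                # highest-scoring instance
        pseudo = (iou(boxes, boxes[top]) >= iou_thresh).float()  # self-supervised targets
        return F.binary_cross_entropy_with_logits(refine_logits, pseudo)

    # Toy usage with random tensors.
    scores, logits = torch.rand(100), torch.randn(100)
    boxes = torch.rand(100, 4); boxes[:, 2:] += boxes[:, :2]     # ensure x2 >= x1, y2 >= y1
    loss = refinement_loss(scores, boxes, logits)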

R1: “Free” image labels. Image labels are generated during diagnosis by the doctors, but annotating boxes requires additional effort. The image labels are free in this sense.

R3: Semi-supervised frameworks. Indeed, such frameworks are also label-efficient, but they are not the focus of this paper.

R1, R3: Reference, typos. We will correct these.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    In this paper, the authors proposed a new weakly supervised method for gallbladder cancer (GBC) detection in ultrasound images. Specifically, the authors formulate the problem as a weakly supervised object detection (WSOD) problem, where only image-level labels are available. The three main components are a detection model (DETR), an MIL approach, and self-supervised learning. The proposed method has shown great performance compared to previous methods. However, several reviewers raised concerns about the novelty of the proposed method, as well as the lack of an ablation study evaluating the contribution of the main components. In the rebuttal, the authors provided additional ablation results to address this concern and explained that the integration of DETR, MIL, and SSL into an end-to-end trainable transformer-based WSOD framework for medical image detection is a novel contribution.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper studied the gallbladder cancer detection problem with image-level labels and proposed a multi-instance learning (MIL) method with self-supervised instance selection. The method achieved performance close to that of the supervised method. The rebuttal provides additional details about the method and experiments. I agree with the concern about the “free label”; labels from the report are not free, as writing the report itself also requires doctors’ effort.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper combined existing techniques, including DETR, MIL, and self-supervised learning, for GBC detection from ultrasound images. Although it is interesting to see the application to GBC, I agree with R1 and R3 that the technical contribution is incremental and the motivation of the method design is not clear. Overall, I feel the weaknesses slightly outweigh the merits.


