
Authors

Yen Nhi Truong Vu, Dan Guo, Ahmed Taha, Jason Su, Thomas Paul Matthews

Abstract

Deep-learning-based object detection methods show promise for improving screening mammography, but high rates of false positives can hinder their effectiveness in clinical practice. To reduce false positives, we identify three challenges: (1) unlike natural images, a malignant mammogram typically contains only one malignant finding; (2) mammography exams contain two views of each breast, and both views ought to be considered to make a correct assessment; (3) most mammograms are negative and do not contain any findings. In this work, we tackle the three aforementioned challenges by: (1) leveraging Sparse R-CNN and showing that sparse detectors are more appropriate than dense detectors for mammography; (2) including a multi-view cross-attention module to synthesize information from different views; (3) incorporating multi-instance learning (MIL) to train with unannotated images and perform breast-level classification. The resulting model, M&M, is a Multi-view and Multi-instance learning system that can both localize malignant findings and provide breast-level predictions. We validate M&M’s detection and classification performance using five mammography datasets. In addition, we demonstrate the effectiveness of each proposed component through comprehensive ablation studies.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43904-9_75

SharedIt: https://rdcu.be/dnwIo

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a multi-view and multi-instance learning sparse detector, called M&M, to tackle false positives in mammography screening. The authors identify three challenges in mammography screening and address them through the proposed approach. The contributions of the paper include showing the benefits of sparsity of proposals for mammogram analysis, incorporating a cross-view multi-head attention module for mammography analysis, and leveraging MIL to include images without bounding boxes during training. The resulting model can both localize malignant findings and provide breast-level predictions, and its detection and classification performance is validated using five mammography datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Novel formulation: The proposed approach is a multi-view and multi-instance learning sparse detector that addresses three challenges in mammography screening. The authors leverage Sparse R-CNN, a cross-view multi-head attention module, and MIL techniques to develop a model that can both localize malignant findings and provide breast-level predictions.

    2. Original use of data: The authors utilize OPTIMAM, a large dataset with a significant proportion of negatives, for training and evaluation. This dataset is more representative of clinically-relevant data than typical evaluation datasets, which contain few negative cases.

    3. Clinical feasibility: The proposed approach aims to tackle false positives in mammography screening, which is an important clinical problem. The authors validate the detection and classification performance of the proposed method using five mammography datasets.

    4. Strong evaluation: The authors demonstrate the effectiveness of each proposed component through comprehensive ablation studies. They also compare the performance of their method with previous works and show that it surpasses them by a large margin in the clinically-relevant region of less than 1 FP/image.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Limited comparison with state-of-the-art methods: While the authors compare their method with previous works, they compare it against only a small number of state-of-the-art methods. A more comprehensive comparison with other recent approaches would provide a better understanding of the proposed method’s performance.

    2. Lack of explanation for some design choices: The authors do not provide a detailed explanation for some design choices, such as the choice of hyperparameters and the specific architecture of the cross-view multi-head attention module. More information on these design choices would help readers understand the proposed method better.

    3. Limited generalizability to other imaging modalities: The proposed approach is specifically designed for mammography screening and may not generalize well to other imaging modalities or clinical applications.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provide detailed information on their proposed approach and evaluation methodology, which should make it possible for other researchers to reproduce their results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. Provide more detail on potential limitations or confounding factors in this study. This could include discussing any biases in the data or methodology that could affect the results.

    2. Discuss how these findings fit into existing literature on this topic. This could help readers understand how this work builds upon previous research and contributes to our understanding of mammogram analysis using deep learning.

    3. Make code and data available for others to reproduce the results. This would improve the reproducibility of the study and allow other researchers to build upon this work.

    4. Consider providing additional definitions or explanations for technical terms or jargon. This would make the paper more accessible to readers outside of this field.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Originality and significance of the research question

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    Please see the detailed comments

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Please see the detailed comments

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Please see the detailed comments

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Maybe

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Strengths:

    • The topic is timely.
    • The paper is easy to follow and well-written.
    • The proposed approach seems sound, but more appropriate evaluations and performance comparisons are needed.

    Weaknesses:

    • The authors missed two outstanding curves in Fig. 1 from recent years: [31] and the following paper (the extended version of [13]): Liu, Y., Zhang, F., Chen, C., Wang, S., Wang, Y., & Yu, Y. (2021). Act like a radiologist: towards reliable multi-view correspondence reasoning for mammogram mass detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10), 5947-5961. They report recalls of 0.831 and 0.82, respectively, at 0.5 false positives per image (R@0.5). The authors should include them because they provide very competitive results on DDSM. Without those curves in Fig. 1, it is hard to evaluate the superiority of this paper.

    • Moreover, the authors of [31] provide the detailed file lists of DDSM they used (https://www.researchgate.net/publication/351638497_Usage_Description_of_the_Public_DDSM_Dataset_in_MommiNet_Mammographic_Multi-View_Mass_Identification_Networks). To make a fair comparison, the authors are encouraged to use the same partition strategy. If that is difficult, the authors should at least explicitly list the data splits of all competing methods. Furthermore, the partition strategies of [3, 5, 13, 16] do not appear to be identical; please refer to their original papers for detailed information.

    I am open to increasing my rate if the rebuttal is satisfying.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Fair comparison and appropriate evaluation.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper presents a method to detect breast cancer in mammograms that makes use of unlabeled images and multiple-instance learning, and that can reason across different views. The method is developed on five mammography datasets, two of which are proprietary, and is shown to outperform related work, especially on datasets with a large number of negatives.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    In general, this is a well-written and well-structured paper providing an interesting new approach. The problem the authors address is relevant, as several published academic papers on this problem use only positive images.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Some things seem overly complex, and it may be good to also include one or two very simple baselines. For example, the winning submission to the mammography DREAM challenge held a few years ago can also be trained with negative images and weak labels.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    In general, the method and experiments are well described, but some details about the in-house data are missing. It would be good to add that still.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Please provide more details on the in-house datasets. What was the patient population, which manufacturers’ systems were used, and how was the ground truth determined?

    • Although I am aware that DDSM and INbreast are among the few publicly available mammography datasets, these datasets are quite old, and results should be interpreted with care as they will not be representative of how a model would perform in practice. It may be good to add a note about this in the paper.

    • Please provide confidence bounds and statistical analysis of your results.

    • I think the paper can be further improved by adding a comparison to a simple baseline. The method is interesting, but it looks quite complex, and it is difficult to see from this paper whether all that complexity is really needed. This may make the transition to clinical practice more difficult.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think this is an interesting paper that tackles a relevant problem. The paper is well written and structured.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Strengths: considerable technical novelty in leveraging Sparse R-CNN and a cross-view multi-head attention module; solid experiments on 5 datasets, with comparisons to related methods and meaningful ablation studies; targets an important clinical question (reducing false positives).

    Weaknesses: the method seems overly complicated and lacks simple baseline methods for comparison; the comparison with state-of-the-art methods is good but insufficient; some important evidence/numbers are missing from Figure 1 (which affects the evaluation of the method’s superiority), as a reviewer pointed out; the DDSM data splits differ from those popularly used; some design choices of the method lack explanation; statistical analysis is missing from the results.

    Points to address in rebuttal: Please respond to the critical weaknesses from all reviewers and make sure the questions below are addressed: provide explanation/motivation for some design choices; give more details (old/new, etc.) of the in-house datasets; state whether code and datasets will be made openly available; discuss the potential influence of the missing evidence/numbers in Figure 1.




Author Feedback

We thank the reviewers for their time and feedback. We are encouraged that R1 and R3 rank our paper 1st in their stacks. R2 lists only concerns about Fig. 1. We respond in detail below.

[SOTA Comparison] R1 worries about limited SOTA comparisons. Fig. 1 provides 3 detection SOTA [13,16,32]; Tab. 3 provides 3 classification SOTA [14,26,30]. All SOTA are recent (2020-22), with the strongest results from 2022 [32,14]. In addition, we will follow R2’s suggestion and add [*,31] to Fig. 1. M&M achieves 87% recall at 0.5 false positives/image (R@0.5), outperforming both [*] (82%) and [31] (83%).

R2 worries Fig. 1 baselines use different DDSM splits. M&M uses the same split as 5 of the 8 works [3,13,16,32,*], 4 of which are SOTA from 2020-22 [13,16,32,*]. Reviewers can check Sec 2.5 of [3], Sec 3.1 of [16], Sec 4.2 of [13,32] and Sec 5.2 of [*] to verify that we use the same 512 test cases. We will thus add the following note in the paper: “M&M adopts the same dataset splits used by [3,13,16,32,*], while [5,20,31] use other splits. M&M (87% R@0.5) outperforms all recent SOTA with the same test split, including the new 2022 SOTA [32] (83% R@0.5), by at least 4%.” [*] Liu et al., Act like a radiologist (2021)

[Design choices] R3 worries about M&M’s complexity. M&M has 3 components; each solves a unique challenge (Par. 2-3, Sec 1) towards reducing false positives (FP). We show in Sec 3 that each component is effective:

  1. Sparse R-CNN addresses the sparsity of findings in mammography. Sparse R-CNN alone outperforms dense detectors by 10% R@0.1 (Tab. 2).
  2. The multi-view module addresses the challenge of analyzing a finding’s appearance across the 2 views of a breast. This module alone increases R@0.1 by 9% (Fig. 4).
  3. Multi-instance learning (MIL) addresses the challenge of using images without finding annotations to train a detector, thus reducing the distribution shift between training and clinical practice. MIL alone increases R@0.1 by 13% (Fig. 4).

If complexity is a concern, readers can choose to adopt any one component of M&M, which will still result in large improvements (Fig. 4).

R1 wants more explanation for multi-view and hyperparameter choices. Multi-view motivation: a finding may look suspicious in one view but not the other, so both views must be taken into account when making a prediction (Par. 2, Sec 1). We use an attention layer for multi-view reasoning because it fits nicely with the self-attention layer in Sparse R-CNN: the self-attention layer mixes information for proposals in the same view, while our cross-attention layer mixes information for proposals across 2 views (Par. 2, Sec 2.2). Hyperparameters: Appendix Tab. A1-2 present studies on M&M’s hyperparameters (number of proposals and MIL methods). We follow Sparse R-CNN for other hyperparameters (e.g. optimizer, scheduler).
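
To make the pairing of self- and cross-attention concrete, here is a minimal sketch (ours, not the authors’ code) of such a block in PyTorch. The module name CrossViewBlock, the feature dimension, the residual connections, and the weight sharing across views are our assumptions; cc and mlo denote proposal features from the two views of a breast:

    import torch
    import torch.nn as nn

    class CrossViewBlock(nn.Module):
        """Hypothetical sketch: mix proposal features within and across two views."""
        def __init__(self, dim: int = 256, num_heads: int = 8):
            super().__init__()
            self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

        def forward(self, cc: torch.Tensor, mlo: torch.Tensor):
            # cc, mlo: (batch, num_proposals, dim) proposal features, one view each.
            # Self-attention mixes information among proposals in the same view.
            cc = cc + self.self_attn(cc, cc, cc)[0]
            mlo = mlo + self.self_attn(mlo, mlo, mlo)[0]
            # Cross-attention lets each view's proposals query the other view.
            cc_out = cc + self.cross_attn(cc, mlo, mlo)[0]
            mlo_out = mlo + self.cross_attn(mlo, cc, cc)[0]
            return cc_out, mlo_out

    # Usage: 100 proposals per view, 256-dim features.
    cc, mlo = torch.randn(1, 100, 256), torch.randn(1, 100, 256)
    cc_out, mlo_out = CrossViewBlock()(cc, mlo)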

[Simple baselines] R3 would like simple baselines. Tab. 2 provides 5 common simple detector baselines with public implementations, including Faster R-CNN, RetinaNet, and FCOS. R3 also suggests the DREAM challenge winning method. However, this challenge focuses on classification, while our paper focuses on detection. We have some classification results in Tab. 3 with baselines GMIC [22] and HCT [24]. Both [22,24] use a patch-to-image training pipeline, similar to the DREAM winning method. M&M outperforms both.

[Generalizability to other modalities] The M&M method is relevant to other modalities, as we tackle the common medical problems of high FPs and limited findings (Par. 4, Sec 4).

[Code availability] (1) Sparse R-CNN code is publicly available. (2) Multi-view code is provided in appendix Alg. 1. (3) The MIL module simply computes image_scores = NoisyOR(proposal_scores) and breast_scores = Mean(image_scores), then applies cross-entropy losses given the image and breast labels.
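
As a concrete illustration of point (3), below is a minimal sketch of this MIL aggregation in PyTorch; the noisy_or helper, the example scores, and the labels are hypothetical, following the description above:

    import torch
    import torch.nn.functional as F

    def noisy_or(proposal_scores: torch.Tensor) -> torch.Tensor:
        # Noisy-OR pooling: the image is positive if at least one proposal is.
        return 1.0 - torch.prod(1.0 - proposal_scores)

    # Hypothetical per-proposal malignancy probabilities for the two views.
    cc_scores = torch.tensor([0.05, 0.90, 0.10])
    mlo_scores = torch.tensor([0.20, 0.70, 0.05])

    image_scores = torch.stack([noisy_or(cc_scores), noisy_or(mlo_scores)])
    breast_score = image_scores.mean()  # breast-level prediction

    # Cross-entropy against image- and breast-level labels (no boxes needed).
    image_labels = torch.tensor([1.0, 1.0])
    breast_label = torch.tensor(1.0)
    loss = (F.binary_cross_entropy(image_scores, image_labels)
            + F.binary_cross_entropy(breast_score, breast_label))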

[In-house datasets] Our in-house datasets are recent, with exams obtained between 2008 and 2019. After the double-blind process, we will cite a paper of ours that gives a detailed description of the patient population, ground truthing, etc.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    It seems the rebuttal addressed most of the concerns, including those of the reviewer who gave a score of 4. Overall, the technical novelty is fine and the paper addresses a clinically important question.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    After reviewing the submission itself, the authors’ feedback, and the reviewers’ comments, the AC agrees with the assessment of “novel formulation/technical contributions” and “strong quantitative evaluation & clinical indications”.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Although the authors provided their rebuttal, none of the reviewers changed their original scores. However, the final score is still among the higher ones in my pool.


