
Authors

Xin Wang, Tao Tan, Yuan Gao, Luyi Han, Tianyu Zhang, Chunyao Lu, Regina Beets-Tan, Ruisheng Su, Ritse Mann

Abstract

Asymmetry is a crucial characteristic of bilateral mammograms (Bi-MG) when abnormalities are developing, and it is widely utilized by radiologists for diagnosis. The question of “what would the symmetrical Bi-MG look like when the asymmetrical abnormalities have been removed?” has not yet drawn strong attention in the development of algorithms for mammograms. Addressing this question could provide valuable insight into mammographic anatomy and aid in diagnostic interpretation. Hence, we propose a novel framework, DisAsymNet, which utilizes asymmetrical abnormality transformer-guided self-adversarial learning to disentangle abnormalities and symmetric Bi-MG. At the same time, our proposed method is partially guided by randomly synthesized abnormalities. We conduct experiments on three public datasets and one in-house dataset, and demonstrate that our method outperforms existing methods in abnormality classification, segmentation, and localization tasks. Additionally, reconstructed normal mammograms can provide better interpretable visual cues for clinical diagnosis. The code will be made publicly accessible.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43990-2_6

SharedIt: https://rdcu.be/dnwLg

Link to the code repository

https://github.com/xinwangxinwang/DisAsymNet

Link to the dataset(s)

https://pubmed.ncbi.nlm.nih.gov/22078258/

https://vindr.ai/datasets/mammo

http://www.eng.usf.edu/cvprg/Mammography/Database.html


Reviews

Review #4

  • Please describe the contribution of the paper

    A method for bilateral mammogram analysis that disentangles lesions from bilateral asymmetries by synthesizing the normal, disease-free contralateral image in a self-adversarial fashion. Experimental comparison was conducted on INBreast, DDSM, and VinDr-Mammo, as well as a private in-house dataset, on both classification and segmentation tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The proposed method increases performance on lesion classification and segmentation, and produces an additional output (what would the contralateral image look like if the breasts were symmetrical?) that could be inspected by the radiologists to further understand how the model operates
    • Extensive experiments, reporting confidence intervals, are conducted on several datasets and tasks. Ablations studies evaluate the impact of the additional synthesis component
    • The proposed methodology includes both self-attention and cross-view attention, whereas many previous architectures for mammography only employed cross-view attention.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • It is not clear what asymmetric classification is. To the best of my knowledge, DDSM and other datasets do not provide an asymmetry score, and asymmetries are often not annotated. From the methodology, it appears as it was defined as both sides being normal, but this definition is not consistent with BI-RADS (breasts could be asymmetrical even in the absence of cancer, assuming that abnormal refers to the presence of cancer, and a case of bilateral cancer would definitely be asymmetric). I understand that it may be beneficial to include this additional label during training, but I am skeptical about using it for validation purposes.
    • The methodology and figure could be improved in terms of clarity and readability.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors plan to make their code public which will certainly help in reproducibility given the complexity of the architecture. The description of the methodology should be sufficient to reproduce the results. Additional details are needed to clarify how the labels were defined during evaluation, as detailed in the comments for the authors. Experiments are conducted on both in-house and public datasets.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The contribution of the paper is strong and the experimental validation appears extensive. There are however a few elements that, probably due to the compressed format, were not particularly clear:

    • Figures and Tables have very small fonts, readability could be improved. There is a typo in Figure 2 (Trainging data)
    • Figure 1 implies that first the AsyC and AsyD modules are trained end-to-end, and then the AsyC module is refined while dropping the AsyD. Is that correct?
    • Were experiments conducted on CC and MLO separately?
    • I would focus on abnormal classification in Table 1 and clarify whether “abnormal” includes only cancer or also benign lesions.
    • How were the confidence intervals determined?
    • How were competing methods evaluated? Based on pretrained methods, available code or were they reimplemented by the authors?
    • It could be beneficial to better specify how the AsyC ablation study was conducted. If I understood correctly, the cross-view and self-attention transformer module is removed, and thus the features from the left and right view are simply concatenated and passed to the classifier. I am unsure, however, that my interpretation is correct.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors approach the problem of multi-view mammography classification with an original approach. Experiments are extensive and cover multiple datasets, tasks and ablation studies. Minor weaknesses found in the evaluation are unlikely to impact the results.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #5

  • Please describe the contribution of the paper

    The paper focuses on the clinically relevant phenomenon of breast asymmetry, which is a useful tool for breast radiologists when looking for breast cancers. The authors propose an approach intended to capture the asymmetry between the mammography images of the two breasts. As an additional feature, the authors propose a method to disentangle the abnormality and generate a pair of normal appearances.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The addressed task is clinically relevant. The proposed method has the ambition to contribute to the explainability, as it reproduces the way breast radiologists work with images. The design of the method is well presented and several datasets are being used for evaluation. Generally, the paper is clear and well-written. The figures are illustrative and contribute well to the understanding of the philosophy of the proposed method.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While rather appealing from the explainability standpoint, the main task of breast cancer detection feels forgotten. The data is separated into normal (BIRADS 1) and abnormal (BIRADS > 1) samples. However, this is clinically less relevant, as BIRADS 2 abnormalities are common and might not be of interest to clinicians. The evaluation of the method feels detached from the state of the art (https://arxiv.org/abs/2108.04800), which is focused on mammography classification and detection tasks intended to capture abnormalities. The use of custom splits of the datasets may leave the knowledgeable reader puzzled about the overall performance of the method.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The experimental setting section contains a reasonable amount of detail about the training strategy. However, the dataset splits are not clear, particularly considering that the DDSM and VTB datasets have official train/test splits and INBreast may benefit from https://arxiv.org/abs/2108.04800.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    I would like to thank the authors for their work. The paper is rather pleasant to read, but some parts may require clarification.

    • The authors address the task of separating BIRADS 1 cases from BIRADS != 1 cases. As this is quite different from the usual state-of-the-art approaches (BIRADS 1-2-3 vs. BIRADS 4-5), I would suggest that the authors better clarify the clinical motivation of their approach. Moreover, positioning the work relative to the state of the art (https://arxiv.org/abs/2108.04800) would help the reader understand the contribution.
    • The authors claim that no pixel-wise annotations are needed for the method, while the artifact synthesizer relies on real samples from DDSM and INBreast. I suggest the authors retract the claim.
    • The proposed experimental setup introduces a ResNet18 processing images of 1024x512. I suggest the authors add some details, as this is lower than current state-of-the-art methods, which work with ResNet22 or ResNet50 and images of at least 2048 pixels in height.
    • The validation of the detection is a bit puzzling, since the usual Free-response Receiver Operating Characteristic (FROC) metric is not used. This makes the comparison to the state of the art more difficult.
    • Finally, while clear, the paper might need proofreading, as a few sentences might be rephrased. In Section 1, “Disentanglement learning [9, 16] with the aid of synthetic images…” is not clear. In Section 2, “The overview framework of our Dis-AsymNet is illustrated in Fig 1.”, the words might be misplaced.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper has potential with regard to its clarity and philosophy. However, there are some shortcomings in the overall design, such as the choice of the BIRADS categories and the validation pipeline (metrics used, lack of state-of-the-art discussion), making the paper hard to accept.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    Please see the detailed comments.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Please see the detailed comments.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Please see the detailed comments.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Maybe

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Strengths:

    • The paper aims to address the issue of asymmetrical lesions in mammograms, which is a timely and interesting topic.
    • The paper is well-written and easy to follow.

    Weaknesses:

    • The examples provided in the figures are relatively simple cases of asymmetrical abnormality. The author should clarify whether asymmetrical cases include lesions such as unilateral masses, calcifications, or architectural distortion.
    • There is a significant visual difference between the synthesized images (in Fig. 1) and real asymmetrical images; the synthesized images are easier to recognize.
    • The author should provide some examples of poor prediction results.
    • AUC is not directly related to the usability of the proposed approach in clinical practice. In actual clinical applications, a model needs an operating point, which an overall AUC value alone does not provide. The author should provide comments relevant to clinical use, to distinguish the proposed approach from the many models that are “all sizzle and no steak”.
    • Is the in-house dataset multi-center-based and/or multi-equipment-based?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The potential and value of the proposed method in practical scenarios.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Strengths: considerable technical novelty in including both self-attention and cross-view attention, whereas many previous architectures for mammography employed only cross-view attention; extensive experiments on multiple datasets and meaningful ablation studies; potential to contribute to explainability, as it mimics the way breast radiologists work with images.

    Weaknesses: the classification task (BIRADS 1 vs. BIRADS != 1) is of limited clinical relevance, because that is not what radiologists typically look for in diagnosis, which significantly reduces the contribution of the work; custom splits of the data render the results not directly comparable with other published results; concerns about how the method handles cases with unilateral lesions; the comparison with some of the SOTA work should be put into context.




Author Feedback

We greatly appreciate the AC and reviewers for their effort and insightful comments regarding our submission. We are encouraged by the reviewers’ positive feedback in terms of technical novelty, workload, study design, and writing. We will correct errors and address all of the reviewers’ concerns in the final version. We respond to the major points raised by the reviewers as follows.

Q1. The definition of asymmetric classification. A1. In this study, “asymmetric” refers to the visual differences that can arise between the left and right breasts due to any abnormality, including both benign and malignant lesions. We acknowledge the concerns raised by the reviewers regarding the clinical relevance of our classification task. We will revise the paper to clarify the motivation behind our approach and its potential clinical applications. We will also cite state-of-the-art (SOTA) methods (https://arxiv.org/abs/2108.04800) and clarify how our research task differs from theirs. In this study, we selected competing methods that provided code and trained their models on our collected datasets from scratch for a fair comparison. In future work, we will focus on breast cancer detection tasks of higher clinical significance, aligning more closely with the interests of clinicians. We can then also compare with existing SOTA approaches (which are more focused on identifying malignant tumors), as suggested by the reviewers.

Q2. Methodology Clarity and Readability: A2. We sincerely appreciate the reviewers’ meticulous feedback on the clarity and readability of our methodology and figures. In the final version of our manuscript, we will address their concerns by providing a more explicit explanation of asymmetry, elucidating the visual differences between synthesized and real images, and enhancing the clarity of our figures and tables. Furthermore, we will provide a more detailed explanation of the training process and the ablation study. It is important to note that our experiments were conducted jointly on both the craniocaudal (CC) and mediolateral oblique (MLO) views. The 95% confidence interval (CI) of the AUC metric was estimated using 1,000 bootstrap resamples for each measure. Regarding the claim of not requiring pixel-wise annotations, we will clarify the reliance on real samples for artifact synthesis. We will ensure the paper undergoes thorough proofreading to improve the overall quality of the manuscript.
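The percentile-bootstrap CI estimation described above can be sketched as follows. This is a minimal illustration of the general procedure (not the authors' actual code); the function names and toy data are hypothetical, and the AUC is computed directly via the Mann-Whitney formulation to keep the sketch self-contained.

```python
import random

def auc(y_true, y_score):
    # Mann-Whitney formulation: probability that a random positive case
    # outranks a random negative case (ties count as 0.5).
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus percentile-bootstrap (1 - alpha) CI for AUC."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # AUC needs both classes in the resample
            aucs.append(auc(yb, [y_score[i] for i in idx]))
    aucs.sort()
    lo = aucs[int(n_boot * alpha / 2)]
    hi = aucs[int(n_boot * (1 - alpha / 2)) - 1]
    return auc(y_true, y_score), (lo, hi)
```

With 1,000 resamples and alpha = 0.05, the 2.5th and 97.5th percentiles of the resampled AUCs give the reported 95% CI.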

Q3. Data Splits and Comparability: A3. We appreciate the reviewers’ concerns regarding the custom data splits and understand the importance of standardization for ensuring the comparability of results. However, the classification task in our study differs from that of the SOTA methods, which focus more on malignancy classification; as a result, direct comparison with these methods is challenging. Although we are unable to modify the data splits for this particular paper, in future work we will employ consistent data splits to facilitate direct comparison with other studies.


