Authors

Changkai Ji, Changde Du, Qing Zhang, Sheng Wang, Chong Ma, Jiaming Xie, Yan Zhou, Huiguang He, Dinggang Shen

Abstract

Breast cancer diagnosis is a challenging task. Recently, applying deep learning techniques to breast cancer diagnosis has become a popular trend. However, the effectiveness of deep neural networks is often limited by their lack of interpretability and the need for a significant amount of manual annotation. To address these issues, we present a novel approach that leverages both gaze data and multi-view data for mammogram classification. The gaze data of the radiologist serve as a low-cost and simple form of coarse annotation, providing rough localizations of lesions. We also develop a pyramid loss better suited to the gaze-supervised process. Moreover, since many studies overlook interactive information relevant to diagnosis, we utilize transformer-based attention in our network to mutualize multi-view pathological information, and further employ bidirectional fusion learning (BFL) to fuse multi-view information more effectively. Experimental results demonstrate that our proposed model significantly improves both mammogram classification performance and interpretability through the incorporation of gaze data and cross-view interactive information.
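To make the gaze-supervised pyramid loss concrete, below is a minimal PyTorch sketch of one plausible reading of the abstract: the network's attention maps at several feature-pyramid scales are pulled toward the radiologist's gaze heatmap. This is our reconstruction for illustration only, not the authors' released code; the function names and the KL-divergence choice are assumptions.

```python
import torch
import torch.nn.functional as F

def pyramid_gaze_loss(attn_maps, gaze_heatmap):
    """attn_maps: list of (B, 1, Hi, Wi) network attention maps at several scales.
    gaze_heatmap: (B, 1, H, W) radiologist gaze density."""
    loss = 0.0
    for attn in attn_maps:
        # Resize the gaze heatmap to the current pyramid level.
        target = F.interpolate(gaze_heatmap, size=attn.shape[-2:],
                               mode="bilinear", align_corners=False)
        # Treat both maps as spatial probability distributions.
        p_log = F.log_softmax(attn.flatten(1), dim=1)
        q = target.flatten(1).clamp_min(0)
        q = q / q.sum(dim=1, keepdim=True).clamp_min(1e-8)
        # Penalize divergence between network attention and gaze at this scale.
        loss = loss + F.kl_div(p_log, q, reduction="batchmean")
    return loss / len(attn_maps)
```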

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43990-2_7

SharedIt: https://rdcu.be/dnwLh

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #3

  • Please describe the contribution of the paper

    This paper proposes a method to integrate gaze data as a form of weak supervision. With gaze data, the proposed Mammo-net learns transformer-based attention for multi-view information mutualization and further uses BFL to integrate task-related information for the final classification.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The pyramid-loss-based attention consistency module that learns cross-view attention from gaze data seems novel.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) The paper is not very well organized. From the title and introduction, it seems that the proposed method is for benign/malignant classification, but in Table 1 the comparison on DDSM is for mass detection, while on INBreast it is for benign/malignant classification.
    2) For multi-view based mass detection, there are more relevant papers with good results on DDSM, which are omitted by this paper.
    3) There is no RA (gaze-guided) result on DDSM.
    4) The gaze data is collected from only one radiologist, which is not representative in general.
    5) There is a lack of analysis showing that gaze data is necessary or superior for building attention across multiple views, in comparison to normal labels.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It seems that the related data (gaze data) will not be released, which makes it difficult to reproduce the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    1) I would suggest collecting gaze data on DDSM and performing an evaluation similar to the one on INBreast, to make the experimental setup consistent.
    2) Gaze data from more radiologists would improve the data reliability.
    3) It is unclear from the paper whether gaze data is better than normal pixel/bounding-box labels as supervision, or whether it merely saves the effort of labeling the mammogram data.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    1) The experimental setup is not consistent; see the comments above. 2) The gaze data come from a single subject and will not be shared, making the results hard to reproduce.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    A method to integrate radiologist attention, measured by gaze patterns, into the training process of a dual-view network for joint CC-MLO analysis. An attention consistency module aligns the radiologist and network attention and thus provides positional supervision. The work combines two previously investigated topics (integration of gaze information and integration of multiple views) into a single architecture. At inference time, gaze registration is not needed.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Training deep neural networks without requiring pixel-level annotation is key to applying deep learning at scale in the medical domain. Training from case-level labels requires larger datasets, and providing information about lesion location is typically required for optimal performance. The idea of using radiologists’ gaze to guide the training process, while not new, is underexplored in the literature. While certainly not as easy to deploy as manual annotation, it has potential. For instance, it can be used to guide training even in normal cases, for which pixel-level annotations are difficult to define and collect (although, to be fair, this aspect is not directly investigated in the paper).
    • The paper is clear and well written, with experiments on two datasets and a suitable number of comparators.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The contribution is incremental with respect to previous literature. A number of works have already dealt with dual-view mammography analysis and, more recently, joint four-view analysis has also been considered. Moreover, asymmetries are not considered in the proposed architecture. It is also not clear from the paper whether integrating radiologists’ attention in dual-stream architectures is more difficult and/or more beneficial than doing so in single-stream architectures (of course, dual-stream architectures perform better on mammography, but the main contribution is integrating these two aspects).
    • Radiologist attention was collected only on the smaller INBreast dataset. Experiments on the CBIS-DDSM only focus on bilateral integration, which is not the main contribution.
    • The description of how eye gaze was recorded is minimal and insufficient for reproducibility.
    • The experiments do not report standard deviation, e.g., by repeating each experiment to evaluate variability (a minimal sketch of such an evaluation follows this list). Given that INBreast is a small dataset, there is a chance that differences do not reach statistical significance.
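    For concreteness, a minimal sketch of the repeated-run evaluation suggested above (illustrative only; `train_fn` and `collect_test_scores` are hypothetical entry points, not functions from the paper):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate_over_seeds(train_fn, collect_test_scores, seeds=(0, 1, 2, 3, 4)):
    """Train with several seeds and report mean and std of the test AUC."""
    aucs = []
    for seed in seeds:
        model = train_fn(seed=seed)                   # hypothetical training entry point
        y_true, y_score = collect_test_scores(model)  # hypothetical test-set scoring
        aucs.append(roc_auc_score(y_true, y_score))
    return float(np.mean(aucs)), float(np.std(aucs))
```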
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Experiments should be fairly reproducible. The weakest point regarding reproducibility is the collection of gaze information, since the description of the reading protocol/acquisition setup is rather succinct.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    My major comment is related to how eye gaze information was recorded and how it affects the training.

    1. From the point of view of data collection, it is not clear how the reading protocol was set, whether radiologists were allowed to magnify or re-arrange views, and how the illumination conditions affect gaze recording (reading rooms should be rather dark to maximize contrast).
    2. From the point of view of the impact of radiologist attention on the training process, radiologist attention was collected only on the smaller INBreast dataset. INBreast is not only small, but also contains mostly positive exams, whereas a mammography dataset would contain a vast majority of negative exams. While this is reasonable per se, the experiments on CBIS-DDSM do not appear to include gaze data, but only dual-view fusion – which we already know from the literature to be beneficial. It would have been more interesting to investigate a setting in which gaze information is available for a subset of the dataset, rather than two separate datasets (see the sketch after this list). Since we already know from previous literature (see refs [15, 23]) that gaze can be used to guide the training process, it would be interesting to move towards investigating more complex questions, e.g., the relationship between the number of annotated exams and the overall size of the training set, how to optimize the reading protocol, comparison with other forms of supervision, impact on different architectures, etc.
    3. Since INBreast is annotated, it would be interesting to compare eye gaze against standard pixel-level supervision as a further baseline.
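    To illustrate the mixed setting raised in point 2, here is a minimal sketch (our assumption, not the authors' objective; all names are hypothetical) in which the gaze-consistency term applies only to the subset of samples that have gaze data, while every sample contributes to the classification loss:

```python
import torch
import torch.nn.functional as F

def partially_gaze_supervised_loss(logits, labels, attn, gaze_heatmap,
                                   has_gaze, lam=0.5):
    """has_gaze: (B,) bool mask; gaze_heatmap rows are ignored where it is False."""
    cls = F.cross_entropy(logits, labels)            # every sample has a class label
    if has_gaze.any():
        # Gaze supervision only on the annotated subset.
        gaze = F.mse_loss(attn[has_gaze], gaze_heatmap[has_gaze])
    else:
        gaze = logits.new_zeros(())
    return cls + lam * gaze                          # lam trades off the two terms
```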

    Minor comments:

    • Typo at page 2 (“Radiologists need to magnify mammograms to differentiate…”)
    • Unclear sentence at page 4 (“a pyramid loss constraint requires consistency…”)
    • At page 4, the sentence “the network focuses on lesions where the radiologist spent most time” is unclear; the network and the radiologist may focus on areas that are not lesions
    • Fig. 2 could be extended to include at least one negative case
    • MammoNet has already been used in multiple papers. A different acronym may better differentiate this work
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The topic is interesting, and the methodology is sound. However, the contribution is rather incremental, and the experiments could/should have focused more, in my opinion, on the gaze integration rather than dual-view integration to increase novelty and impact of the work.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    I appreciate that the authors have added further experiments and provided clarifications on many issues. However, some of the weaknesses persist after rebuttal, in particular:

    • the setup is not fully described in the appendix – not in terms of hardware, but with respect to the hanging protocol used by the reader, use of magnification, prior images, etc., which would be present in a realistic reading scenario. I appreciate that space is tight, and perhaps the authors intend to disclose this information in the GitHub repository.
    • the experiments on DDSM “only” refer to multi-view integration, which is important but not novel. I understand that collecting gaze data would be expensive and not feasible within the timeframe of rebuttals. Nonetheless, the inclusion in the paper is not consistent with the title and may be misleading. Even providing strong supervision on a small subset could improve overall performance (see for instance “Did You Get What You Paid For? Rethinking Annotation Cost of Deep Learning Based Computer Aided Detection in Chest Radiographs”, https://link.springer.com/chapter/10.1007/978-3-031-16437-8_25).



Review #5

  • Please describe the contribution of the paper

    The paper describes a method of mammography classification using the radiologist gaze data for training. Furthermore, the method employs two acquisitions of a breast (namely craniocaudal and mediolateral oblique) to generate a prediction. Public datasets (CBIS-DDSM and INBreast) are used for evaluation improving reproducibility. The presented results evaluate each of the components of the proposed method. Performances for a few of the state-of-the-art methods are presented for comparison.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Using the gaze data as regularization for training is quite appealing and might be clinically meaningful. That is, such an approach has the potential to improve explainability. This is particularly relevant to mammography where radiologists look for abnormalities in several views. The presented ablation allows for comprehension of the contribution of each of the proposed parts of the algorithm. Overall the paper is clear and the method is well presented.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The experiments might need further clarification to allow better positioning within the state of the art. That is, some of the existing methods (Shen et al. 2019, or those from Stadnick et al., doi: 10.48550/arxiv.2108.04800) might need to be mentioned to better understand the performances. The data splits need clarification, as they are different from Shen et al. 2019, hence preventing straightforward comparison with other works.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provide some amount of information about the training setup. Code is provided, allowing for better comprehension of the experiments. However, image preprocessing details appear to be absent, which is quite penalizing for an algorithm processing mammography images.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    I would like to thank the authors for their work. I have a few suggestions that I hope the authors will find helpful.

    • In the BFL paragraph (page 5), the authors state that the dissimilarity between different patients enhances robustness. However, there are likely to be some similarities in pathologies. Could the authors comment?
    • In 3.1, the authors do not state what split was used for CBIS-DDSM. Could the authors mention it, even if the official split was used?
    • In 3.1, the authors state that the INBreast dataset does not have image-wise labels. However, the official INBreast dataset has both density and malignancy image-wise labels. Could the authors comment? What labels were researched?
    • In 3.1, the authors state using an 8:2 split of INBreast, which is, for example, different from the split used by Shen et al. 2019, making the comparison less simple. Could the authors comment?
    • The authors describe the scenario of the eye movement data collection in the supplementary material. It appears that 4 images are displayed on the screen, which may alter the movement compared to a two-image display (particularly in mammography, where asymmetries are researched). Could the authors comment?
    • In Table 1, the authors introduce CC and MLO view results. Could the authors comment on how these results were obtained?
    • In Fig. 2, the authors show an illustration of abnormalities in mammograms. It would be helpful to locate the abnormalities in the original images.
    • In 3.3, the authors say “Eye movement data may not be available when uploading raw data to a cloud model at a township hospital”. Could the authors clarify the meaning of the cloud model and why it is relevant?
    • There are methods claiming breast-wise performances (e.g., doi: 10.48550/arxiv.2108.04800). Could the authors discuss them to better position the proposed work?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is likely to be of interest, particularly with regard to the performed eye movement acquisition protocol for public datasets. However, there are some observations yet to be addressed to allow for better understanding of the contribution.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    The authors’ feedback is relatively well organized and the majority of questions are addressed. Some additional numerical results are given, which is helpful.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Strengths: some technical novelty (the pyramid-loss-based attention consistency module); using radiologists’ gaze is known to be capable of facilitating training but remains underexplored; the method may improve explainability; meaningful ablation studies.

    Weaknesses: confusion/inconsistency in terms of detection vs. classification tasks; missing results from quite a number of related works in the literature for comparison; a lack of analysis to show that gaze data is necessary or superior for building attention across multiple views in comparison to normal labels; reproducibility of the work is low due to a few critical reasons as pointed out by reviewers; experimental results are limited, as gaze data are only used in one dataset and were collected only from a single radiologist (potential user variations and bias); the overall contribution is incremental.

    Points to address in rebuttal: Please respond to the critical weaknesses from all reviewers and make sure the questions below are addressed: clarification on the consistency of experimental setups; reliability/reproducibility/inter- & intra-reader variations of gaze data collection; comparison with the use of normal labels in terms of benefits in accuracy (not labor).




Author Feedback

We appreciate the recognition of our strengths, i.e., the novelty of utilizing gaze and the information-fusion module for classification. In the following, we address the common comments:

-Q1. We conducted more comparative experiments:

Method                                                | Dataset  | AUC      | ACC
Chen et al. (doi: 10.3390/diagnostics12071549)        | DDSM     | 81.8±3.9 | 76.3±2.8
                                                      | INBreast | 85.7±2.3 | 82.2±4.2
Quintana et al. (doi: 10.3390/bioengineering10050534) | DDSM     | 80.9±0.5 | 73.0±1.9
                                                      | INBreast | 86.4±1.7 | 80.3±3.4
Ours                                                  | DDSM     | 82.1±6.3 | 86.4±4.8
                                                      | INBreast | 88.9±3.7 | 84.9±1.8

-Q2. Normal label comparison: We conducted experiments showing that gaze supervision achieves on-par performance compared to the more costly bounding-box labels:

Annotation | AUC      | ACC
Gaze       | 88.9±3.7 | 84.9±1.8
Bbox       | 85.9±2.8 | 87.4±2.6

-Q3. Collection protocol and reproducibility: Due to the space limit, we exhibit the entire data-collection procedure and hardware setup on GitHub. The collection scenario and gaze preprocessing are shown in the supplementary materials. We provide code and weights on GitHub to exhibit the performance with and without gaze supervision. Due to approval constraints, the INbreast eye movement data will be released in late 2023, in conjunction with eye movement data being collected from other datasets.

-Q4. Reliability/inter-reader variance: Lou et al. (doi: 10.1109/ICIP42928.2021.9506017) indicate that experienced radiologists often exhibit similar eye movement patterns when interpreting mammograms. This similarity of eye movements is not exclusive to mammograms, but extends to other medical image modalities, including chest X-rays and CT images (doi: 10.1148/radiol.2422051997 and doi: 10.1148/rg.331125023). Consequently, collecting the gaze pattern of a single senior radiologist with 11 years of experience provides a time-efficient method that can still yield representative results.

-Q5. No DDSM gaze results: Collecting DDSM gaze would require roughly 350 hours (including image reading and rest time to simulate a clinical scenario). The efficiency of the view-fusion module can still be tested on DDSM.

-Q6. Gaze data necessity: Time efficiency and non-intrusiveness. 1) Eye tracking is a quicker form of data collection than alternatives such as drawing bounding boxes/circles with a mouse, crafting masks, or writing textual reports. 2) An eye-tracker collects data without disrupting the radiologist’s workflow: it is simply attached to the computer screen, whereas other methods require additional operations.
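For reference, a minimal sketch of a typical fixation-to-heatmap preprocessing step of the kind such pipelines use (our assumption of a common approach; the actual preprocessing is the one described in the supplementary materials and GitHub):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixations_to_heatmap(fixations, image_shape, sigma=40.0):
    """fixations: iterable of (x, y, duration_ms) in image coordinates.
    Returns a dwell-time-weighted, Gaussian-smoothed gaze density map."""
    h, w = image_shape
    heat = np.zeros((h, w), dtype=np.float32)
    for x, y, dur in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < h and 0 <= xi < w:
            heat[yi, xi] += dur                 # weight each fixation by dwell time
    heat = gaussian_filter(heat, sigma=sigma)   # approximate foveal spread
    total = heat.sum()
    return heat / total if total > 0 else heat  # normalize to a density
```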

In the following, we clarify some confusions in the writing: -Q1. Detection or classification (R3, Meta): This was a misstatement; we unify them as mass classification.

-Q2. Multi-stream and single-stream attention (R4, R5): Our single-view baseline is similar to one branch of the multi-view baseline, both consisting of a series of ResNet-18 blocks, and the difficulty of incorporating gaze supervision is equivalent. Multi-view architectures offer the benefit of integrating information from multiple views.
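A minimal sketch of the kind of cross-view attention described here, in which tokens from one view attend to the other (our illustration of the general technique, not the paper's exact architecture; dimensions and names are assumptions):

```python
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    """Mutualize CC and MLO features via bidirectional cross-attention."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.cc_from_mlo = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlo_from_cc = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, cc_tokens, mlo_tokens):
        """cc_tokens, mlo_tokens: (B, N, dim) flattened backbone feature maps."""
        cc_ctx, _ = self.cc_from_mlo(cc_tokens, mlo_tokens, mlo_tokens)   # CC queries MLO
        mlo_ctx, _ = self.mlo_from_cc(mlo_tokens, cc_tokens, cc_tokens)   # MLO queries CC
        return cc_tokens + cc_ctx, mlo_tokens + mlo_ctx                   # residual fusion
```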

-Q3. Gaze on healthy images (R4): For images without lesions, radiologists focus on the most informative areas (e.g., benign lesions). We will exhibit a negative case.

-Q4. Lesion similarities and robustness (R5): A supervised contrastive framework can easily extract similarities from representations (doi: 10.48550/arXiv.2006.07733), so we prioritize differences between cases to enhance robustness.
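To illustrate the idea of prioritizing inter-case differences, a minimal sketch (purely illustrative of the stated principle, not the paper's actual BFL objective) of a term that pushes apart embeddings of different patients:

```python
import torch
import torch.nn.functional as F

def inter_patient_dissimilarity(z, patient_ids):
    """z: (B, D) embeddings; patient_ids: (B,) integer ids.
    Minimizing this mean cosine similarity pushes different patients apart."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t()                                              # pairwise cosine similarities
    diff = patient_ids.unsqueeze(0) != patient_ids.unsqueeze(1)  # different-patient pairs
    return sim[diff].mean() if diff.any() else z.new_zeros(())
```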

-Q5. Dataset split and INBreast labels (R5): We split DDSM using the official split. We split INBreast following Lopez et al. (doi: 10.48550/arXiv.2204.05798) to match our task. INbreast does not provide pathological confirmation of malignancy, only BI-RADS labels, so we obtain binary labels following Shen et al. (doi: 10.1038/s41598-019-48995-4).

-Q6. Cloud model relevance without gaze (R5): Jiang et al.’s method requires gaze input during both the training and inference stages, which limits its practical use in hospitals without eye-trackers. In contrast, our method does not rely on gaze input at the inference stage.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors provided major new experimental results in the rebuttal, which is not allowed, and this has affected the reviewers’ assessment. After the rebuttal, two reviewers still see considerable weaknesses in the work, including low confidence in the setup for collecting gaze data. Using gaze to improve training does not seem to be clinically effective/robust, given that there are too many variations and factors to consider in obtaining gaze data, and these were not sufficiently investigated in the paper. The work is not mature enough to be accepted.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    There are concerns with the writing and reproducibility of the paper. However, the topic looks interesting and two reviewers support this paper. It would be great to clarify the experimental settings and improve the reproducibility by publishing the gaze dataset. I would suggest ‘Accept’.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Despite some concerns still not being sufficiently addressed by the authors for some of the reviewers, I’m swayed by the idea of using radiologist attention as measured by gaze patterns for training. I think it would be an interesting topic at MICCAI and believe the authors could address the remaining issues.


