
Authors

Hyeonsoo Lee, Junha Kim, Eunkyung Park, Minjeong Kim, Taesoo Kim, Thijs Kooi

Abstract

Recently, deep learning models have shown the potential to predict breast cancer risk and enable targeted screening strategies, but current models do not consider the change in the breast over time. In this paper, we present a new method, PRIME+, for breast cancer risk prediction that leverages prior mammograms using a transformer decoder, outperforming a state-of-the-art risk prediction method that only uses mammograms from a single time point. We validate our approach on a dataset with 16,113 exams and further demonstrate that it effectively captures patterns of changes from prior mammograms, such as changes in breast density, resulting in improved short-term and long-term breast cancer risk prediction. Experimental results show that our model achieves a statistically significant improvement in performance over the state-of-the-art based model, with a C-index increase from 0.68 to 0.73 (p < 0.05) on held-out test sets.
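
For readers unfamiliar with the C-index quoted above, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is not taken from the paper, which may use a different (for example, time-dependent) variant; the function name, the argument names, and the choice to skip pairs with tied follow-up times are assumptions made for illustration.

```python
from itertools import combinations

def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times  : follow-up time per exam (time to diagnosis or to last follow-up)
    events : 1 if cancer was diagnosed at times[i], 0 if censored
    risks  : predicted risk score per exam (higher = higher predicted risk)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the earlier time is an observed event.
        # Assumption: pairs with tied times are skipped for simplicity.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0      # earlier event received the higher risk
        elif risks[a] == risks[b]:
            concordant += 0.5      # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported change from 0.68 to 0.73 should be read.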

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43904-9_38

SharedIt: https://rdcu.be/dnwHh

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #3

  • Please describe the contribution of the paper

    In this paper, the authors improve breast cancer risk models by adding a prior mammogram to the system. The current and prior images are passed through a shared ResNet to extract features, which are subsequently fused in a transformer-encoder network. The cumulative hazard function is obtained by concatenating the features from before and after the transformer encoder.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Cancer risk estimation is a critical step in mammographic analysis. The addition of previous information in these algorithms should provide novel insights and clinical benefits.

    The presented architecture, evaluated in a large clinical scenario, shows that the objective is achieved. The authors successfully compare their approach with a previous baseline that does not use prior mammograms and with another approach that uses current and prior mammograms but without the transformer step.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I would say that the main weakness of the paper is that the authors do not incorporate clinical information into the algorithm. I understand that this is probably outside the scope of the paper; however, this information is critical to any risk assessment work.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper is not easy to reproduce due to the inherent number of parameters of deep learning approaches. Authors are consistent in their answers.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    I have a few comments to add. How could clinical information be added to the system? Also, is there any way to stratify the approach by population variability (e.g., Asian, European, or Black women)?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the paper is interesting. The experiments are done on a large population. The main limitation is probably the reproducibility.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The contribution of this paper is as follows:

    1. To propose a neural network with CNN and transformer decoder to assess breast cancer risks using prior and current mammograms.
    2. Well-organized experiments using relatively large data sets and subgroup analysis to evaluate the effectiveness of using prior mammograms.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Main strengths of this paper:

    1. This paper presents a neural network with CNN encoders and a transformer decoder to fuse the features of prior and current mammograms. The proposed method is trained using base and time-dependent hazard functions that take maximum observation periods into account. This approach is well suited to incorporating prior mammograms taken at different time points during training.

    2. Subgroup analysis with respect to density is suitable for showing the effectiveness of using prior mammograms in risk prediction.
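
The paper's exact loss is not reproduced on this page, so the following is only a sketch of how a base hazard plus time-dependent hazards can yield a yearly risk curve while respecting each patient's maximum observation period. The additive form, the ReLU on the yearly increments, and the names `cumulative_risk` and `masked_bce` are assumptions for illustration, not the authors' implementation.

```python
import math

def cumulative_risk(base_logit, hazard_logits):
    """Yearly risk as sigmoid(base + running sum of non-negative hazards)."""
    risks, total = [], base_logit
    for h in hazard_logits:
        total += max(h, 0.0)   # ReLU keeps each yearly increment non-negative
        risks.append(1.0 / (1.0 + math.exp(-total)))
    return risks               # non-decreasing over the prediction horizon

def masked_bce(risks, years_to_cancer, years_followed):
    """Binary cross-entropy over the years whose status is actually known.

    years_to_cancer : 1-based year of diagnosis, or None if no cancer observed
    years_followed  : last year with known status for a cancer-free patient
    """
    loss, n = 0.0, 0
    for k, p in enumerate(risks, start=1):
        if years_to_cancer is not None:
            label = 1.0 if k >= years_to_cancer else 0.0  # cancer by year k
        elif k <= years_followed:
            label = 0.0                                   # known cancer-free
        else:
            continue                                      # beyond follow-up: skip
        loss -= label * math.log(p) + (1.0 - label) * math.log(1.0 - p)
        n += 1
    if n == 0:
        raise ValueError("no observed years for this patient")
    return loss / n
```

Under these assumptions, a k-year cancer probability at inference is read directly from `risks[k - 1]`, and years beyond a patient's follow-up contribute nothing to the training loss.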

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    No comparison with the state-of-the-art methods.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Low reproducibility

    • Source code or data set are not publicly available
    • Not enough explanation on the proposed network architecture
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    This paper is well organized, and the proposed method showed promising results in the risk prediction using mammograms.

    However, the reproducibility of the paper is limited. It would be better if the proposed method were explained in more detail, especially the architecture.

    Comparison with related work is required.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. Novel method
    2. Well-organized experiments
  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    The main contribution of this paper is to propose a new breast cancer risk prediction method PRIME+, which uses previous mammograms to predict breast cancer risk. PRIME+ performed better than current state-of-the-art methods for risk prediction using mammograms at a single time point. The authors validated the method on a dataset of 16,113 exams and showed that it effectively captured patterns of change in previous mammograms, improving short- and long-term breast cancer risk prediction. This work is expected to contribute to the early detection of breast cancer and improve the efficiency of targeted screening strategies.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The use of previous mammograms for risk prediction is a novel approach. PRIME+ uses a transformer decoder to capture patterns in breast tissue over time, an innovative approach that could improve the accuracy of breast cancer risk prediction.

    2. The method is extensively validated on a dataset of 16,113 exams and demonstrates its effectiveness. The authors also compared PRIME+ with the current state-of-the-art risk prediction methods, and the results show that PRIME+ performs better.

    3. This method is expected to contribute to the early detection of breast cancer and improve the efficiency of targeted screening strategies. This is of great significance for improving the survival rate and quality of life of breast cancer patients.

    4. The method proposed in the paper is clinically feasible. The method can be easily integrated with existing medical devices and processes to provide doctors with more accurate and timely diagnoses.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. This method only considers mammograms and does not consider other factors that may affect breast cancer risk, such as family history, lifestyle, etc. Therefore, this method may have certain limitations.

    2. The paper did not provide enough experimental details and analysis of the results. For example, the extent to which PRIME+ improves performance over state-of-the-art methods is not detailed, and no statistical significance test is performed.

    3. This method requires a large amount of data to train the model and requires professional doctors to label the data. This may increase financial and time costs, and may limit the application of this method in certain regions or medical institutions.

    4. The method still needs to further verify its effectiveness and generalization ability on different populations and different data sets. Currently, the authors have only validated the method on one specific dataset.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    According to the reproducibility checklist filled out by the authors, they provided enough information to enable others to reproduce their experimental results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. Provide more details on the dataset used: While you have mentioned that you used a dataset of 16,113 exams, it would be helpful if you could provide more details on the dataset such as the demographics of the patients, how the data was collected, and any preprocessing steps that were taken.

    2. Provide more analysis of experimental results: While you have compared PRIME+ with state-of-the-art methods and shown that it outperforms them, it would be helpful if you could provide more detailed analysis of your experimental results. For example, you could analyze how PRIME+ performs on different subgroups of patients based on age or family history.

    3. Discuss limitations and future work: While you have briefly mentioned some limitations of your approach in the conclusion section, it would be helpful if you could discuss them in more detail. Additionally, it would be useful if you could discuss future work that could build upon your approach.

    4. Improve reproducibility: While you have provided code and data to support reproducibility, there are still some areas where reproducibility can be improved. For example, it would be helpful if you could provide a detailed description of how to run your code and reproduce your results.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Originality and significance of the research question

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    This work proposed a deep learning based method for time-dependent breast cancer risk prediction incorporating current and prior mammograms, using a transformer type architecture. The proposed method was tested on an in-house large mammography dataset. Preliminary results show that the proposed method seems to offer better performance than one existing state-of-the-art approach.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • This paper is fairly well written and easy to follow
    • Breast cancer risk prediction using prior images is a clinically relevant topic
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The analysis of existing methods could be more thorough. Also, the authors only compared the proposed approach to 1 existing approach.
    • Description of the architecture and analysis method could be much improved
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The experiments were conducted on a private dataset. The codes do not seem to be publicly available. It seems that the results in this work could be difficult to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Page 2, second paragraph, typo: parechnymal, should be parenchymal. In general, there are a couple typos and missing words: a careful proof-reading is necessary.
    • Section introduction: the authors only list existing deep learning based risk prediction models without further descriptions. I believe a more detailed analysis of existing methods is necessary. Readers need to see how the current work differs from existing approaches.
    • Section 2.3 Incorporating Prior Mammograms: could the authors elaborate on how the time-dependent prediction during inference was produced? What is the input used to obtain the 2-, 3-, and 4-year cancer probabilities?
    • Section 2.3 Incorporating Prior Mammograms: please consider specifying details on hyperparameters of the transformer decoder (number of layers etc.).
    • Section 3.3 Implementation Details: resizing images to 960 x 640 could have negative impact on visibility of certain findings in mammograms, such as microcalcifications. Did the authors experiment with different input sizes, or was this more a GPU memory consideration? Could the authors comment on this?
    • Section Results: could the authors provide more details on how the statistical tests were conducted to compute the p-values? The error bars seem to be very large compared to difference in average values.
    • Why is only this particular existing method [32] chosen as the baseline for all comparisons? Could the authors comment on this?
    • Section conclusion: please consider adding directions for future research.
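
    The concern about resizing to 960 x 640 can be made concrete with a small calculation of the effective pixel spacing after downsampling. The native detector size (4096 x 3328 pixels) and native spacing (70 µm) below are assumed values for a typical full-field digital mammogram, not figures from the paper, and 960 x 640 is assumed to mean height x width.

```python
def effective_pixel_spacing_um(native_shape, native_spacing_um, target_shape):
    """Return (row, col) pixel spacing in micrometres after resizing.

    native_shape / target_shape are (height, width) in pixels.
    """
    (h0, w0), (h1, w1) = native_shape, target_shape
    return native_spacing_um * h0 / h1, native_spacing_um * w0 / w1

# Assumed native acquisition: 4096 x 3328 pixels at 70 micrometres per pixel.
row_um, col_um = effective_pixel_spacing_um((4096, 3328), 70.0, (960, 640))
print(f"row spacing ~{row_um:.0f} um, column spacing ~{col_um:.0f} um")
```

    Under these assumptions each resized pixel covers roughly 0.3 to 0.36 mm, so findings smaller than that would be averaged into a single pixel, which is the scale at which the question about microcalcifications becomes relevant.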
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The research presented in this work seems to be interesting and addresses a relevant clinical challenge. However, the present work could be further improved by adding more comparisons with existing methods, describing the proposed method more clearly, and providing more justification for the results and findings.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Strengths: targets a clinically relevant question; a large dataset with subgroup analysis for experiments; use of a Transformer decoder to capture breast tissue change over time.

    Weaknesses: the authors claim “the first breast cancer risk prediction model which effectively leverages the information from both prior and current mammograms”, but this is not true (see S. Dadsetan, et al., Deep learning of longitudinal mammogram examinations for breast cancer risk prediction. Pattern Recognition 132, 2022, https://arxiv.org/abs/2304.00257, etc.). The authors must properly cite related work and assess their contribution. The model is constrained as it does not consider other non-imaging-based clinical risk factors; experiments are limited as it is not clear to what extent the proposed method outperforms SOTA methods - comparison with related work is superficial and more comparisons with other existing work are needed, including current clinical risk models and missing baseline methods from the literature that do not use prior images; statistical significance results are not fully provided in Table 1 (only provided for the simple baseline method); important details about the dataset are lacking; low reproducibility - consider making at least source code and models publicly available.




Author Feedback

We appreciate the constructive feedback on our paper from the reviewers. We are glad that the reviewers found our approach for leveraging prior mammograms for breast cancer risk prediction novel (R1, R2), our experimental results on a large-scale dataset thorough (R2, R3), and the clinical impact of our work significant (R2, R3, R4). We provide our responses to the reviewers’ comments below.

  1. Proper Citation of Related Work (M1): We acknowledge the concern about the citation of related work and appreciate the addition. We will update our literature review to ensure that we correctly cite and discuss all relevant work for the camera-ready version.

  2. Use of Additional Risk Factors (R1, R3, M1): We acknowledge that other risk factors have been widely used in previous risk assessment methods and provide important baselines. As Yala et al. demonstrated in recent work [31, 32], the risk assessment performance using only image-based deep features surpasses that of risk models built using other risk factors. For clarity, we limit the scope of this paper to investigating the value of leveraging prior mammograms to further improve risk assessment performance. We agree with the reviewers that using additional risk factors together with deep visual features is interesting and reserve it for future research. Space permitting, we will include this note in the discussion section.

  3. Lack of Comparison with State-of-the-Art Methods (R2, R4, M1): We provide a comparative analysis to a state-of-the-art model [31, 32] that requires only images. We note that this is a strong baseline validated in a multi-institute retrospective study and requires no additional risk factors for predicting breast cancer risk.

  4. Detailed Experimental Setup, Image Size, Network Architecture and Reproducibility (R1, R2, R4, M1): We did not observe a notable impact of image size on risk prediction performance. This is consistent with the findings of [19], which highlighted that risk assessment is more focused on the whole breast than on specific regions. We will release the code required to implement our approach, which will detail the necessary experimental setup and configurations to improve reproducibility.

  5. Dataset Details (R1, R3): We will not be able to publicly release the data used for the study due to the data usage contract and privacy constraints.

  6. Data and Time Costs (R1): We agree that model size and the large data requirements may become a barrier in practical deployment of the model in real-world usage. However, our PRIME+ methodology, while embodying modest enhancements to the current state-of-the-art techniques, does not entail a substantial augmentation in model runtime or labeling requisites. Specifically, it introduces a marginal 5% rise in runtime alongside a 7% increase in the total number of parameters. We can make this clearer in the text.

  7. Statistical Test Details (R4, M1): We compute the p-values using the DeLong test [5] and CompareC [16] methods. We attribute the large error bars to a relatively limited number of data points for certain metric computation configurations. As a further extension of this research, we are actively working towards increasing the number of data points used in the study.
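
As a rough illustration of why a paired comparison on the same exams can reach significance even when the per-model error bars look wide, the sketch below uses a paired percentile bootstrap on the AUC difference. This is a different technique from the DeLong and CompareC tests named above and is not the authors' procedure; the function names, the 95% level, and the default of 2000 resamples are assumptions.

```python
import random

def auc(labels, scores):
    """Probability that a random positive scores above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def paired_bootstrap_auc_diff(labels, scores_a, scores_b, n_boot=2000, seed=0):
    """95% percentile interval for AUC(b) - AUC(a), resampling exams jointly."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:        # need both classes to compute an AUC
            continue
        a = auc(y, [scores_a[i] for i in idx])
        b = auc(y, [scores_b[i] for i in idx])
        diffs.append(b - a)        # same resample for both models (paired)
    diffs.sort()
    return diffs[n_boot * 25 // 1000], diffs[n_boot * 975 // 1000 - 1]
```

Because both models are scored on the same resampled exams, the variation they share cancels in the difference, so an interval for the difference that excludes zero is compatible with overlapping marginal error bars.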


