
Authors

Maya Gilad, Moti Freiman

Abstract

Early prediction of pathological complete response (pCR) following neoadjuvant chemotherapy (NAC) for breast cancer plays a critical role in surgical planning and optimizing treatment strategies. Recently, machine- and deep-learning-based methods have been suggested for early pCR prediction from multi-parametric MRI (mp-MRI) data, including dynamic contrast-enhanced MRI and diffusion-weighted MRI (DWI), with moderate success. We introduce PD-DWI, a physiologically decomposed DWI machine-learning model to predict pCR from DWI and clinical data. Our model first decomposes the raw DWI data into the various physiological cues that influence the DWI signal and then uses the decomposed data, in addition to clinical variables, as the input features of a radiomics-based XGBoost model. We demonstrated the added value of our PD-DWI model over conventional machine-learning approaches for pCR prediction from mp-MRI data using the publicly available Breast Multi-parametric MRI for prediction of NAC Response (BMMR2) challenge. Our model substantially improves the area under the curve (AUC) compared to the current best result on the leaderboard (0.8849 vs. 0.8397) for the challenge test set. PD-DWI has the potential to improve prediction of pCR following NAC for breast cancer, reduce overall mp-MRI acquisition times, and eliminate the need for contrast-agent injection.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16437-8_4

SharedIt: https://rdcu.be/cVRsQ

Link to the code repository

https://github.com/TechnionComputationalMRILab/PD-DWI

Link to the dataset(s)

https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=50135447


Reviews

Review #1

  • Please describe the contribution of the paper

    This study proposes a method for prediction of pathological complete response (pCR) to neoadjuvant chemotherapy for breast cancer. The method exploits diffusion-weighted MRI data, taking into account both pseudo-diffusion and pure diffusion, using an approximation of the bi-exponential IVIM model fitting. Machine learning is performed using a radiomics approach, with an XGBoost classifier to combine all features (including some clinical features). On the BMMR2 challenge dataset, the method shows improved performance compared to other DWI- and DCE-MRI-based methods.
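    The bi-exponential IVIM model mentioned here, and the segmented few-b-value approximation of it, can be sketched as follows. This is a minimal illustration of the general technique, not the authors' exact implementation; the function name, variable names, and choice of b-values (0, 100, 800 s/mm²) are assumptions for the sketch.

```python
import numpy as np

def segmented_ivim(s0, s100, s800, b_low=100.0, b_high=800.0):
    """Approximate IVIM parameters from three b-value signals.

    Model: S(b) = S0 * [F * exp(-b * D_star) + (1 - F) * exp(-b * D)]
    By b >= ~100 s/mm^2 the fast pseudo-diffusion term has largely
    decayed, so D is estimated from the two higher b-values and F from
    the offset between the measured S(0) and the extrapolated tail.
    """
    # Pure diffusion coefficient from the mono-exponential high-b tail
    d = np.log(s100 / s800) / (b_high - b_low)
    # Pseudo-diffusion fraction: extrapolate the tail back to b=0
    # and compare with the measured S(0)
    f = 1.0 - (s100 * np.exp(b_low * d)) / s0
    return d, f

# Synthetic voxel: D = 1e-3 mm^2/s, D* = 10e-3 mm^2/s, F = 0.1
s = lambda b: 0.1 * np.exp(-b * 10e-3) + 0.9 * np.exp(-b * 1e-3)
d_est, f_est = segmented_ivim(s(0), s(100), s(800))
```

    On the synthetic voxel the estimates land close to the true D and somewhat below the true F, reflecting the known bias of the segmented approximation when D* has not fully decayed at the low b-value.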

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) Very well written manuscript. 2) The clinical relevance is well motivated (avoiding the need for contrast-enhanced MRI). 3) The methodological choices are well motivated. 4) The method is validated on a public challenge dataset, and shows promising results.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) From a technical point of view, the novelty is a bit incremental. The technique for extracting approximate pseudo-diffusion fraction and pseudo-diffusion maps is based on ref [11]. The radiomics method using XGBoost is fairly standard. The novelty lies in the combination of techniques and the application to this particular task. 2) From Table 2, it seems there is a substantial improvement compared to state-of-the-art methods. From Table 1, we see that the largest part of that improvement is already achieved by the ADC_0-800-only method, i.e., the one based on a simple ADC estimate. The advanced PD-DWI approach adds only about one more percentage point. This makes me wonder what the key innovation in the overall framework is that explains the improvement compared to the state of the art. Is it the use of XGBoost? The manuscript would have been more insightful if this question were answered. 3) Confidence intervals and/or results of statistical testing on the AUC measures are missing, so it is not clear whether the reported improvements are statistically significant.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The dataset (part of a challenge) and experiments are very clearly described. Code will be made publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • Sec 3: the the -> to the
    • scale_pos_weight -> please explain to which method/component this hyperparameter belongs, and what it does.
    • Addressing weaknesses 2 and 3 would make the paper stronger.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I enjoyed reading this paper, it is a solid investigation, but the novelty and impact might be too limited.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    All in all this is a solid piece of work, and in the rebuttal the authors provided useful additional explanations on methodology, model evaluation, and driving factors.



Review #2

  • Please describe the contribution of the paper

    The study introduces a physiologically decomposed diffusion-weighted MRI (PD-DWI) machine learning model to predict the pathological complete response (pCR) from DWI and clinical data. The proposed model improved the performance of predicting pCR when applied to a public breast data challenge (BMMR2).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Used the BMMR2 challenge dataset for training and testing for the proposed machine learning model.
    2. Breast DWI data were analyzed by the bi-exponential signal decay model, generating D (pure diffusion coefficient), D* (pseudo-diffusion coefficient), and F (pseudo-diffusion fraction). Both ADC and F were used to extract 3D radiomics features for model prediction.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. It was not clear how clinical features were combined and used. Reporting AUC with and without clinical information would be required.
    2. The feature selection process needs to be clarified with some justification (the number of features, selection criteria, and consistency and stability).
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No issues.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    It is interesting to see that the model was based only on breast DWI data, but it would be useful to investigate a way to combine different imaging sequences, particularly DCE-MRI. DWI sometimes suffers from insufficient image quality, and a combination of DWI and DCE may improve either performance or reproducibility. Different feature-selection approaches, with and without clinical information, need to be further investigated to improve the generalizability of the model.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed model showed improved prediction of response to neoadjuvant chemotherapy in breast cancer using the publicly available BMMR2 breast data challenge.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    This work provides a PD-DWI method for breast cancer pCR prediction, which decomposes DWI data into an ADC_0-100 map and an F map. The radiomics model built on these new maps achieves performance exceeding the top result in the BMMR2 challenge.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strengths of this work: (1) A novel medical imaging method, called Physiologically-Decomposed Diffusion-Weighted MRI (PD-DWI), demonstrated a substantial improvement in pCR prediction without the need for lengthy DWI acquisition times, Gadolinium-based contrast agent injections, or DCE-MRI imaging. I think it's interesting, and it is a good study of medical physics. However, the work only demonstrated that PD-DWI could improve machine-learning model performance; it did not go further to identify new imaging patterns or biomarkers, which would be the most important outcome for clinical application. (2) Some interesting findings. The work found a relation between DWI signal attenuation decay and pCR prediction, and accounted for the different physiological cues associated with pCR as reflected by the DWI signal, rather than using information aggregated into an ADC map. (3) Top prediction performance. The model achieved an AUC of 0.8849, which outperformed the top performance in the challenge (AUC = 0.8397).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weaknesses: (1) The overall innovation may not be sufficient. The main innovation lies in the DWI processing, while the feature engineering and modeling parts are standard. Moreover, the DWI signal-processing methods are not originally innovative; the formulations come from prior work in medical physics [1-2]. (2) The model architecture is simple (Fig. 3). (3) The work reported some interesting findings and demonstrated that PD-DWI could improve machine-learning model performance, but it did not go further to identify new imaging patterns or biomarkers using the ADC_0-100 and F maps, which would be the most important outcome for clinical application. (4) In Table 1, all the ADC-based machine-learning models outperformed the BMMR2 challenge top-3 performances. I think readers may question the reliability of these results.

    [1] HST.583 Functional Magnetic Resonance Imaging: Data Acquisition and Analysis. https://dspace.mit.edu/bitstream/handle/1721.1/51692/HST-583Fall-2006/NR/rdonlyres/Health-Sciences-and-Technology/HST-583Fall-2006/Assignments/ps3.pdf

    [2] Le Bihan D, Breton E, Lallemand D, Aubin ML, Vignaud J, Laval-Jeantet M. Separation of diffusion and perfusion in intravoxel incoherent motion MR imaging. Radiology. 1988 Aug;168(2):497-505. doi: 10.1148/radiology.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The data was from BMMR2 challenge (https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=89096426), and the authors would provide the code and trained models upon acceptance. I think the reproducibility of the paper was good.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    (1) Consider other feature-selection and machine-learning modeling methods, not only ANOVA and XGBoost. (2) The work selected the 100 features with the highest ANOVA F-values for modeling. I don't think this is an appropriate decision; usually each feature needs at least ten observed samples. (3) Do most samples actually follow the pattern of Fig. 1? (4) A radiomics feature repeatability analysis on the new maps is missing. (5) Please add a discussion of why the radiomics models based on the new maps all achieved the best performance.
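    The top-k ANOVA F-value selection being questioned here can be sketched in a few lines. This is a hypothetical illustration on toy data, not the authors' pipeline; the paper selected k = 100 features, while the toy example below uses k = 5.

```python
import numpy as np

def anova_f_scores(X, y):
    """One-way ANOVA F-statistic for each column of X given class labels y."""
    classes = np.unique(y)
    n, k = len(y), len(classes)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        ss_between += len(Xc) * (Xc.mean(axis=0) - grand_mean) ** 2
        ss_within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    # Mean squares ratio: between-class over within-class variability
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def select_top_k(X, y, k):
    """Indices of the k features with the highest F-values."""
    return np.argsort(anova_f_scores(X, y))[::-1][:k]

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)          # binary labels, e.g. non-pCR vs. pCR
X = rng.normal(size=(100, 20))     # 20 candidate radiomic-like features
X[:, 3] += 2.0 * y                 # make feature 3 strongly class-separating
top = select_top_k(X, y, 5)
```

    The reviewer's concern maps directly onto this sketch: with only a few hundred training samples, ranking hundreds of features by F-value and keeping the top 100 risks selecting noise features whose high scores arise by chance.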

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    At a time when deep learning is rampant in Medical Image Analysis, we need exploration and application in traditional signal processing and medical physics algorithms. The work was not innovative enough, but overall it was a good study.

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    5

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    Through the authors' rebuttal I gained a deeper understanding of this study, which is really good MedIA work. This research is close to the essence of medical imaging and medical physics. Our field may need more work like this, rather than just exploring some of the so-called advanced algorithms. So my final decision is to accept. Thanks for the work and the invitation to review.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The key strength is the detailed explanation and development of a novel Physiologically-Decomposed Diffusion-Weighted Model, which has been systematically evaluated for pCR prediction in breast cancers on a public challenge dataset. Relevance and motivation are well explained. Weaknesses include the incremental novelty of the physiological decomposition (which has been previously published), likely overfitting of the data given the number of features used, and minimal statistical analysis between approaches.

    Key points to address in rebuttal:

    • Actual number of features used for model training; if 100 features were used, the model may be massively overfit
    • Statistical analysis of performance of different strategies in both training and validation
    • Discussion of what may be driving factor for improved performance of PD-DWI model.
    • How were clinical features used/integrated
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6




Author Feedback

We thank the reviewers and meta-reviewers for their insightful comments.

  1. Novelty 1) DWI physiological decomposition from a limited number of b-value images. Prior studies use IVIM analysis, which requires a large number of b-value images; hence, our approach enables a reduction in overall DWI acquisition time. 2) A novel methodology to integrate clinical and imaging features in machine-learning models for prediction of clinical outcomes (see below). 3) A response-to-NAC prediction model that outperforms state-of-the-art results on a publicly available dataset and does not require DCE-MRI scans. 4) A first-of-its-kind, to the best of our knowledge, longitudinal evaluation of response-to-NAC prediction models.
  2. Architecture 2.A. Model overfitting We circumvented model overfitting despite the large number of input features by tuning 3 hyperparameters (section 2.4). By assigning a value larger than 1 to the min_child_weight parameter we increase the minimum weight of samples required for creating a new node in the model, encouraging a more robust tree that is less prone to overfitting. The max_depth parameter essentially limits the number of features used in the model by defining the maximum number of nodes allowed from the boosted tree's root to the farthest leaf. The combination of these parameters ensures that the model is based on a low number of features. The subsample parameter adds another layer of mitigation against overfitting by randomly selecting a new subset of training samples for each tree. Furthermore, the objective results of our model on the blinded test set demonstrate that the model is robust and does not overfit the training set. 2.B. Label imbalance We addressed class imbalance (70% non-pCR patients and 30% pCR) by adjusting the scale_pos_weight parameter appropriately, thereby encouraging the model to correct errors on pCR samples. 2.C. Clinical features integration
    Four clinical features were processed and used in our model (section 2.3): 1) We modeled the 4-level hormonal receptor status as 2 binary features, each representing a single hormonal receptor status, rather than as 4 binary features representing each configuration. This representation allows the model to leverage differences in radiomic features for each tumor type. 2) The 3-level tumor grade represents both the severity of the disease and, indirectly, the spread of the tumor. We modeled the relation between the different severity levels through a scalar feature, thus allowing XGBoost to split the tree by disease severity. 3) We modeled race and lesion type with one-hot encoding. All transformed clinical features were included in the feature-selection process alongside the radiomic features.
  3. Model evaluation As the challenge provided only a single AUC measurement per model, we were not able to evaluate the statistical significance of our approach at submission. We conducted an ablation study through the challenge platform to assess the specific role of each of our contributions (Table 1, Fig. 4). Finally, we evaluated our model performance longitudinally (Fig. 4). The test set was released on May 2. Like the training set, the test set was imbalanced. PD-DWI achieved the best Cohen's kappa score (0.62) vs. ADC_0-800 (0.48). The PD-DWI AUC was statistically significant compared to the F-only model (sensitivity test, 0.95, p<0.05) but not compared to the other models.
  4. Driving factor for improved performance The architectures of the other challenge models have not been published. We assume the driving factors of our model's performance are: 1) robust calculation of the ADC map using a robust least-squares approach over all b-value images (it is possible that extracting features from the provided ADC map had a negative impact on other models); 2) the physiological decomposition, which enables our model to better characterize response to NAC; and 3) training-set pruning.
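The overfitting and class-imbalance controls described in points 2.A and 2.B above can be summarised as an XGBoost parameter dictionary. This is a sketch with illustrative values: the rebuttal names which parameters were tuned (min_child_weight, max_depth, subsample, scale_pos_weight) and the 70/30 class split, but the specific numbers below are assumptions.

```python
# Class balance in the training set, as reported in the rebuttal:
# ~70% non-pCR (negatives) vs. ~30% pCR (positives).
n_neg, n_pos = 70, 30

xgb_params = {
    "min_child_weight": 5,   # >1: more sample weight needed to open a node (2.A)
    "max_depth": 3,          # shallow trees -> few features per tree (2.A)
    "subsample": 0.8,        # each tree trains on a random 80% of samples (2.A)
    # Up-weight errors on the minority pCR class (2.B); a common
    # heuristic is the negative/positive ratio.
    "scale_pos_weight": n_neg / n_pos,
}
```

Passing such a dictionary to `xgboost.XGBClassifier(**xgb_params)` would apply all four controls together; only the parameter names and the class ratio, not the illustrative values, come from the rebuttal.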




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Authors have reasonably addressed the key points raised on the original review. They do need to include better statistical analysis on the final submission (especially given the massive number of features used - even despite the internal optimization of XGBoost, there is a high chance of overfitting). The work is otherwise acceptable.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors’ rebuttal is very detailed and has answered the key concerns related to the work’s novelty, the data imbalance and model overfitting issues, and the clinical-feature integration strategy. The work is interesting and attractive to the MICCAI readership.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors sufficiently addressed the key concerns raised by the reviewers. After the rebuttal, the reviewers unanimously agreed to accepting the paper. I concur with this decision.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8


