
Authors

Mohammad Mohaiminul Islam, Bogdan Badic, Thomas Aparicio, David Tougeron, Jean-Pierre Tasu, Dimitris Visvikis, Pierre-Henri Conze

Abstract

Evaluating treatment response is essential in patients who develop colorectal liver metastases to decide the necessity for second-line treatment or the admissibility for surgery. Currently, RECIST1.1 is the most widely used criteria in this context. However, it involves time-consuming, precise manual delineation and size measurement of main liver metastases from Computed Tomography (CT) images. Moreover, an early prediction of the treatment response given a specific chemotherapy regimen and the initial CT scan would be of tremendous use to clinicians. To overcome these challenges, this paper proposes a deep learning-based treatment response assessment pipeline and its extension for prediction purposes. Based on a newly designed 3D Siamese classification network, our method assigns a response group to patients given CT scans from two consecutive follow-ups during the treatment period. Further, we extended the network to predict the treatment response given only the image acquired at first time point. The pipelines are trained on the PRODIGE20 dataset collected from a phase-II multi-center clinical trial in colorectal cancer with liver metastases and exploit an in-house dataset to integrate metastases delineations derived from a U-Net inspired network as additional information. Our approach achieves overall accuracies of 94.94% and 86.86% for treatment response assessment and early prediction respectively, suggesting that both treatment response assessment and prediction issues can be effectively solved with deep learning.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16437-8_46

SharedIt: https://rdcu.be/cVRux

Link to the code repository

N/A

Link to the dataset(s)

https://clinicaltrials.gov/ct2/show/NCT01900717


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a longitudinal approach to treatment response assessment via siamese networks as well as treatment response prediction from baseline image.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. treatment response prediction can help in patient stratification for a given treatment.
    2. Ablation study to establish contribution of each component
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Since the dataset is relatively small … 245 scans … cross-validation or Monte Carlo sampling would have been better to show the robustness of the approach. The results may currently be biased by the particular split.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducible

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. Further clarifying the difference between the Siamese network approach of Jin et al. and the proposed work would be beneficial.
    2. Including a sentence or reference on the validity of the selected augmentations would help justify the choices made.
    3. Was only one type of treatment used? How would the performance vary with treatment … a quantitative evaluation of some sort would be helpful.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. Treatment response prediction is an important step towards patient selection for treatment planning as well as drug trials. Promising results in this direction.
    2. Results are on a small dataset, which is typical of medical imaging problems.
    3. 3D images are used as input rather than 2D, thus incorporating more local and global information.
  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    7

  • [Post rebuttal] Please justify your decision

    No change based on the rebuttal … promising results on the prediction from baseline. Further work on the interpretability is missing.



Review #2

  • Please describe the contribution of the paper

    Authors have proposed a method to predict and assess treatment response using pre-treatment and post-treatment CT scans and deep learning. The problem is of clinical importance and the method achieved a good performance

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Clinically relevant problem
    2. Well-written
    3. The proposed fully automatic system can avoid the variability caused by manual segmentation
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. No clinical or demographic information about the cohort is provided.
    2. The PRODIGE20 dataset comprises elderly patients (~75 years), which may bias the model. I wonder how the model would perform on younger patients.
    3. It seems the authors designed a classifier to predict 4 classes. In that case, it is not clear what the ROC curve represents. Moreover, it would be interesting to see the results for each class.
    4. It is not clear why the method works so well. Which features were extracted using deep learning?
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No issue

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    see no. 5

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is clinically relevant as well as technically novel.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    This paper proposed a model to assess the treatment response from pre- and post-treatment 3D CT volumes. The same network architecture was also used for predicting treatment response from pre-treatment scans only. The models were trained/tested on a dataset of 102 patients with liver metastases treated by chemotherapy.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Applied deep learning model on longitudinal data to predict chemotherapy response.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • There is limited innovation in the model design: the feature extraction model is based on the ResNeXt architecture, with the slight change of adding skip connections.
    • The size of the dataset is limited, and no cross-validation was conducted.
    • The clinical value of the treatment response assessment model is unclear.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The model was trained and tested on a specific dataset, PRODIGE20, which was collected in a previously published study. All cases were colorectal cancer patients with liver metastases who received specific treatments, so it is difficult for readers to validate the performance of the model on other datasets.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    • Given the limited size of the dataset, please consider k-fold cross-validation (or additional testing on independent data). Cross-validation would fully utilize all the data, and additional evaluation metrics could be obtained to check the stability of the model across folds.
    • It is not very clear how CT pairs were split into train/validation/test. Each patient can have multiple CT pairs, so there might be label leakage if the training and test sets contain images from the same patient. Please clarify how the dataset was split.
    • How many pre-treatment CT scans are available for each patient? If there is only one, the total number of cases for the prediction model is only 102 (not 400). Please clarify the data difference for the two tasks.
    • How was the liver metastasis segmentation evaluated? Did the segmentation performance impact the outcome assessment and prediction?
    • Did the TRA and TRP models share the parameters of the ResNeXt module, or were they trained separately?
    • How was sensitivity/specificity defined for the four-way classification problem?
    • I can see the potential value of the treatment response prediction model, but the benefits of the treatment response assessment model for clinical application are unclear. If the goal is to alleviate the burden of manual liver metastasis segmentation, the segmentation model has already achieved this. As the authors mention in the paper, RECIST is just a unidimensional assessment of the lesion based on human annotation, so some simple measurements of the lesion contours would suffice if the segmentation is good.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The results presented in the paper were not fully validated because 1) the data size is limited and no cross-validation was used; 2) there might be label leakage in the treatment response assessment model if multiple CT pairs were acquired for one patient.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Major strengths of this work include the development and evaluation of a novel 3D Siamese network for assessment and prediction of treatment response in colorectal cancer liver mets patients using a clinical trial cohort, with detailed ablation studies and systematic evaluation of individual modules. Weaknesses include potential overlap with existing work, issues relating to data splitting between training/testing/validation, and the use of RECIST alone as an end-point (rather than pathology or survival).

    Rebuttal must address:

    • That scans from the same patient do not fall into different training/validation/test subsets; this would be a significant issue if so
    • Differentiation from Jin et al. in terms of architecture
    • Performance of the segmentation model
    • Sensitivity/specificity for the multi-class setup (was it one-vs-all or otherwise)
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6




Author Feedback

We thank the meta-reviewer [MR] and all reviewers [R] for the relevant suggestions provided to improve our paper. We give below a point-by-point response to the major critiques.

1- [MR, R3] Do scans from the same patient fall into training/validation/test? | The split between train / val / test data (60 - 15 - 25% ratio) has been properly done such that: 1- CT scans from the same patient never appear in different subsets, 2- the class distribution is almost the same across subsets.

2- [MR, R1] Differentiation from Jin et al. in terms of architecture? | Since we do not have any ground-truth segmentation masks for the PRODIGE20 dataset, our network is dedicated to a single task (response assessment or prediction), whereas Jin et al. address both the segmentation and response prediction tasks simultaneously. In addition, the main architectural differences between our model and that of Jin et al. are the use of: 1- ResNeXt blocks involving squeeze-and-excitation mechanisms, 2- short skip connections in addition to standard long ones, 3- successive Gated Recurrent Units (GRU) for feature fusion.

3- [MR, R3] Performance of the segmentation model. | The employed segmentation model won the liver CT segmentation task of the CHAOS challenge and is fully described in [18]. Since we do not have ground-truth masks for PRODIGE20, the model was trained on an in-house dataset and blindly applied to the PRODIGE20 data. Quantitative assessment was therefore not possible, but an experienced clinician (>15 years of experience) validated all metastasis delineations through visual checking. According to Tab. 1 (TRA vs model C), integrating predicted segmentation masks in addition to source images as network inputs improves the AUC from 92.34 to 95.56.

4- [MR, R2, R3] Sensitivity/specificity for the multi-class setup? | We followed a one-vs-rest approach: we considered the 4 binary classifiers, calculated the sensitivity/specificity scores for each of them, and averaged the results to obtain the final performance.

5- [MR, R3] RECIST alone as end-point (rather than pathology or survival)? | In the assessment setting (TRA), and in addition to the segmentation tasks, the benefit of our method is to alleviate the burden of the RECIST evaluation, since RECIST does not only consist of a unidimensional assessment of lesions (a simple task) but also looks for potential new lesions, which is more tedious. In the prediction scenario (TRP), our contributions are a first step towards a complete chemotherapy regimen recommendation system able to indicate the best treatment for each patient, given the pre-treatment CT scan only.

6- [R2] It is not clear why the method works so well. Which features were extracted using deep learning? | To better understand the reached performance, we applied GradCAM to see what the model focuses on when making its decision (visual results have been added as supplementary material). Our experiments showed that the model has learned to focus on the region of the largest metastasis cluster inside the liver. The size of the primary metastasis cluster and darker regions (necrotic tissues) seemed to play a significant role in producing higher levels of activation.

7- [R3] Did the TRA and TRP models share the parameters of the ResNeXt module? | The TRA and TRP models do not share parameters, since TRP aims at performing an early prediction from the first CT only.

8- [R3] How many pre-treatment CT scans are available for each patient? If there is only one, the total number of cases for the prediction model is only 102 (not 400). | A total of 400 consecutive CT scan pairs from 102 patients were considered for our experiments. For a given pair of consecutive CT scans, the first one is referred to as the pre-treatment CT, since patients undergo successive chemotherapy courses. The two tasks (TRA, TRP) thus employ the same data: 400 CT scan pairs for TRA, 400 pre-treatment CT scans for TRP.
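[Editor's note] The patient-level splitting procedure described in the rebuttal's point 1 can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' actual pipeline: the function name, seed, and toy identifiers are hypothetical, and the class-balance constraint mentioned in the rebuttal is omitted for brevity.

```python
import random
from collections import defaultdict

def patient_level_split(samples, ratios=(0.60, 0.15, 0.25), seed=0):
    """Split (patient_id, item) samples into train/val/test subsets so
    that all items from a given patient land in exactly one subset,
    preventing patient-level label leakage."""
    # Group every sample under its patient identifier.
    by_patient = defaultdict(list)
    for pid, item in samples:
        by_patient[pid].append((pid, item))
    # Shuffle PATIENTS (not scans) reproducibly, then cut by ratio.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n = len(patients)
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    pick = lambda ids: [s for pid in ids for s in by_patient[pid]]
    return (pick(patients[:n_train]),
            pick(patients[n_train:n_train + n_val]),
            pick(patients[n_train + n_val:]))
```

In practice one would re-draw or stratify the patient shuffle so that the response-class distribution stays almost the same across subsets, as the rebuttal notes; scikit-learn's `GroupShuffleSplit` offers a similar grouped split off the shelf.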
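[Editor's note] The one-vs-rest averaging described in the rebuttal's point 4 can be sketched as below. This is an illustrative reconstruction only; the helper name and toy labels are assumptions, not the authors' implementation.

```python
def ovr_sensitivity_specificity(y_true, y_pred, n_classes=4):
    """Macro-averaged sensitivity/specificity for a multi-class problem
    via one-vs-rest: each class in turn is treated as 'positive' and the
    remaining classes as 'negative', then the per-class scores are averaged."""
    sens, spec = [], []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        tn = sum(1 for t, p in zip(y_true, y_pred) if t != c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        # Guard against classes absent from y_true / predictions.
        sens.append(tp / (tp + fn) if tp + fn else 0.0)
        spec.append(tn / (tn + fp) if tn + fp else 0.0)
    return sum(sens) / n_classes, sum(spec) / n_classes
```

Note that for each binary sub-problem, samples of every other class count as negatives, so the averaged specificity is typically much higher than in a two-class setting.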




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Major comments raised in the initial reviews have been sufficiently addressed. The authors are urged to clarify these points in the final published conference paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper received high scores from 2 out of 3 reviewers. Reviewer #3’s main concerns focused on the small size and specific nature of the dataset – this is a reasonable comment, but at present very few such datasets are available, and the enthusiasm of the other 2 reviewers counterbalances this. The rebuttal did a good job of addressing all of the meta-reviewer’s concerns.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The work presents a novel predictive AI model addressing an important clinical application. I find that the rebuttal addressed R3’s major concerns on the patient split and the segmentation model, and I find the experimental setup sufficient for this preliminary evaluation, demonstrating contributions from both the application and the methodology.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6


