
Authors

Numan Saeed, Ikboljon Sobirov, Roba Al Majzoub, Mohammad Yaqub

Abstract

When oncologists estimate cancer patient survival, they rely on multimodal data. Although several multimodal deep learning methods have been proposed in the literature, the majority rely on two or more independent networks that share knowledge only at a later stage in the overall model. Oncologists, in contrast, do not analyze each source separately but rather fuse information from multiple sources, such as medical images and patient history, in their reasoning. This work proposes a deep learning method that mimics oncologists' analytical behavior when quantifying cancer and estimating patient survival. We propose TMSS, an end-to-end Transformer-based Multimodal network for Segmentation and Survival prediction that leverages the strength of transformers in handling different modalities. The model was trained and validated for the segmentation and prognosis tasks on the training dataset from the HEad & NeCK TumOR segmentation and outcome prediction in PET/CT images challenge (HECKTOR). We show that the proposed prognostic model significantly outperforms state-of-the-art methods with a concordance index of 0.763 ± 0.14, while achieving a Dice score of 0.772 ± 0.030, comparable to a standalone segmentation model. The TMSS implementation code will be made publicly available upon acceptance.
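
For readers skimming the reviews, the following is a minimal, self-contained sketch of the kind of joint image-plus-EHR transformer encoder the abstract describes. It is not the authors' implementation (the released code is linked below); all module names, token counts, and dimensions are illustrative assumptions.

```python
# Illustrative only: fuse 3D image patch tokens and an EHR token in one
# transformer encoder so attention spans both modalities (sizes are assumed).
import torch
import torch.nn as nn

class MultimodalEncoder(nn.Module):
    def __init__(self, n_patches=216, patch_dim=4096, n_ehr=8, d_model=768):
        super().__init__()
        self.patch_proj = nn.Linear(patch_dim, d_model)  # flattened 3D patches -> tokens
        self.ehr_proj = nn.Linear(n_ehr, d_model)        # EHR vector -> one extra token
        self.pos_emb = nn.Parameter(torch.zeros(1, n_patches + 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)

    def forward(self, patches, ehr):
        # patches: (B, n_patches, patch_dim); ehr: (B, n_ehr)
        tokens = torch.cat([self.patch_proj(patches),
                            self.ehr_proj(ehr).unsqueeze(1)], dim=1)
        return self.encoder(tokens + self.pos_emb)       # joint attention over both modalities

enc = MultimodalEncoder()
out = enc(torch.randn(2, 216, 4096), torch.randn(2, 8))  # -> (2, 217, 768)
```

A segmentation decoder and a prognosis head would then consume these encoded tokens, which is the end-to-end design the abstract refers to.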

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16449-1_31

SharedIt: https://rdcu.be/cVRW1

Link to the code repository

https://github.com/ikboljon/tmss_miccai

Link to the dataset(s)

https://www.aicrowd.com/clef_tasks/42/task_dataset_files?challenge_id=774


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a transformer network for the head & neck tumor segmentation and outcome prediction challenge, using PET-CT and EHR data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. SOTA results on survival prediction
    2. Complementary segmentation, which can potentially be used for RECIST computation, etc.
    3. Combining multimodal data.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Images are cropped to 80x80x48. This potentially helps as a sort of attention/mask mechanism where the tumor region is now in focus. Using the entire image downsampled to the smaller size (for easier training) would have been fairer.
    2. No ablation studies on the patch size, contribution of modalities, etc. have been done.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducible

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. Hausdorff distance can also be included to evaluate the secondary output of segmentation.
    2. Ablation studies to show the effect of patch size.
    3. Ablation to test the effect of using only PET-CT image (minus EHR).
    4. The range used for intensity normalization can be included, i.e., was the whole CT HU window rescaled to 0-1, or similar specifics.
    5. Were any other EHR records identified (besides smoking and alcohol) to have missing data and was any imputation used for the same?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The cropping approach which appears to use some domain knowledge that has not been clearly defined.
    2. Multimodal approach that achieves SOTA performance.
  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The rebuttal did not satisfactorily answer the previous concerns, namely the absence of proper cross-validation (though there is a valid hold-out validation), the cropping, and additional metrics for evaluation.



Review #2

  • Please describe the contribution of the paper

    An End-to-End Transformer-based Multimodal Network for Segmentation and Survival Prediction of Head and Neck Cancer. The problem is of clinical importance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The problem is of clinical importance.
    2. Although transformer-based multimodal networks have been used to solve other problems, this is the first time they are used for head and neck cancer segmentation and survival prediction.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. It would be good to provide more results.
    2. The hypothesis behind using the network is not clear.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    1. “An analysis of situations in which the method failed” is missing
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    See the weakness section for details.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The application has clinical impact. The method outperformed existing works. The analysis is elaborate.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    The paper proposes the use of a multimodal transformer network to combine the tasks of segmentation and survival prediction in order to achieve improved performance on survival prediction and competitive performance on the segmentation task. The model is evaluated on the HECKTOR dataset and achieves superior performance in comparison to state-of-the-art algorithms.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Overall the paper is very well-written, easy to follow and soundly evaluated.
    • The manuscript clearly states its contributions and has a strong focus, adequately leading through the overall concept.
    • The authors provide a good overview on the current state of the art and demonstrate a profound knowledge in the field.
    • The authors have agreed to disclose their code on acceptance.
    • Finally, the authors achieve state of the art performance.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The authors have chosen to use cross-validation and have additionally conducted a hyperparameter optimization. In this combination, however, the authors might have partly performed a circular analysis.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility checklist matches the contribution well. I would like to argue, however, that significance analyses (“Not applicable”) are indeed applicable, and thus that depicting measures of confidence would have been appropriate.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Evaluation

    • As I understand it, the authors have chosen to use a 5-fold cross-validation and have further optimized their parameters using the OPTUNA framework. In this combination, the authors might have partly performed a circular analysis: they might have found parameters that perform well only in this specific setting. I would instead recommend initially splitting off a hold-out set on which the final analysis is conducted, as only this rules out statistical dependence between model training and the later testing phase. This would make the results more reliable and thus create a clearer incentive for others to use the work later on, which would be in the authors' as well as the community's interest.

    • Further, I was not able to find any measure of confidence. As the authors have conducted a cross-validation, depicting the variance across the different folds would have given a first impression of the confidence of the achieved results. If the authors aim for a statistically more sound evaluation, I would also recommend employing methods such as bootstrapping [1] and statistical testing, which would make it easier for the community to understand the value of the proposed method.
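
As an illustration of the bootstrapping the reviewer recommends, here is a small, hedged sketch of a percentile bootstrap confidence interval for the concordance index; the data are synthetic and the variable names are placeholders, not the paper's pipeline.

```python
# Illustrative only: percentile bootstrap CI for the C-index on held-out
# predictions (synthetic data stand in for the model's actual outputs).
import numpy as np
from lifelines.utils import concordance_index

rng = np.random.default_rng(0)
times = rng.exponential(24, size=100)        # synthetic survival times
events = rng.integers(0, 2, size=100)        # 1 = event observed, 0 = censored
preds = times + rng.normal(0, 8, size=100)   # stand-in predictions (higher = longer survival)

scores = []
for _ in range(1000):
    idx = rng.integers(0, len(times), len(times))   # resample patients with replacement
    scores.append(concordance_index(times[idx], preds[idx], events[idx]))
lo, hi = np.percentile(scores, [2.5, 97.5])
print(f"C-index 95% bootstrap CI: [{lo:.3f}, {hi:.3f}]")
```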

    Minor

    • I would recommend making the text in Fig. 1 a bit larger to facilitate reading.
    • The table formatting in Tab. 1 feels somewhat old-fashioned; I would recommend [2] for this.

    References

    [1] Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American Statistical Association, 82(397), 171-185.
    [2] https://people.inf.ethz.ch/markusp/teaching/guides/guide-tables.pdf

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    8

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • The combination of transformer networks for simultaneously predicting lesion segmentations and survival estimates in order to achieve an improved estimate, together with EHR data, is to the best of my knowledge unprecedented and has a large potential for strong impact.
    • Further, the paper is very well-written and easy to understand.
  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper presents a method for head-and-neck cancer prognosis prediction from PET/CT and EHR. The Vision Transformer (ViT) used in the encoder and the segmentation branch serving as an additional loss function are the two major technical contributions. These two modules are not new in medical image analysis tasks. ViT is likely used for the first time in the imaging + EHR cancer prognosis task, but the motivation is unclear. There is also no ablation study to show the contributions of these two modules, and the training data size is likely too small for the transformer to achieve robust performance. Importantly, the improved result is obtained by cross-validation, which may still carry an overfitting risk. To better demonstrate the generalization ability of the proposed method, an independent test is needed, e.g., obtaining results on the challenge’s testing set. Also, some important baseline approaches are missing, e.g., radiomics/CNN + clinical features in a well-designed feature selection and statistical learning framework. In addition, statistical analysis is needed for survival research, e.g., Kaplan-Meier (KM) analysis.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    8




Author Feedback

We thank the reviewers for the constructive and positive feedback. The common concerns raised by the reviewers will be addressed collectively.

Motivation for the use of ViT in prognosis:
The primary motivation behind using a transformer-based encoder is its ability to jointly learn from both imaging and clinical data, unlike the CNN+EHR-based counterparts in the SOTA approach [21] or similar papers [15]. This mimics oncologists' practice of analyzing multimodal data simultaneously and gives the model inter-modality attention. Although the modules themselves are not new in medical imaging tasks, this is the first time they are used in conjunction for segmentation and prognosis through a combined loss function that trains an end-to-end DL model. The proposed loss function allows the two tasks to learn from one another and thereby enhances the model predictions.
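
The rebuttal does not spell out the combined loss; the sketch below is only one hedged guess at what a joint segmentation + survival objective could look like (a soft Dice term plus a negative Cox partial log-likelihood with an assumed weighting alpha), not the paper's actual formulation.

```python
# Hypothetical joint objective: weighted sum of a segmentation term and a
# survival term, so one backward pass updates the shared encoder for both tasks.
import torch

def dice_loss(pred, target, eps=1e-6):
    inter = (pred * target).sum()
    return 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def neg_cox_partial_log_likelihood(risk, time, event):
    # Breslow-style partial likelihood, ignoring ties for simplicity.
    order = torch.argsort(time, descending=True)
    risk, event = risk[order], event[order].float()
    log_risk_set = torch.logcumsumexp(risk, dim=0)   # log of summed exp(risk) over the risk set
    return -((risk - log_risk_set) * event).sum() / event.sum().clamp(min=1.0)

def combined_loss(seg_pred, seg_gt, risk, time, event, alpha=0.5):
    return alpha * dice_loss(seg_pred, seg_gt) + \
           (1 - alpha) * neg_cox_partial_log_likelihood(risk, time, event)
```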

Contribution:
The model's ability to perform the segmentation and risk prediction tasks simultaneously in an end-to-end fashion is one of the three significant contributions of the paper. Another contribution is its ability to encode EHR and imaging data within the same network. Also, the combined loss function enables the model to concurrently learn the features necessary for risk prediction and segmentation, making the model more robust and generalizable, which helps when there are few data samples but multiple labels.

Overfitting:
A clarification is needed on using k-fold cross-validation together with hyperparameter tuning. We optimized the hyperparameters using only one of the folds and then performed k-fold cross-validation on the entire dataset due to the small sample size. However, there is a chance of leakage through that specific fold; therefore, we redid the testing using a hold-out test set. To rule out statistical dependence of the model training on the test data, we split the dataset randomly into train and test subsets with an 80%/20% ratio. The model hyperparameters were optimized using a small subset of the training set, and the model was then tested on the hold-out set. We obtained a C-index of 0.74 and a DSC of 0.76 on this testing set. The prognosis score on the hold-out set is slightly lower than the k-fold cross-validation score of 0.76, but it is more reliable and still higher than the previous best score of 0.70.
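
For clarity, here is a schematic sketch of the hold-out protocol described above (not the authors' code); the patient count and the size of the tuning subset are assumptions.

```python
# Schematic only: 80/20 hold-out split with hyperparameters tuned on a small
# subset of the training split, so the test set never influences any choice.
import numpy as np
from sklearn.model_selection import train_test_split

patient_ids = np.arange(200)                               # illustrative cohort size
train_ids, test_ids = train_test_split(patient_ids, test_size=0.2, random_state=0)
tune_ids, _ = train_test_split(train_ids, test_size=0.75, random_state=0)

# 1) run the hyperparameter search (e.g., with OPTUNA) using only tune_ids
# 2) retrain on all of train_ids with the selected hyperparameters
# 3) report the C-index and DSC once on test_ids
```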

To assess the performance of our approach with respect to other DL models, we compare it to the SOTA in H&N cancer prognosis [21], an ensemble of CNN+EHR and a statistical model for survival analysis, and show that our model achieves a higher C-index. However, we cannot verify these results on the challenge testing set since the ground truth is withheld for competition purposes, and the challenge platform is currently not accepting submissions. Therefore, we compared our segmentation results to a recent work [22] that used a similar segmentation model on the same dataset and reported results on the test set; our results were comparable to theirs. This comparison against results on the withheld testing set also indicates that the model learns rather than overfits the data. With a base ViT encoder, we showed an intuitive and elegant approach to the two tasks at hand; more recent and robust encoders such as Swin Transformers could be studied in future work.

Minor Changes:
Since our model performance was validated using k-fold cross-validation, both the mean and standard deviation were reported in the results table. As suggested by the reviewers, we made minor changes to the figure and table and added information regarding our model's weaknesses to the discussion section. The CT & PET normalization steps, the EHR features that were dropped due to missing data, and the future work section have been updated in the paper.

Ablation Studies:
The suggested ablation studies on patch size, the contribution of the modalities, the use of recently proposed ViT architectures (e.g., Swin Transformers), etc., are a future direction for us and others. We will add this to the conclusion.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Some of the major concerns are not solidly addressed, e.g., the contributions of the ViT and segmentation branches remain unclear without an ablation; the claim that the transformer encoder mimics oncologists' analysis of multimodal data is not convincing enough; it is doubtful that a transformer can produce high performance on such small data; and the comparison with the previous method is not very solid. In summary, it is difficult for an integrated prediction model that combines a limited number of clinical features with image data and is trained on a limited number of patients to obtain a significant performance gain compared to well-designed, simpler baseline models; external validation is very important, but is missing.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    10



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work proposes an end-to-end transformer-based network for cancer segmentation and risk prediction. Given the complexity of the framework, most of the reviewers' concerns are about the validation strategy and the lack of an ablation study. These concerns are only partially addressed in the rebuttal. Overall, the work seems to deserve a major revision to properly present the required experimental validation, and therefore cannot be accepted to the conference in its current form.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    20



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This is a very borderline accept. The rebuttal partially addresses some of the concerns raised, and everyone agrees that the proposed method is novel in its development of a transformer encoder to jointly learn from EHR and imaging. The limited validation is a concern, which is somewhat explained in the rebuttal. The missing ablation study is a concern, but will hopefully be addressed in an expanded version of this paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    9



Meta-review #4

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    AC recommendations on this paper were split, with a majority vote of “rejection”, while the reviewers expressed consensus in supporting acceptance even after the rebuttal. The PCs therefore assessed the paper reviews, meta-reviews, the rebuttal, and the submission. Although there are lingering issues that can still be addressed, the paper has merits in clinical relevance, novel application, and SOTA performance, which were highly appreciated by the reviewers. The PCs agree with the convincing arguments of the reviewers and AC, and thus the final decision on the paper is accept.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR


