
Authors

Mingyuan Meng, Lei Bi, Michael Fulham, Dagan Feng, Jinman Kim

Abstract

Survival prediction is crucial for cancer patients as it provides early prognostic information for treatment planning. Recently, deep survival models based on deep learning and medical images have shown promising performance for survival prediction. However, existing deep survival models are not well developed in utilizing multi-modality images (e.g., PET-CT) and in extracting region-specific information (e.g., the prognostic information in Primary Tumor (PT) and Metastatic Lymph Node (MLN) regions). In view of this, we propose a merging-diverging learning framework for survival prediction from multi-modality images. This framework has a merging encoder to fuse multi-modality information and a diverging decoder to extract region-specific information. In the merging encoder, we propose a Hybrid Parallel Cross-Attention (HPCA) block to effectively fuse multi-modality features via parallel convolutional layers and cross-attention transformers. In the diverging decoder, we propose a Region-specific Attention Gate (RAG) block to screen out the features related to lesion regions. Our framework is demonstrated on survival prediction from PET-CT images in Head and Neck (H&N) cancer, by designing an X-shape merging-diverging hybrid transformer network (named XSurv). Our XSurv combines the complementary information in PET and CT images and extracts the region-specific prognostic information in PT and MLN regions. Extensive experiments on the public dataset of HEad and neCK TumOR segmentation and outcome prediction challenge (HECKTOR 2022) demonstrate that our XSurv outperforms state-of-the-art survival prediction methods.
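As a concrete illustration of the HPCA idea described above, the following is a minimal PyTorch sketch of fusing PET and CT feature maps via parallel convolution and cross-attention. It is a sketch under assumed layer sizes and token layout, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Illustrative HPCA-style block: parallel conv + cross-attention fusion."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Parallel convolutional paths preserve local detail per modality.
        self.conv_pet = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.conv_ct = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        # Cross-attention paths exchange global context between modalities.
        self.attn_pet = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.attn_ct = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, pet: torch.Tensor, ct: torch.Tensor):
        # pet, ct: (B, C, D, H, W) volumetric feature maps.
        b, c, d, h, w = pet.shape
        pet_tok = pet.flatten(2).transpose(1, 2)  # (B, D*H*W, C) token sequence
        ct_tok = ct.flatten(2).transpose(1, 2)
        pet_att, _ = self.attn_pet(pet_tok, ct_tok, ct_tok)  # PET queries attend to CT
        ct_att, _ = self.attn_ct(ct_tok, pet_tok, pet_tok)   # CT queries attend to PET
        pet_att = pet_att.transpose(1, 2).reshape(b, c, d, h, w)
        ct_att = ct_att.transpose(1, 2).reshape(b, c, d, h, w)
        # Combine the local (conv) and global (attention) paths per modality.
        return self.conv_pet(pet) + pet_att, self.conv_ct(ct) + ct_att
```

The parallel convolutional path keeps local spatial detail while the cross-attention path exchanges global context between modalities, mirroring the hybrid design the abstract describes.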

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_39

SharedIt: https://rdcu.be/dnwJU

Link to the code repository

https://github.com/MungoMeng/Survival-XSurv

Link to the dataset(s)

https://hecktor.grand-challenge.org/


Reviews

Review #3

  • Please describe the contribution of the paper

    The authors have made the following contributions:

    • They proposed a merging-diverging learning framework for survival prediction from multi-modality images (PET-CT) that can fuse multi-modality information and extract region-specific information (e.g., the prognostic information in Primary Tumor (PT) and Metastatic Lymph Node (MLN) regions).
    • They proposed a Hybrid Parallel Cross-Attention (HPCA) block to effectively fuse multi-modality features via parallel convolutional layers and cross-attention transformers in the merging encoder.
    • They proposed a Region-specific Attention Gate (RAG) block to screen out the features related to lesion regions in the diverging decoder.
    • They demonstrated their XSurv on survival prediction from PET-CT images in Head and Neck (H&N) cancer, using the public dataset of the HEad and neCK TumOR segmentation and outcome prediction challenge (HECKTOR 2022), and showed that it outperforms state-of-the-art survival prediction methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Some of the main strengths are:

    • The proposed framework can leverage both multi-modality and region-specific information for survival prediction, which are important factors for cancer prognosis.
    • The proposed HPCA and RAG blocks are novel and effective methods for feature fusion and selection in the merging encoder and diverging decoder, respectively.
    • The proposed XSurv network has an X-shape architecture that allows for efficient information flow and feature extraction from PET-CT images.
    • The proposed framework is validated on a large and challenging public dataset (HECKTOR 2022) and shows superior performance over existing methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Some of the possible weaknesses are:

    • The proposed framework may not generalize well to other types of cancer or other modalities of medical images, as it is tailored for PET-CT images in H&N cancer.
    • The proposed framework may require a large amount of computational resources and training time, as it involves complex and deep neural networks with multiple components.
    • The proposed framework may not account for other factors that may affect survival prediction, such as clinical data, genetic data, or treatment response.
    • The proposed framework may not provide sufficient interpretability or explainability for the survival prediction results, as it relies on black-box models and feature extraction methods.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have chosen to make their code and other relevant details publicly available. This suggests that the results are replicable and verifiable.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Some of the possible constructive comments to improve the manuscript are:

    • The authors should provide more details on the design and implementation of the proposed HPCA and RAG blocks, such as the number and size of the convolutional layers and transformers, the attention mechanisms, and the loss functions.
    • The authors should conduct a few more ablation studies to show the contribution of each component of the proposed framework, such as the merging encoder, the diverging decoder, the HPCA block, and the RAG block.
    • The authors should evaluate their proposed framework on other types of cancer or other modalities of medical images, such as MRI or ultrasound, to demonstrate its generalizability and applicability.
    • The authors should provide some qualitative analysis or visualization of the survival prediction results, such as showing the feature maps or attention maps of the proposed framework, or highlighting the regions or features that are most relevant for survival prediction.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors propose a novel framework for survival prediction in head and neck cancer using multi-modality images and region-specific information. However, the paper lacks sufficient technical details and theoretical analysis for the proposed HPCA and RAG blocks, which are the key components of the framework. Moreover, the paper does not discuss the trade-off between the performance gain and the computational cost of adding these blocks, which may increase the number of parameters and the training time of the framework.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #1

  • Please describe the contribution of the paper

    This paper presents a novel approach using a merging-diverging transformer for survival prediction in Head and Neck cancer. The approach fuses the two modalities (PET and CT) in the encoder with cross-attention blocks and extracts region-specific features using attention gates in the decoder. It is multi-task, as it outputs PT and MLN lesion masks as well as a DL score that is used by a secondary survival model (Cox). The HECKTOR dataset is used for training, and the approach is compared with other sota networks. The results are promising: the presented network (XSurv) performs better than the others. Ablation studies are also presented to validate the XSurv architecture.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Writing

    The paper is clear and very well written.

    Model derivation and performance

    The XSurv model is an improvement over existing methods. The novel contributions (hybrid cross-attention blocks, region-specific attention gates) are interesting and may be incorporated into other contexts and architectures.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There is no real weakness in the paper besides some points that may be worthy of clarification (see detailed comments in section 9).

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducibility is high: the code will be released on GitHub, the method and parameters are carefully described, and the dataset is publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    I have some minor comments:

    • the batch size seems fairly small (=2). Is there any issue with batch normalization?
    • the 5-fold CV could have been used to estimate the variability of the performance on the test set.
    • it is not essential for the paper, but more details could be given on the combination with radiomics (at least in supplementary material: extraction parameters, feature selection…) for readers’ convenience and reproducibility.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work is a significant improvement over sota networks for survival prediction in head and neck cancer. It introduces new cross-attention blocks and region-specific attention gates that could benefit other works beyond head and neck cancer. It is very well written, and the reproducibility of this work is high.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This work presents a deep learning method for survival prediction in PET/CT images of head and neck cancer. A complex but sound convolutional- and transformer-based merging-diverging learning framework is proposed with a merging encoder to fuse multi-modality information and a diverging decoder to extract region-specific information.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Multiple technical contributions are proposed: a Hybrid Parallel Cross-Attention (HPCA) block for multi-modality feature learning and a Region-specific Attention Gate (RAG) block for region-specific feature extraction. Other components are not necessarily novel but are well integrated into the framework, including the combination of the deep model with hand-crafted radiomics features and multi-task learning with the segmentation of tumors as an auxiliary task. The method seems sound and robust. It is well described, and the different components are well motivated.

    Excellent results are obtained in the survival prediction and segmentation of primary tumor and lymph nodes.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Results on the HECKTOR test set should be reported. Additional information should be provided, including the statistical test and the computational time. More details below.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code will be made public and the dataset is public.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    1. Results on the HECKTOR test set should be reported (perhaps only for the method that performed best on the current test set). This would allow comparison with all other methods from the HECKTOR 2022 challenge.

    2. I did not find information on the statistical test that was performed for Table 1. For the C-index, how do you compare a single value for each method? Did you use bootstrap or something else?

    3. There is no information on the computational cost (training time, etc.).

    4. Other multi-task loss balancing methods could be tried instead of fixed weights (a sketch of one alternative appears after this list).

    5. “existing deep survival models cannot effectively leverage complementary multi-modality information”: is this not a limitation of the underlying architectures rather than of deep survival models in particular, since most of them are based on standard architectures with survival losses?
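
    For reference, a minimal sketch of one such alternative, homoscedastic uncertainty weighting (Kendall et al., CVPR 2018), is given below. It is not part of the reviewed paper; the task setup is illustrative.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Learned loss balancing via per-task homoscedastic uncertainty."""

    def __init__(self, num_tasks: int = 2):
        super().__init__()
        # One learnable log-variance per task (e.g., survival and segmentation).
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, losses):
        # losses: iterable of per-task scalar losses.
        total = 0.0
        for i, loss in enumerate(losses):
            precision = torch.exp(-self.log_vars[i])
            total = total + precision * loss + self.log_vars[i]
        return total
```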

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is good, with novelty and strong results. The weaknesses should be easily fixed, in which case I would recommend acceptance.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    A technically sound multi-modal multi-task network is designed in an application-oriented fashion for survival time prediction in H&N cancer with PET-CT images. The authors clearly summarized the limitations of existing approaches to present their research motivation. The proposed method was evaluated on a public benchmark, leading to SOTA performance. All reviewers consistently affirm the novelty of the paper, although they still have some concerns regarding the implementation details (e.g., computational complexity and the effectiveness of BN given a very small batch size).




Author Feedback

We thank the Meta-Reviewer (MR) and Reviewers (R1-R3) for their appreciation of our paper. We provide our responses (Re) to the comments below:

1 (MR, R1): Batch Normalization (BN) with a small batch size of 2. Re: Our empirical experiments showed that BN is beneficial for our method even with a batch size of 2. To study the benefits of BN, we replaced BN with instance normalization during 5-fold cross-validation and observed performance degradation in both survival prediction and segmentation. Note that we chose a batch size of 2 due to limited GPU memory (a 12 GB Titan X GPU); we suggest that, if GPU memory is sufficient, a larger batch size (>2) may further exploit the benefits of BN and enable better performance.
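For illustration, an ablation like the one described above can be set up by parameterizing the normalization layer of each convolutional block. This is a minimal sketch under an assumed block structure, not the authors' exact architecture:

```python
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int, norm: str = "batch") -> nn.Sequential:
    """3D conv block whose normalization can be swapped for the BN-vs-IN ablation."""
    norm_layer = nn.BatchNorm3d(out_ch) if norm == "batch" else nn.InstanceNorm3d(out_ch)
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
        norm_layer,
        nn.ReLU(inplace=True),
    )
```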

2 (R2): Results on the HECKTOR test set. Re: As the ground-truth labels of the HECKTOR 2022 testing dataset were not released to the public, we only adopted the training dataset of HECKTOR 2022 and split it into training/testing sets (as stated in Section 3.1). For a fair comparison, all comparison methods were implemented by running their official open-source code under our data-split settings. We also included the top-2 methods in the HECKTOR 2022 challenge (ICARE and Radio-DeepMTS) for comparison.

3 (R2): How to compare C-indices in the statistical test? Re: We used the compareC package in R to compare the C-indices of different methods. Given the prediction outputs of two methods, compareC uses a built-in resampling mechanism (similar to bootstrap) to calculate the C-indices and a P value.
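For readers working in Python rather than R, the sketch below shows the same general idea with a plain patient-level bootstrap, using concordance_index from the lifelines package. Note that compareC's actual resampling procedure differs in detail, so this is only an analogous sketch; all names are illustrative.

```python
import numpy as np
from lifelines.utils import concordance_index

def bootstrap_cindex_diff(times, events, risk_a, risk_b, n_boot=2000, seed=0):
    """Bootstrap the difference in C-index between two risk models.

    times, events, risk_a, risk_b: 1-D NumPy arrays over the same patients.
    """
    rng = np.random.default_rng(seed)
    n = len(times)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)  # resample patients with replacement
        # concordance_index expects scores where higher = longer survival,
        # so risk scores (higher = worse prognosis) are negated.
        c_a = concordance_index(times[idx], -risk_a[idx], events[idx])
        c_b = concordance_index(times[idx], -risk_b[idx], events[idx])
        diffs[i] = c_a - c_b
    # Crude two-sided P value: how often the bootstrap difference crosses zero.
    p_value = 2 * min((diffs <= 0).mean(), (diffs >= 0).mean())
    return diffs.mean(), p_value
```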

4 (MR, R2, R3): Details about computational cost. Re: In our experiments, one training iteration (including real-time data augmentation) took approximately 4.2 s, and inference for one patient took approximately 0.61 s. We will add these details in the camera-ready version.

5 (R3): May not generalize well to other cancers or image modalities. Re: In this study, we focused on evaluating the capabilities of the proposed method on the well-benchmarked HECKTOR challenge dataset. However, we suggest that our technical contributions are not bound to PET-CT or H&N cancer. The proposed merging-diverging framework, HPCA block, and RAG block were designed for the general purposes of multi-modality feature fusion and region-specific feature extraction. In future studies, we will evaluate the capabilities of the proposed method on other multi-modality images (e.g., PET-MRI) and disease types (e.g., gliomas).

6 (R3): Does not consider other prognostic factors, such as clinical and genetic data. Re: Clinical data (including the clinical/treatment indicators in Table S3 of the supplementary materials) were taken into consideration via radiomics analysis (i.e., the radiomics enhancement in Section 2.3). When clinical data were not used, the C-index of Radio-XSurv degraded from 0.798 to 0.789. Genetic data are unavailable for the HECKTOR dataset, and we may explore them using other datasets in future studies.
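As a concrete illustration of folding clinical indicators and a deep prognostic score into a downstream Cox model, the sketch below uses the lifelines Python package on synthetic data. This is an assumed, simplified workflow, not the authors' exact radiomics pipeline; the hpv_status covariate and all values are made up.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "dl_score": rng.normal(size=n),       # deep model's prognostic score
    "hpv_status": rng.integers(0, 2, n),  # illustrative clinical indicator
})
# Synthetic follow-up times where a higher dl_score implies a higher hazard.
df["time"] = rng.exponential(scale=np.exp(-df["dl_score"]))
df["event"] = rng.integers(0, 2, n)       # 1 = event (e.g., recurrence) observed

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
print(cph.summary)  # hazard ratios for dl_score and the clinical covariate
```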

7 (R3): More implementation details (e.g., the number and size of the convolutional layers and transformers, and the loss functions); ablation studies to analyse the encoder, decoder, HPCA block, and RAG block; qualitative analysis or visualization (e.g., attention maps). Re: The requested implementation details are shown in Table S1 of the supplementary materials. The requested ablation studies are presented in Tables 2 and 3 of the paper. The requested visualization analysis is provided in Fig. S1 of the supplementary materials.

8: Suggestions on inappropriate statements and further extension/validation. Re: The testing variability and radiomics details (R1) will be added in the camera-ready version. The inappropriate statement (R2) will also be reworded in the camera-ready version. Other suggestions, including multi-task loss balancing (R2) and validation on other datasets (R3), will be explored in our future extension study.


