
Authors

Can Cui, Han Liu, Quan Liu, Ruining Deng, Zuhayr Asad, Yaohong Wang, Shilin Zhao, Haichun Yang, Bennett A. Landman, Yuankai Huo

Abstract

Integrating cross-department multi-modal data (e.g., radiology, pathology, genomic, and demographic data) is ubiquitous in brain cancer diagnosis and survival prediction. To date, such an integration is typically conducted by human physicians (and panels of experts), which can be subjective and semi-quantitative. Recent advances in multi-modal deep learning, however, have opened the door to performing such integration in a more objective and quantitative manner. Unfortunately, prior art using four modalities for brain cancer survival prediction is limited by a “complete modalities” setting (i.e., with all modalities available). Thus, there are still open questions on how to effectively predict brain cancer survival from incomplete radiology, pathology, genomic, and demographic data (e.g., one or more modalities might not be collected for a patient). For instance, should we use both complete and incomplete data, and more importantly, how do we use such data? To answer the preceding questions, we generalize multi-modal learning on cross-department multi-modal data to a missing data setting. Our contribution is three-fold: 1) We introduce a multi-modal learning with missing data (MMD) pipeline with competitive performance and less hardware consumption; 2) We extend multi-modal learning on radiology, pathology, genomic, and demographic data into missing data scenarios; 3) A large-scale public dataset (with 962 patients) is collected to systematically evaluate glioma tumor survival prediction using four modalities. The proposed method improved the C-index of survival prediction from 0.7624 to 0.8053.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16443-9_60

SharedIt: https://rdcu.be/cVRze

Link to the code repository

https://github.com/cuicathy/MMD_SurvivalPrediction.git

Link to the dataset(s)

https://www.med.upenn.edu/cbica/brats2020/data.html

https://github.com/mahmoodlab/PathomicFusion/tree/master/data/TCGA_GBMLGG

https://wiki.cancerimagingarchive.net/display/Public/TCGA-LGG

https://wiki.cancerimagingarchive.net/display/Public/TCGA-GBM


Reviews

Review #1

  • Please describe the contribution of the paper

    This work introduces a two-stage deep learning approach that integrates multiple data modalities in the presence of missing data. In addition, the authors performed a detailed ablation and comparison study to justify the importance of each module, and the improvements over baselines, respectively.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The significant contribution of this work is that it provides a comprehensive study of multimodal fusion for survival prediction. It gives a detailed guideline for the readers to integrate multiple data modalities in the presence of missing data. In addition, the modeling components are straightforward and well established in the field, which makes this model generalizable for solving other problems.

    The model contains two key components: the first is a uni-modal feature extractor module, and the second is a fusion module. The unimodal feature extractor module learns a lower-dimensional representation of the input data, and the fusion module combines the learned features and extracts discriminative patterns that improve survival prediction.

    The authors use well-established models to build the unimodal feature extractor module. This allows them to use pre-trained weights, which reduces computational complexity. In addition, due to this choice, the authors can combine the power of multiple state-of-the-art models already proven to be robust for representing imaging, genomic, and demographic data.

    The fusion module contains a two-layer MLP that further projects the output of the unimodal features to a latent space, where a mean vector is calculated for data fusion. The authors take a unique approach to dropping modalities, enabling the model to learn discriminative patterns from the rest of the data when a modality is absent. This approach boosts performance and lets them take advantage of a larger dataset.
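    The mean-vector fusion with modality dropout described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation; the function and variable names are invented:

```python
import random

def fuse_embeddings(embeddings, present, dropout_p=0.0, training=False,
                    rng=random.Random(0)):
    """Fuse per-modality embeddings by averaging the available ones.

    embeddings: list of feature vectors (lists), one per modality
    present:    list of booleans, True where the modality was collected
    dropout_p:  probability of dropping an available modality during
                training, simulating missing data (modality dropout)
    """
    mask = list(present)
    if training:
        for i, avail in enumerate(mask):
            # Drop available modalities at random, but keep at least one.
            if avail and sum(mask) > 1 and rng.random() < dropout_p:
                mask[i] = False
    kept = [e for e, m in zip(embeddings, mask) if m]
    dim = len(kept[0])
    # Mean over the modalities that remain after dropout.
    return [sum(v[d] for v in kept) / len(kept) for d in range(dim)]

# Four modalities (radiology, pathology, genomic, demographic);
# pathology missing at test time.
emb = [[float(i)] * 4 for i in range(4)]
present = [True, False, True, True]
print(fuse_embeddings(emb, present))  # averages rows 0, 2, 3 -> each entry 5/3
```

    Because the mean is taken only over the available modalities, the fused vector keeps the same dimensionality regardless of how many modalities are present, which is what makes this fusion robust to missing data.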

    The authors have provided a detailed study explaining their choice of optimization strategy. They compared their model against one trained on complete data only, and showed that the model could improve performance by extracting discriminative patterns from multiple data modalities in the presence of missing data.

    In an ablation study, they show the importance of data fusion compared to uni-modal approaches.

    Finally, the model shows improved performance compared to the baselines.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors use a two-layer decoder to reconstruct the embedding learned by unimodal feature extractors. Performing that in two separate stages makes sense; however, when training in an end-to-end fashion, making the decoder learn the projection of a previous layer is a non-trivial choice. In such scenarios, the model could learn a random projection that does not contain the input’s information. Usually, decoders are used to reconstruct the whole input data. It would be helpful if the authors could provide further clarification on this. This could be the reason for the poor performance of the end-to-end approach.

    The authors fix the hyperparameter \lambda to 1. They argued that it was empirically selected to balance the Cox loss and the reconstruction loss. Does this mean L_cox and L_recon lie in the same range? If not, this choice is not natural and should be fixed using cross-validation.
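    For context, the loss in question combines a Cox partial likelihood term with a reconstruction term weighted by \lambda. A minimal numerical sketch (not the authors' code; the Breslow form without tie handling and all names here are assumptions) makes it easy to check whether the two terms actually lie in comparable ranges:

```python
import math

def neg_cox_log_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood (Breslow form, no tie handling).

    risk:  predicted log-risk scores
    time:  observed survival/censoring times
    event: 1 if the event was observed, 0 if censored
    """
    order = sorted(range(len(time)), key=lambda i: -time[i])
    loss, log_risk_set = 0.0, float("-inf")
    for i in order:  # descending time: the risk set grows as time shrinks
        if log_risk_set == float("-inf"):
            log_risk_set = risk[i]
        else:
            log_risk_set = math.log(math.exp(log_risk_set) + math.exp(risk[i]))
        if event[i]:
            loss -= risk[i] - log_risk_set
    return loss / max(sum(event), 1)

def total_loss(risk, time, event, recon_err, lam=1.0):
    # lam plays the role of the paper's lambda; printing the magnitudes of
    # the two terms is one way to sanity-check the fixed choice lam = 1.
    return neg_cox_log_likelihood(risk, time, event) + lam * recon_err
```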

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have provided necessary information for reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    This work introduces a multimodal fusion model built on top of a state-of-the-art deep learning model for improving survival predictions.

    The model architecture and the optimization strategy are well motivated. The authors have provided all the necessary justification for each component.

    It will be very helpful for the reader if the authors could explain their reasoning behind reconstructing the embedding of the data instead of the data itself.

    The authors could perform cross-validation to select the best hyperparameter. This could improve their performance.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work introduces a deep learning model for fusing multimodal data along with a comprehensive guideline to train it. The simple architecture and the easy training strategy may prove helpful for other researchers training this model. In addition, the authors did a commendable job quantifying the importance of each of the modules.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The manuscript introduces an approach to predict patient survival based on multi-modality data, which includes MRI scans from 4 sequences, histology images, genomic data, and demographic data. Three different multi-modality data fusion methods are compared, in which modality dropout was introduced to simulate scenarios where certain data modalities are missing. Also assessed was whether reconstructing the missing modality could improve the prediction. Experiments were carried out on a combined public dataset. An ablation study was conducted to evaluate the effect of all the features introduced to the model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Survival prediction based on multi-modality dataset is a meaningful problem in clinical research, and handling dataset with missing data in different modalities is a very realistic problem.
    2. The comparison between end-to-end and two-stage training strategies and the ablation study on data fusion strategies are conducted systematically.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. More details are needed for the extraction of radiology and histology image features.
    2. More discussions could be conducted on the relationship between information provided by different modalities.
    3. The introduction of embedding reconstruction seems to bring marginal value to the prediction.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors specify that code and data will be publicly available

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. The authors are providing a solution to a highly meaningful and practical problem in clinical research: learning features from multi-modality datasets under the condition that data points from any modality could be missing for a subject due to various reasons. Providing a solution to such a problem would help researchers take advantage of data that used to be discarded.
    2. Since many features are added into the comparison, more analysis should be needed to identify which features are the most important for improving the performance of the prediction model. For example, training with modality dropout seems to bring substantial difference when compared with the corresponding models without dropout. Such factors need to be highlighted and supported with statistical tests.
    3. The extraction of radiology image features is described with limited detail. For instance, it is unknown how the segmentation of the 3D tumor volume was conducted. Also, the extraction of 2D features seems arbitrary. Overall, this step could result in an over-parameterized model that is prone to over-fitting.
    4. The extraction of pathology image features is described without necessary details. Since the cited method would generate three types of features, from a CNN, a GCN, or the combination of the two networks, readers would need more information to know which way to follow if they would like to replicate the work.
    5. The introduction of the reconstruction following modality dropout seems to bring very limited value to the prediction, which to some degree is expected: since the reconstruction is based on existing features from other modalities, it does not bring additional information to the combined model. More work may be needed to justify the introduction of this feature.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper addresses the issue of handling missing data, which is a very realistic problem faced in clinical applications.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    This paper tackles the problem of effectively utilizing multi-modality data in the presence of missing modalities. In particular, the authors ask how to effectively predict survival in glioma patients when some modalities might be missing. This is a common setting in real life. To solve this problem, this work proposes to aggregate individual modality feature embeddings using the mean vector, which can be decoded to obtain the individual modalities. The resulting mean vector is used for predicting survival.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The problem tackled here is both interesting and challenging. Many recent methods that work with multi-modality data tend to assume that all modalities of interest are available for all patients. This makes the task easier, but not applicable to all patients in clinical settings. When ground truth information for some modalities is missing, it is an interesting problem to still use the information from the patient to extract relevant information for training. The paper also summarizes the previous works neatly, enabling us to clearly understand the contribution of this work.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper claims to completely solve 4 important questions in multi-modality fusion with incomplete data. This is a very large claim that needs to be stated with caution, along with assumptions and limitations. This paper only evaluates on a limited glioma dataset; thus, any claims should be restricted to this setting. The optimal configuration of the multi-modality fusion network discovered by the authors holds only for the particular set of setups the authors consider, i.e., where mean vector fusion is used, along with autoencoders that help generate informative embeddings.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provide some information about the dataset used, and the experimental settings. The exact details of the split generation are missing. Overall the work looks to be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    There are some method descriptions that could be made clearer:

    • What is modality dropout? Why was it used in the previous works? How are the modalities dropped?
    • In what way is your method “optimal”?
    • Why do you use radiomics features in addition to the ResNet features? How does the performance change if you do not use the radiomics features?
    • What is the source of variation in the results in Table 1? Are there different splits being considered here? The table description can be made clearer.
    • How are the single modalities reconstructed from the mean vector embeddings? What is the architectural design of the model used for this? What is the decoder structure?
    • Do the experiments clearly demonstrate your answer for Q3? To me, it looks like both strategies perform somewhat similarly and more experimental validation is necessary to clearly answer the question.

    There are some typos/grammatical mistakes.

    • All the experiments were ran on a -> All the experiments were run on a
    • Chen et al chen2020pathomic should be a reference
    • Some prior studies already shown -> Some prior studies “have” already shown
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is overall well-written, with a good setup and a good level of experimentation. The claims of the paper seem to be overstated.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper seeks to address the problem of utilizing multi-modality data in the presence of missing modalities. Specifically, the authors seek to address the challenge of predicting glioma survival by combining imaging, histopathology, and omics datasets with missing modalities. The work proposes to aggregate individual modality feature embeddings using the mean vector, which can be decoded to obtain the individual modalities. The resulting mean vector is used for predicting survival. The paper is well written and has merit. However, a few concerns were raised by the reviewers regarding experimental design, which could be addressed in the final submission.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6




Author Feedback

We sincerely thank the meta reviewer and all reviewers for their time and recognition of our paper. Here, we address the concerns and questions from the reviewers.

  1. Lack of details for pathological and radiological image feature extraction (Reviewer #2 and Reviewer #3). R2 and R3 asked why the radiomics features were used in addition to the ResNet features for radiological images, and why we did not use 3D features. First, the unimodal model (radiological images only) achieved significantly better (p-value < 0.05) performance when using both radiomics and ResNet features compared with the ResNet-only case. Second, 2D models typically yield smaller GPU memory consumption and higher efficiency, and are less prone to overfitting, compared with their 3D counterparts for lesion characterization. We agree with the reviewer that a 3D feature extractor would be interesting future work. R2 asked why the GCN method in the cited paper [12] was not used. In [12], GCN required additional preprocessing but brought limited improvement; thus, we did not include GCN in our method.

  2. Contribution of network features (reconstruction and modality dropout) (Reviewer #2). R2 questioned the usefulness of deploying reconstruction and modality dropout.
    To address this concern, more ablation study results are presented beyond Table 3. Paired t-tests showed significant improvements (p-value < 0.05) from adding reconstruction in the pathology-missing (mean c-index 0.7808 vs 0.7702) and gene-and-pathology-missing (mean c-index 0.7451 vs 0.7373) scenarios for mean vector fusion using all data. Meanwhile, the experiment with modality dropout outperformed the one without modality dropout (both without reconstruction) with significant differences (p-value < 0.05) in the modality-complete (mean c-index 0.7833 vs 0.7717) and pathology-missing (mean c-index 0.7702 vs 0.7622) situations. As suggested by R2, we conducted an additional experiment that used modality dropout but reconstructed only the non-dropped modalities. This approach obtained slightly lower c-index values than the best model in Table 3 (which reconstructed both the dropped and non-dropped modalities) for testing data with complete modalities, pathology missing, and gene & pathology missing, respectively (mean c-index 0.7853, 0.7768, and 0.7331 vs 0.7857, 0.7808, and 0.7451).

  3. Why is the two-stage learning strategy preferred? (Reviewer #3) In this work, the two-stage learning strategy achieved prediction performance comparable to the end-to-end counterpart, with less memory consumption and a more flexible design.

  4. Explain the reasoning behind reconstructing the embedding of the data instead of the data itself (Reviewer #1). We agree with R1 that reconstructing the data itself might further improve performance, especially for the end-to-end strategy. However, a larger network would require more memory, which is not always desirable or feasible. We agree this is an interesting direction to explore in the future.
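For readers less familiar with the metric used throughout this rebuttal, Harrell's concordance index (c-index) can be sketched as follows. This is an illustrative O(n^2) implementation, not tied to the paper's evaluation code:

```python
def c_index(risk, time, event):
    """Harrell's concordance index: the fraction of comparable patient
    pairs whose predicted risks are ordered consistently with their
    survival times. A pair (i, j) is comparable when the patient with
    the shorter time had an observed event (event == 1)."""
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if time[i] < time[j] and event[i] == 1:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # correctly ordered pair
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tie gets half credit
    return concordant / comparable

# A model whose risks exactly reverse the survival times is perfectly
# concordant:
print(c_index([3.0, 2.0, 1.0], [1.0, 2.0, 3.0], [1, 1, 1]))  # -> 1.0
```

A c-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why differences such as 0.7702 vs 0.7808 above, while numerically small, can be meaningful under a paired test.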

Regarding the other comments asking for more details on the implementation of the proposed and baseline methods, typos, and some overstated claims, we will address them in the final submission. The generalization ability of the proposed features will be evaluated on more datasets, and hyper-parameter tuning will be considered in future work.


