Authors

Kexin Ding, Mu Zhou, Dimitris N. Metaxas, Shaoting Zhang

Abstract

Survival outcome assessment is challenging and inherently associated with multiple clinical factors (e.g., imaging and genomics biomarkers) in cancer. Enabling multimodal analytics promises to reveal novel predictive patterns of patient outcomes. In this study, we propose a multimodal transformer (PathOmics) integrating pathology and genomics insights into colon-related cancer survival prediction. We emphasize the unsupervised pretraining to capture the intrinsic interaction between tissue microenvironments in gigapixel whole slide images (WSIs) and a wide range of genomics data (e.g., mRNA-sequence, copy number variant, and methylation). After the multimodal knowledge aggregation in pretraining, our task-specific model finetuning could expand the scope of data utility applicable to both multi- and single-modal data (e.g., image- or genomics-only). We evaluate our approach on both TCGA colon and rectum cancer cohorts, showing that the proposed approach is competitive and outperforms state-of-the-art studies. Finally, our approach is desirable to utilize the limited number of finetuned samples towards data-efficient analytics for survival outcome prediction. The code is available at https://github.com/Cassie07/PathOmics.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43987-2_60

SharedIt: https://rdcu.be/dnwKf

Link to the code repository

https://github.com/Cassie07/PathOmics

Link to the dataset(s)

N/A

Reviews

Review #3

Please describe the contribution of the paper

The paper proposed a multimodal framework integrating pathology and genomics information for colon and rectum cancer survival prediction. The framework contains an unsupervised multimodal data fusion pretraining and a flexible modality finetuning scheme. Compared with the previous studies, the proposed method achieved consistently better performance on TCGA-COAD and TCGA-READ datasets.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The unsupervised pre-training model proposed in the paper enables good interaction between different modalities; the idea is interesting. Also, the flexible modality finetuning could extend the range of dataset usage.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The proposed method only fuses a single type of genomics data with image data at a time (e.g., mRNA/CNA/Methylation + image).
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors mentioned that the code will be publicly available.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
1. In this study, each type of genomics data is fused with image data separately. I am curious about the outcome prediction result of using all types of genomics data together with image data (e.g., mRNA + CNA + Methylation + Image) for survival outcome prediction.
2. Did the author try different similarity losses for multimodal fusion, like cosine similarity loss? The multimodal data fusion in pretraining seems highly related to the embedding similarity evaluation.
Typo: In P2 contribution paragraph, it should be “finetuning” rather than “fintuning”.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
- Clearly written paper.
- The model performance and the methodological novelty.
- Good experimental setting of main experiments and enough baseline comparisons with the previous studies.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

The key contribution of this paper is the proposal of a multimodal transformer that integrates pathology and genomics insights for survival outcome prediction in colon-related cancer. The approach combines benefits from both unsupervised pretraining and supervised finetuning data fusion, resulting in task-specific finetuning. The approach also achieves comparable performance even with fewer data used for finetuning, making it more data-efficient with limited data size. Overall, this approach promises to reveal novel predictive patterns of patient outcomes in colon-related cancer.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The authors propose a multimodal transformer that integrates pathology and genomics insights for survival outcome prediction in colon-related cancer. The approach combines benefits from both unsupervised pretraining and supervised finetuning data fusion, resulting in task-specific finetuning. The technique achieves comparable performance even when fewer data points are used for finetuning. The paper provides a strong evaluation of the proposed approach by comparing it with several state-of-the-art methods on multiple datasets and showing better performance.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

One potential limitation of the paper is that it does not provide a detailed explanation of the interpretability of the proposed multimodal transformer. While the approach achieves high performance in predicting patient outcomes, it may be difficult to understand how the model arrives at its predictions. This is an important consideration in clinical settings where interpretability and transparency are crucial for decision-making.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors provide information on the public repository for the datasets they gathered for this study as well as description of the preprocessing steps. They also state that their code will be available once their paper is published.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

There are a number of misspellings of the term “pre-training” in the text, such as “pertaining” or “pretaining”. I would also suggest that the authors consider addressing the interpretability/explainability aspect of their model to enable a stronger clinical application of their contribution to the treatment of cancer.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper was well written overall, with a few misspellings that were repeated on multiple pages. The multimodal transformer method is not entirely novel; however, the authors proposed a few new aspects, including unsupervised pretraining and supervised finetuning data fusion, and they have presented a large variety of experiments against baseline methods from the literature. The lack of discussion on the interpretability of the contribution limits the clinical utility of this work.
Reviewer confidence

Somewhat confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #1

Please describe the contribution of the paper

This paper presents an approach to train a multimodal transformer model for the survival prediction task and validates it in the colorectal cancer setting. Pre-training is conducted using histopathology and genomics data to build a multimodal representation, which is then fine-tuned for the downstream task of survival prediction. Quantitative experiments are conducted using the TCGA datasets, and the model is shown to have a better concordance index than existing models for the survival prediction task.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Pre-training multimodal transformers with histopathology and genomics data by mapping to the common latent space is novel and is demonstrated to have superior performance compared to using either of the 2 data types
- The presented approach for multimodal pre-training and subsequent fine-tuning is of significant value for multimodal genomics datasets, which are often limited to a few hundred samples.
- Extensive, systematic experiments are conducted using the TCGA dataset on the task of survival outcome prediction and is demonstrated to have better concordance than the baselines
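
For readers unfamiliar with the metric, the concordance index referred to in this review can be sketched as follows. This is a minimal pure-Python version of Harrell's C-index, not the evaluation code used by the authors; the function name `concordance_index` and the toy inputs are assumptions made for illustration.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable patient pairs ranked correctly.

    times  -- observed follow-up times
    events -- 1 if the event (death) was observed, 0 if censored
    risks  -- predicted risk scores (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A censored patient cannot be the earlier member of a pair.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort (hypothetical values): the third patient is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.2, 0.3, 0.1]
c_index = concordance_index(times, events, risks)
print(c_index)  # 4 of the 5 comparable pairs are ordered correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why improvements of a few percentage points are reported as meaningful in survival studies.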
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Using L2 norm to map different modalities to the same/similar vector is limiting. Histopathology and genomic data does carry complementary information, and having L2 to force a common feature vector might result in either or both encoders to drop information that can not be extracted from the other one. It would be great if authors can comment how this would be handled and what should be the stopping criteria for pre-training.
- One interesting aspect of the work is the group-wise feature embedding; however, the advantages of the group-wise feature embedding haven’t been shown. For instance, if the genomics data is not utilized via groups, how much will the performance suffer?
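
To make the first concern concrete, the sketch below contrasts a squared-error (L2-style) alignment loss with a cosine alignment loss on a pair of embeddings. The exact form of the authors' pretraining loss is not given on this page, so mean squared error is used here as an assumed stand-in; the function names and the toy vectors are hypothetical.

```python
import math


def mse_alignment_loss(img_emb, gen_emb):
    """Mean squared difference between paired embeddings (L2-style)."""
    if len(img_emb) != len(gen_emb):
        raise ValueError("embeddings must have the same length")
    return sum((a - b) ** 2 for a, b in zip(img_emb, gen_emb)) / len(img_emb)


def cosine_alignment_loss(img_emb, gen_emb):
    """One minus cosine similarity; ignores the scale of each embedding."""
    if len(img_emb) != len(gen_emb):
        raise ValueError("embeddings must have the same length")
    dot = sum(a * b for a, b in zip(img_emb, gen_emb))
    norm = math.sqrt(sum(a * a for a in img_emb)) * math.sqrt(
        sum(b * b for b in gen_emb)
    )
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return 1.0 - dot / norm


# Hypothetical embeddings: the genomics vector is a scaled copy of the image one.
image_embedding = [1.0, 2.0, 3.0]
genomics_embedding = [2.0, 4.0, 6.0]
mse_value = mse_alignment_loss(image_embedding, genomics_embedding)
cos_value = cosine_alignment_loss(image_embedding, genomics_embedding)
print(mse_value, cos_value)
```

Under these definitions, a genomics embedding that is a scaled copy of the image embedding incurs a sizeable squared-error penalty but essentially no cosine penalty. The squared-error form therefore pushes the two encoders toward identical vectors, which is the behavior the reviewer worries could discard modality-specific information.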
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

With the source code release, paper has sufficient details for reproducibility. Datasets used (TCGA) are publicly available.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

Overall, the paper is well written and addresses a key issue with multimodal tasks defined over histopathology and genomics data, and it can potentially be extended to include imaging and other clinical data. A key concern is the pre-training criterion being used, which runs the risk of information loss within the encoders; the paper would benefit greatly from a discussion of how to alleviate this concern, e.g., with appropriate early stopping criteria.

It would also be beneficial to briefly mention the limitations, if any, of the proposed method. One potential limitation seems to be that all modalities of the dataset need to be available for pre-training. It would be great if the authors could comment on this.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Paper is well written and of value to the multimodal research community, especially on data efficient tasks. There are a few concerns regarding the pre-training method of minimizing L2 norm, and potential value of using group wise embedding.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #4

Please describe the contribution of the paper

The authors provided a Pathology-and-genomics Multimodal Transformer for Survival Prediction, with more interesting innovations and complete experiments.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

It’s a novel pathology-and-genomics multimodal framework, including the unsupervised multimodal data fusion approach and the flexible modality finetuning method. The method achieves better performance than other methods.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

There is only one table in the main text, and it needs a clearer presentation. Otherwise, I do not see many weaknesses.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

I think it is good, all using the public dataset. Hope the authors can open source both the data link and the program later.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

I have not offered many substantive revisions here because the work lies outside my research area. Overall, the storytelling is interesting and the methodology is innovative. My one suggestion is that Table 1 could be further streamlined so that the most important points are presented more clearly.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The figures and tables are complete and the approach is relatively interesting.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
Proposal of a multimodal transformer that integrates pathology and genomics insights for survival outcome prediction in colon-related cancer.
- Clear context for previous work provided including novel contributions
- Well explained methodology for fusion, embedding, and attention
- Comprehensive analysis on publicly available data, reasonable ablation analysis performed
- No statistical analysis provided

Author Feedback

We thank the meta-reviewer and all reviewers for their helpful feedback. We address the concerns as follows:

Regarding the concern about potential information loss during pretraining (R1), we did not observe this issue in our results. For example, with the unsupervised multimodal pretraining, we found that we could achieve good survival prediction performance even when using only unimodal data for finetuning (e.g., image or genomics data). These results show that our multimodal pretraining can gain auxiliary cross-modality information. We added more experiments by removing the unsupervised pretraining stage and keeping only supervised model training (i.e., the finetuning stage in our method). Without the unsupervised pretraining, the performance is about 2%-10% lower than ours on multi-modality and single-modality data, which confirms the information extraction during the pretraining in our original study design.

Regarding the advantage of group-wise genomics feature embedding (R1), we aim to incorporate clinical information into survival prediction. Each group of genes has a similar biological functional impact, so we can train a separate embedding extractor for each group to obtain its biological feature embedding. If the genomics data is not categorized into groups, the performance remains similar on TCGA-COAD but is about 7% lower than ours on TCGA-READ.
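
The idea of giving each gene group its own embedding extractor can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation: the group names, gene identifiers, the use of a single linear projection per group, and the helper names `make_linear`, `build_group_embedders`, and `embed_by_group` are all hypothetical.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Return a randomly initialized linear projection as a plain function."""
    weights = [
        [rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)
    ]
    bias = [0.0] * out_dim

    def forward(x):
        return [
            sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)
        ]

    return forward


def build_group_embedders(groups, embed_dim, seed=0):
    """Create one independent projection per gene group."""
    rng = random.Random(seed)
    return {
        name: make_linear(len(genes), embed_dim, rng)
        for name, genes in groups.items()
    }


def embed_by_group(expression, groups, embedders):
    """Map one patient's gene values to one embedding vector per group."""
    tokens = []
    for name, genes in groups.items():
        group_values = [expression[g] for g in genes]
        tokens.append(embedders[name](group_values))
    return tokens


# Hypothetical gene groups and one patient's (made-up) expression values.
groups = {
    "group_1": ["gene_a", "gene_b"],
    "group_2": ["gene_c", "gene_d", "gene_e"],
}
expression = {
    "gene_a": 0.5, "gene_b": 1.2, "gene_c": -0.3, "gene_d": 0.0, "gene_e": 2.1,
}
embedders = build_group_embedders(groups, embed_dim=4)
tokens = embed_by_group(expression, groups, embedders)
print(len(tokens), [len(t) for t in tokens])
```

Each group yields one fixed-length vector regardless of how many genes it contains, so the per-group outputs can be treated as a short sequence of tokens by a downstream model. The ungrouped alternative raised by R1 would correspond to a single projection over all genes at once.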

To clarify the potential dataset limitation (R1, R4), all of our pretraining multimodal datasets are from the public TCGA portal and cBioPortal. A potential limitation is that our pretraining scheme needs multimodal data, while not all public datasets satisfy this requirement.

Regarding similarity loss selection and genomics data usage (R3), our setting outperformed the settings using cosine similarity loss and combining all genomics data (by about 3%-6%). For combining all genomics data, we found that using the data types separately is better. For example, the CNA and Methylation data have different value scales and carry different biological information.

We added the potential limitation (R1) and the results mentioned here in the final submission. We also corrected all typos (R2, R3) in our manuscript. In future work, we will consider other potential early stopping criteria (R1), add interpretability to our method (R2), and organize the table more clearly (R4).

Pathology-and-genomics Multimodal Transformer for Survival Outcome Prediction