Authors

Thomas Z. Li, John M. Still, Kaiwen Xu, Ho Hin Lee, Leon Y. Cai, Aravind R. Krishnan, Riqiang Gao, Mirza S. Khan, Sanja Antic, Michael Kammer, Kim L. Sandler, Fabien Maldonado, Bennett A. Landman, Thomas A. Lasko

Abstract

The accuracy of predictive models for solitary pulmonary nodule (SPN) diagnosis can be greatly increased by incorporating repeat imaging and medical context, such as electronic health records (EHRs). However, clinically routine modalities such as imaging and diagnostic codes can be asynchronous and irregularly sampled over different time scales, which are obstacles to longitudinal multimodal learning. In this work, we propose a transformer-based multimodal strategy to integrate repeat imaging with longitudinal clinical signatures from routinely collected EHRs for SPN classification. We perform unsupervised disentanglement of latent clinical signatures and leverage time-distance scaled self-attention to jointly learn from clinical signature expressions and chest computed tomography (CT) scans. Our classifier is pretrained on 2,668 scans from a public dataset and 1,149 subjects with longitudinal chest CTs, billing codes, medications, and laboratory tests from EHRs of our home institution. Evaluation on 227 subjects with challenging SPNs revealed a significant AUC improvement over a longitudinal multimodal baseline (0.824 vs 0.752 AUC), as well as improvements over a single cross-section multimodal scenario (0.809 AUC) and a longitudinal imaging-only scenario (0.741 AUC). This work demonstrates significant advantages with a novel approach for co-learning longitudinal imaging and non-imaging phenotypes with transformers. Code available at https://github.com/MASILab/lmsignatures.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43895-0_61

SharedIt: https://rdcu.be/dnwzt

Link to the code repository

https://github.com/MASILab/lmsignatures

Link to the dataset(s)

https://cdas.cancer.gov/nlst/

Reviews

Review #4

Please describe the contribution of the paper

This work proposes a method to integrate multimodal data, i.e., medical health reports and imaging data, for pulmonary nodule classification. They focus on two key challenges: irregularly collected data and multimodal information. For the former, they used smooth interpolation and ICA to recover and expose the necessary information contained in the longitudinal reports. For the latter, they adopted the Transformer for modality fusion. Experiments on multiple datasets validated the superiority of the proposed method over the baselines.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

(1) The research topic is interesting and well-motivated. (2) The methodology is intuitive and reasonable. (3) Visual interpretation is provided for a better understanding of the proposed method.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

(1) Some network designs lack reasoning. (2) Network implementation details, such as the number of convolutional layers, are not provided. (3) The analysis and evaluation of the proposed method are not sufficient. Please refer to the comments for more details.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The reproducibility of this work is relatively high if they release the source code upon acceptance, as they claimed.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

(1) They employed smooth interpolation to tackle the missing records. It would be better to evaluate the influence of different interpolation methods on the final performance. (2) What about using deep learning methods to recover the missing records? For example, try pretraining the network with a cloze task, as in BERT. (3) Please specify how the hyperparameters were selected in this method, for instance, the length of L (9195) and the dimension of e (m=630037) in the second paragraph of Section 2. (4) It would be more interesting if they could provide an analysis of the attention maps of the Transformer layers. (5) How effective is the time-distance mask in equation (3)? It should be evaluated in the experiments. (6) For further investigation, they could try to cross-reconstruct the multi-modality data. It may be more effective than using a fixed padding embedding in the Transformer.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Limited novelty and insufficient experiments.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper

The authors propose a novel transformer-based approach for jointly learning longitudinal imaging and non-imaging phenotypes, which combines repeated imaging with longitudinal clinical features collected from electronic health records (EHRs) for predicting the diagnosis of solitary pulmonary nodules (SPNs). The proposed method performs unsupervised separation of potential clinical features and jointly learns from clinical feature expressions and chest computed tomography (CT) scans using time-distance scaled self-attention, effectively addressing the asynchronous and irregular sampling issues faced by conventional clinical patterns across different time scales. Compared with other methods, evaluation on 227 challenging SPN subjects demonstrates significant performance improvements.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Novelty: A new transformer-based strategy is proposed for classifying SPNs by jointly learning from longitudinal medical imaging, demographics, billing codes, medications, and laboratory values. It models interpretable latent clinical features via independent component analysis (ICA), adds a fixed padding embedding to the longitudinal multimodal transformer to represent missing items in the sequence, and designs a TEM-scaled self-attention for cross-time scale joint learning. Interpretability: Latent clinical signatures are modeled via probabilistic independence. Scalability: The proposed method masks the self-attention on padding embeddings, allowing it to be scaled across different subjects with different sequence lengths. Flexibility: The proposed method demonstrates training strategies for small datasets, incomplete data regimes, and noisy label pairs of multimodal data within a flexible transformer architecture. It utilizes unsupervised learning on non-imaging datasets, pre-training on publicly available datasets without EHRs, and pre-training on noisy labeled paired multimodal data to overcome small cohort sizes.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Lack of necessary discussion: Discussion on the choice of T is missing. Experiment supplement: More experimental details and parameter settings need to be supplemented.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Reproducible
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

A detailed description of the experimental setup and parameter settings will help readers better reproduce your research.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Novelty
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #1

Please describe the contribution of the paper

This paper presents a transformer-based approach that integrates multiple modalities (electronic health record and imaging) and longitudinal information (three time points) for lung nodule malignancy prediction. Transfer learning and unsupervised learning were utilized to address the issue of the small dataset and missingness in multimodal learning. Time-distance self-attention module forces the model to place greater attention on recent observations.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Integrating various modalities (CT, demographic information, ICD code, medication, and lab) provides a comprehensive understanding of patients’ medical histories.
- The use of the longitudinal multimodal transformer model is a novel contribution.
- The result of the model shows significant improvement over the models trained on either fewer modalities or single timepoint.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- While the authors compared cross-sectional and temporal models, they did not compare the model to other existing methods. Although other models may not have used a multimodal approach, it is still important to provide readers with a basis for evaluating the proposed unimodal (imaging) model against existing models (Citation 1, 5, 10, 19).
- The datasets were not well described. How were the final sample sizes determined? What were the inclusion/exclusion criteria? It is impossible to assess for possible selection bias without this information.
- The authors transformed each variable into a longitudinal curve at “daily resolution”. I am concerned at the level of imputation that must have occurred to estimate daily values, given that many of these patients are not observed daily or monthly (except those who are severely ill).
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

This paper utilizes both public (NLST) and private datasets, and the authors have made their code available. It is unclear whether the private data is readily accessible to others. There is no explicit mention of code sharing. Model parameters were also not explicitly discussed, but this is likely due to limited space.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
- The authors should provide details on the preprocessing procedures applied to the CT scans.
- It would be informative if the authors reported precision/recall in addition to AUC.
- More information is needed regarding the nodule region proposal, particularly with regard to whether patch proposals were done separately for each of the three timepoints, and whether nodules at different timepoints were paired up.
- Please clarify what billing codes were used and whether the codes for malignancy indeed correspond with a diagnosis of lung cancer (e.g., pathology-proven).
- The authors emphasize the time-distance self-attention module as one of the innovations, but the contribution of this module is unclear without an explicit ablation study.
- The authors should move the dataset description to the beginning of the methods section, as some of the datasets are mentioned when introducing the models.
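To make the request for precision/recall alongside AUC concrete, the following is a minimal sketch of how the three quantities relate to the same set of predicted scores. The function names, the labels, the scores, and the 0.5 threshold are all hypothetical and chosen only for illustration; none of them come from the paper or its code repository.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def precision_recall(labels, scores, threshold):
    """Precision and recall when scores >= threshold are called malignant."""
    preds = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and not p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up example: 1 = malignant, 0 = benign.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

example_auc = auc(labels, scores)
example_precision, example_recall = precision_recall(labels, scores, 0.5)
```

AUC summarizes ranking quality over all thresholds, whereas precision and recall describe a single operating point, which is why reporting both can be informative for a clinical classifier.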
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The proposed approach is novel, and the authors present promising results related to the contribution of temporal information.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
The authors propose a transformer-based approach for jointly learning longitudinal imaging and non-imaging phenotypes.

Key strengths:
1. The use of the longitudinal multimodal transformer model is a novel contribution.
2. Good experiment design and results
3. Visual interpretation is provided for a better understanding of the proposed method.
Key weaknesses:
1. The datasets were not well described.
2. The parameter settings need to be discussed.

Author Feedback

We thank the reviewers for their thoughtful critiques regarding our submission. Specifically, we are grateful to the reviewers for highlighting a need to (1) analyze the effectiveness of time-distance self-attention, (2) expand on the imputation strategy, (3) specify and rationalize hyperparameter settings, and (4) elaborate on dataset characteristics.

Analyze the effectiveness of time-distance self-attention. Reviewers 1 and 4 expressed concern that the contribution of time-distance self-attention was unclear due to absence of an ablation experiment. To address these highly valid concerns, we cite the work of Li et al. [1], which demonstrated significant advantages of time-distance self-attention over vanilla self-attention with positional encodings in longitudinal vision transformers. Our work extends this technique by applying it to data with longer time scales, incorporating padding masks, and testing it in a multimodal context.
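As an illustration of the idea discussed here, the sketch below computes attention weights for a single query over a padded sequence, down-weighting tokens that are far in time from the most recent observation. It is not the exact formulation of equation (3) in the paper: the flipped-sigmoid weighting, the parameters `a` and `c`, and all function names are assumptions made for this example.

```python
import math


def time_distance_weights(times, ref_time, a=1.0, c=2.0):
    """Flipped-sigmoid weight per token (assumed form).

    The weight is close to 1 when a token is near ref_time and decays
    toward 0 as the time gap grows. `a` and `c` are hypothetical
    slope and offset parameters.
    """
    return [1.0 / (1.0 + math.exp(a * (ref_time - t) - c)) for t in times]


def scaled_attention_row(scores, times, is_padding, ref_time, a=1.0, c=2.0):
    """Attention weights for one query over a padded sequence.

    Padding tokens receive exactly zero weight. Assumes at least one
    token in the sequence is not padding.
    """
    valid = [s for s, pad in zip(scores, is_padding) if not pad]
    max_score = max(valid)  # subtract the max for numerical stability
    exp_scores = [
        0.0 if pad else math.exp(s - max_score)
        for s, pad in zip(scores, is_padding)
    ]
    weights = time_distance_weights(times, ref_time, a, c)
    scaled = [e * w for e, w in zip(exp_scores, weights)]
    total = sum(scaled)
    return [x / total for x in scaled]


# Three observations at years 0, 1 and 3, plus one padding slot.
# Equal raw scores isolate the effect of the time-distance scaling.
attn = scaled_attention_row(
    scores=[0.0, 0.0, 0.0, 0.0],
    times=[0.0, 1.0, 3.0, 0.0],
    is_padding=[False, False, False, True],
    ref_time=3.0,
)
```

With equal raw scores, the resulting weights increase toward the most recent observation, and the padding slot contributes nothing, so sequences of different lengths can share one fixed-size input.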

Expand on the imputation strategy. Reviewers questioned the feasibility of imputing at daily resolution. While we agree this is difficult when patients are observed sparsely, our approach only uses imputed data at cross-sections where imaging occurs. Clinical decisions are likely to be made surrounding imaging, leading to densely populated local observations at these cross-sections. For this reason, we hypothesize that imputation via longitudinal curves is reasonably accurate surrounding an imaging event. This link between imaging and non-imaging data is also why we did not reconstruct imaging when it was missing. Reviewers also questioned the comparative advantage of smooth interpolation versus other methods. For this project, we chose an imputation approach that was well-validated and has been shown to be effective in unsupervised discovery of latent clinical signatures [2–4]. In a future extension of this work, we hope to explore other approaches, such as those suggested by reviewers, in learning useful multimodal representations from medical data.
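The point that only the values at imaging cross-sections are used can be sketched as follows. Plain linear interpolation is used as a simplified stand-in for the smooth longitudinal curves cited above, and the function names, day offsets, and laboratory values are hypothetical.

```python
from bisect import bisect_right


def interpolate_at(obs_days, obs_values, query_day):
    """Linearly interpolate an irregularly sampled variable at one day.

    obs_days must be sorted and strictly increasing. Outside the observed
    range, the nearest observed value is held constant.
    """
    if query_day <= obs_days[0]:
        return obs_values[0]
    if query_day >= obs_days[-1]:
        return obs_values[-1]
    i = bisect_right(obs_days, query_day)
    d0, d1 = obs_days[i - 1], obs_days[i]
    v0, v1 = obs_values[i - 1], obs_values[i]
    return v0 + (v1 - v0) * (query_day - d0) / (d1 - d0)


def sample_at_scans(obs_days, obs_values, scan_days):
    """Evaluate the curve only on the days when a CT scan was acquired."""
    return [interpolate_at(obs_days, obs_values, d) for d in scan_days]


# Made-up laboratory values observed on days 0, 10, 40 and 400,
# queried at three hypothetical scan dates.
lab_days = [0, 10, 40, 400]
lab_values = [1.0, 2.0, 5.0, 5.0]
at_scans = sample_at_scans(lab_days, lab_values, scan_days=[5, 25, 365])
```

The estimate at day 365 relies on observations nearly a year apart, which illustrates the reviewers' concern; the estimates at days 5 and 25 sit between closely spaced observations, which is the situation the rebuttal argues is typical around an imaging event.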

Specify and rationalize technical details. Reviewers expressed concern around lack of technical detail such as the design of the convolutional embedding, choice of T, L (9195), and dimension of e (m=630037). In response, we will revise the Methods section to add in the requested details and incorporate our rationale behind each hyperparameter setting. We also incorporated additional information on network architecture that will aid reproducibility.

Elaborate on dataset characteristics. Reviewer 1 expressed concerns around which billing codes were used to curate in-house datasets and if these rules correspond to the gold-standard biopsy diagnosis. In response to these insightful concerns, we provided results from an unpublished study that validates billing codes for identifying subjects who have a pulmonary nodule and no cancer of any type prior to the nodule. In short, we conducted a chart review of a random subset of all subjects from our institution who meet these inclusion criteria and found our billing code rules to have a sensitivity of 0.930 (95% CI: [0.879, 0.969]), specificity of 0.996 (95% CI: [0.989, 1.00]), and precision of 0.979 (95% CI: [0.959, 1.000]). We will include a description of the billing code-based rules and validation in Supplementary 1.2.
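For readers less familiar with the validation metrics quoted above, the sketch below shows how sensitivity, specificity, and precision follow from chart-review counts. The function name and the counts are invented for illustration and are not the counts from the unpublished study; confidence intervals are omitted because the method used to compute them is not stated.

```python
def chart_review_metrics(tp, fp, tn, fn):
    """Validation metrics for a billing-code rule against chart review.

    tp: rule positive, chart review positive
    fp: rule positive, chart review negative
    tn: rule negative, chart review negative
    fn: rule negative, chart review positive
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases the rule finds
        "specificity": tn / (tn + fp),  # share of non-cases the rule rejects
        "precision": tp / (tp + fp),    # share of rule positives that are true
    }


# Hypothetical counts, not taken from the study.
metrics = chart_review_metrics(tp=90, fp=5, tn=195, fn=10)
```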

[1] T.Z. Li, K. Xu, R. Gao, Y. Tang, T.A. Lasko, F. Maldonado, K.L. Sandler, B.A. Landman, Time-distance vision transformers in lung cancer diagnosis from longitudinal computed tomography, Proc. SPIE 12464 (2023) 229–238. https://doi.org/10.1117/12.2653911.
[2] T.A. Lasko, Efficient Inference of Gaussian-Process-Modulated Renewal Processes with Application to Medical Event Data, Uncertain Artif Intell. 2014 (2014) 469. /pmc/articles/PMC4278374/ (accessed November 29, 2021).
[3] T.A. Lasko, Nonstationary Gaussian Process Regression for Evaluating Clinical Laboratory Test Sampling Strategies, Proc Conf AAAI Artif Intell. 2015 (2015) 1777. /pmc/articles/P

Longitudinal Multimodal Transformer Integrating Imaging and Latent Clinical Signatures From Routine EHRs for Pulmonary Nodule Classification