
Authors

Ana Lawry Aguila, James Chapman, Andre Altmann

Abstract

One of the challenges of studying common neurological disorders is disease heterogeneity, including differences in causes, neuroimaging characteristics, comorbidities, or genetic variation. Normative modelling has become a popular method for studying such cohorts, where the ‘normal’ behaviour of a physiological system is modelled and can be used at the subject level to detect deviations relating to disease pathology. For many heterogeneous diseases, we expect to observe abnormalities across a range of neuroimaging and biological variables. However, thus far, normative models have largely been developed for studying a single imaging modality. We aim to develop a multi-modal normative modelling framework where abnormality is aggregated across variables of multiple modalities and is better able to detect deviations than uni-modal baselines. We propose two multi-modal VAE normative models to detect subject-level deviations across T1 and DTI data. Our proposed models were better able to detect diseased individuals, capture disease severity, and correlate with patient cognition than baseline approaches. We also propose a multivariate latent deviation metric, measuring deviations from the joint latent space, which outperformed feature-based metrics.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43907-0_41

SharedIt: https://rdcu.be/dnwdn

Link to the code repository

https://github.com/alawryaguila/multimodal-normative-models

Link to the dataset(s)

https://adni.loni.usc.edu/

https://www.ukbiobank.ac.uk/


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose to extend normative models for multi-modal neuroimaging data using a variational approach. The authors also propose a deviation score in the normative-modelling context for multi-modal data. They validate their model on large-scale datasets (UKB, ADNI) using a significance ratio based on their new deviation metric, showing improvement over previous VAE-based models.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) Readability: We wish to thank the authors for the overall readability of their work. The writing and the explanations are globally very clear and concise.

    2) Related works: Related work on multi-modal VAEs has been extensively described and addressed in a fully understandable way, and the model contributions with respect to these works are well motivated.

    3) Novelty: The novelties with respect to model training and deviation-score computation appear valuable, as shown by the experimental section. Concerning the deviation-score computation, the Mahalanobis deviation score is computed with respect to the posterior distribution parameters. It would have been interesting to compare with the prior distribution (an isotropic Gaussian with a diagonal covariance matrix); see the sketch after this list.

    4) Future works: The conclusion discusses the potential use of conditional VAEs and confound-variable integration to derive conditional normative modelling. I think that these perspectives are appealing and their mention is on point.
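    For concreteness, the comparison suggested in point 3) might look like the following minimal sketch (illustrative only, not the authors' implementation; a diagonal covariance is assumed, and all names are hypothetical):

        import numpy as np

        def mahalanobis_deviation(z, mu, var):
            # Mahalanobis distance under a Gaussian with diagonal covariance:
            # sqrt( sum_k (z_k - mu_k)^2 / var_k )
            diff = z - mu
            return np.sqrt(np.sum(diff ** 2 / var, axis=1))

        rng = np.random.default_rng(0)
        z_controls = rng.normal(size=(500, 10))  # latent encodings of held-out healthy controls
        z_test = rng.normal(size=(50, 10))       # latent encodings of test subjects

        # Deviation w.r.t. the healthy-cohort latent statistics
        d_posterior = mahalanobis_deviation(z_test, z_controls.mean(axis=0), z_controls.var(axis=0))

        # Suggested comparison: deviation w.r.t. the prior N(0, I)
        d_prior = mahalanobis_deviation(z_test, np.zeros(10), np.ones(10))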

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) Normative modelling vs anomaly detection: I think it is unusual to address anomaly detection under the name of “normative modelling”. We would rather have expected the use of a confounding variable such as age, sex, or site to call it normative modelling. As there is no confounding variable to condition on, this work seems closer to anomaly detection. In that case, the use of traditional anomaly-detection deviation measures, such as [1], could have been discussed.

    2) Experimental results: In the experimental results, a measure such as AUC would have been more relevant, as an accuracy-like metric would let the reader judge to what extent this kind of method can classify diseased vs. healthy subjects (see the sketch below).

    [1] Dan Hendrycks et al., Deep Anomaly Detection with Outlier Exposure, ICLR 2019.
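    For reference, such an AUC is cheap to compute from any scalar deviation score; a minimal sketch (synthetic scores and hypothetical names, not results from the paper):

        import numpy as np
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(0)
        scores = np.concatenate([rng.normal(0.0, 1.0, 500),    # deviation scores: healthy controls
                                 rng.normal(1.5, 1.0, 50)])    # deviation scores: disease subjects
        labels = np.concatenate([np.zeros(500), np.ones(50)])  # 1 = disease

        print(roc_auc_score(labels, scores))  # higher deviations should rank disease above healthy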

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code is given and the datasets can be accessed so it should be easy to reproduce this work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The paper is overall clear and concise. The motivations are well described and justified. The novelty is well grounded, and comparisons with competing methods and uni-modal cases have been performed extensively, which is valuable.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See comments.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    Authors have clarified why their method can be considered as “normative modelling”. Nonetheless, when I compare their pre-processing step - consisting of data harmonization to remove confounding variables - with prior work on normative modelling [6], there is a gap which raises a concern. In [6], the model is conditioned on confounding variables and there is no data harmonization. This gap is somewhat independent of the proposed method and it should be mentioned very clearly as it could explain part of the results obtained by the authors. With that said, I still find the main contributions interesting for this conference and I vote for acceptance.



Review #2

  • Please describe the contribution of the paper

    The paper proposes two multi-modal VAE normative models to detect subject level deviations across multimodal neuroimaging (T1 and DTI) data. The authors also propose a multivariate latent deviation metric, measuring deviations from the joint latent space.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper addresses an important problem of normative modeling and multimodal neuroimaging data. The paper is overall well-written, organised and easy to follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The paper is not novel enough in terms of methodological contributions. The authors have used a VAE with Generalised Product-of-Experts (gPoE) and Mixture-of-Experts (MoE), which are well-established methods of combining multiple modalities in a VAE.

    2. The authors’ argument that feature-based deviation metrics are not suitable for multimodal normative modeling is not particularly convincing. While it is true that data reconstructions capture only information relevant to a particular modality, the feature space is more reliable and interpretable than the latent space. The latent space, if used for downstream analyses, can be unreliable if the VAE is not trained properly. Also, feature-space analyses allow you to associate the deviations with brain regions, which is important for analysing AD heterogeneity.
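    To make the contrast concrete, a typical feature-space deviation metric assigns each ROI a z-score of its reconstruction residual, normalised against held-out healthy controls; a minimal sketch (illustrative only; the paper's Dmf may be defined differently, and all names are hypothetical):

        import numpy as np

        def feature_deviation(x, x_hat, resid_mu, resid_sd):
            # Per-feature (e.g. per-ROI) z-score of the reconstruction residual,
            # normalised by residual statistics from held-out healthy controls.
            return (x - x_hat - resid_mu) / resid_sd

        rng = np.random.default_rng(0)
        n_roi = 60
        x_hc, xhat_hc = rng.normal(size=(500, n_roi)), rng.normal(size=(500, n_roi))
        x_pt, xhat_pt = rng.normal(size=(50, n_roi)), rng.normal(size=(50, n_roi))

        resid_hc = x_hc - xhat_hc
        z_pt = feature_deviation(x_pt, xhat_pt, resid_hc.mean(axis=0), resid_hc.std(axis=0))
        # Each column of z_pt maps back to one brain region, which is what makes
        # feature-space metrics attractive for analysing disease heterogeneity.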

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Satisfactory

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Please see my major comments in the weaknesses section. Some of my other comments are as follows:

    1. The authors identified outliers by quantifying how much the latent-space deviations differ from the normal distribution. It is not clear how the p-values are estimated from Dml if those are normalised Z-scores. It would be useful if the Dml of disease subjects could be normalised w.r.t. the Dml of held-out validation-set healthy controls (similar to Equation 3); see the sketch after these comments.

    2. Since the broader goal of the paper is normative modeling, it would be helpful to see how Dmf performs with respect to disease staging and correlation with cognition. While the significance ratio is a reasonable metric for outlier detection, it is not suitable for assessing the quality of the deviations. So better significance ratios alone do not justify the claim that Dml is better for normative modeling than Dmf or the baselines.
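    A sketch of the normalisation suggested in comment 1 (synthetic deviations and hypothetical names; the authors' actual p-value computation may differ):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        dml_hc = rng.chisquare(df=10, size=500) ** 0.5  # Dml of held-out healthy controls
        dml_pt = rng.chisquare(df=14, size=50) ** 0.5   # Dml of disease subjects

        # Normalise patient deviations against the held-out control distribution
        z = (dml_pt - dml_hc.mean()) / dml_hc.std()
        p = stats.norm.sf(z)  # one-sided p-values, assuming Dml is roughly normal in controls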

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper has novelty concerns in terms of technical contributions. Also, the authors’ argument that feature-based deviation metrics are not suitable for multimodal normative modeling is not particularly convincing.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    I am satisfied with the author responses and I am changing my rating from weak reject to weak accept.



Review #3

  • Please describe the contribution of the paper

    The authors proposed two multi-modal normative modelling methods, MoE-normVAE and gPoE-normVAE, to capture joint information from multimodal neuroimaging datasets. They also proposed a latent deviation metric to measure deviations from the joint latent space. They demonstrated that their models and metric outperformed baseline methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The presentation is excellent. The authors clearly explained how they built on previous work and discussed the novelty and limitations of their proposed methods with respect to the existing literature. The model objective functions and the evaluation metrics were well defined. The figures are illustrative and straightforward.

    2. The experiments were conducted on large-scale datasets, and the comparison of results was comprehensive, with statistical tests included.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. In the Introduction, I found it a bit hard to follow when I read the PoE and MoE paragraph for the first time. I had questions like: What does it mean that the PoE joint distribution may be biased towards overconfident but miscalibrated experts? How exactly does MoE solve the problems in PoE? It would be helpful to explain the intuition, especially for readers who are not very familiar with these two methods. Also, it would be better to make it clear that you propose two methods, MoE-normVAE and gPoE-normVAE; I thought only gPoE was proposed until I read the last paragraph of the Introduction.

    2. For Table 1, please explain the potential reasons why the significance ratio changes across latent dimensions. Why does gPoE-normVAE work well at dimensions 5 and 10, and why does MoE-normVAE work well at dimensions 15 and 20?

    3. For Fig. 2, why not show MoE-normVAE? Results from all models show similar significance levels; is gPoE-normVAE significantly better than the other models?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Code is available with clear instructions.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. Could you please clarify that, in the Fig. 1 caption, “encoder layers=[20, 40], decoder layers=[20, 40]” means two layers with 20 and 40 neurons, respectively?

    2. In the first paragraph of the data-preprocessing part of the Experiments section, please specify the number of patients in each disease group among those 122 patients.

    3. I am curious: could you infer which modalities are more sensitive to disease prediction with the proposed methods?

    4. Please discuss limitations of current work.

    5. Please make the text in the figures larger; it is currently almost unreadable.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Technically solid paper with clear presentation

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    I appreciate that the authors clarified my questions. The manuscript will be clearer after including the clarification of how gPoE and MoE address the PoE issues and a discussion of the limitations.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper is clear and well-written. The presented work is of interest to the neuroimaging community, is novel, and has an appropriate experimental validation setting. What requires additional discussion is: 1) the comparison between normative modeling and anomaly detection, especially with respect to confounders, such as age and sex, not being taken into account; 2) network details; 3) the PoE challenges due to miscalibrated experts and how the utilized frameworks address these; and 4) the limitations of the current work.




Author Feedback

We thank the reviewers and meta-reviewer for their insightful feedback and for reviewing our work. Below we provide our response.

We thank R2 for their comments on latent vs feature-based metrics. We agree that feature-based metrics can provide valuable insight, allowing for per-region analysis, and we suggest they be used in conjunction with latent-based metrics. However, we find that latent deviation metrics are more sensitive to deviations when using a single aggregate measure of abnormality. As requested, we replicated Fig 2 using Dmf. Dmf was not as sensitive to disease staging and correlation with cognition as Dml, and we saw no improvement of multimodal over unimodal models. We also note that R2’s comment on the latent representation being “unreliable if the VAE is not trained properly” extends also to the data reconstruction used in feature-based deviation metrics.

MR and R1 ask whether this work is normative modelling or anomaly detection. Normative models are a type of anomaly detection that measure how much individuals deviate from a healthy population distribution whilst taking into account confounding variables. Our approach differs from some deep normative models, e.g., cVAE, that condition on confounds. Instead, for comparison with PoE-normVAE, we first remove the effects of confounds, placing all individuals in a common latent space independent of confounding effects, where we then measure deviations from the healthy norm using a multivariate distance metric. We therefore believe our work to be normative modelling. We will clarify this in the submission.

We thank R1 for reference [1], which we will include in the submission, where a negative log-likelihood (NLL) feature-space metric is used for OOD detection. We note that Dml is closely related to NLL applied in the latent space. In terms of feature-space metrics, we expect the NLL metric in [1] to have similar performance to Dmf.

MR and R3 ask for clarification on the bias of PoE towards overconfident experts and how gPoE and MoE address this issue. For PoE, the joint encoding distribution is proportional to an equally weighted product of the experts’ probability densities and will have high density in regions where experts agree. However, if we have an overconfident, miscalibrated expert, i.e., a sharp, shifted probability distribution, the joint distribution will have low density in the region observed by the other experts and a biased mean prediction. This can result in a suboptimal latent space and data reconstruction. For MoE, the joint distribution is given by a mixture of the experts’ densities, so that the density is spread over all regions covered by the experts and overconfident experts do not monopolize the resulting prediction. However, MoE is less sensitive to consensus across modalities and will give lower probability than PoE to regions where experts are in agreement. gPoE addresses the overconfident-experts problem by including a trainable weighting parameter, allowing the model to down-weight experts which cause erroneous predictions. Depending on the application, either MoE or gPoE will be most appropriate, and so we consider both methods for normative modelling. We will clarify this in the submission.

In response to MR and R3, we address the limitations of this work. Firstly, current normVAE models use ROI-level data. Data-processing software, such as FreeSurfer, may fail to accurately capture abnormality in images, particularly if large lesions are present. Further work involves creating normative models designed for voxel-level data to better capture disease effects. Another limitation is the need to adjust for confounds and batch effects prior to analysis. Further work involves including these effects in the model, similar to cVAE normative models.

In response to MR and R3, we confirm that each network consists of two layers with 20 and 40 nodes. Whilst we were not able to address all comments, we hope that we were able to answer the major concerns.
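To make the PoE/MoE/gPoE discussion above concrete, here is a minimal numerical sketch of the three fusion rules for two Gaussian experts in one latent dimension (illustrative values only; in the actual models the gPoE weights are learned rather than fixed):

    import numpy as np

    def poe(mus, vars_):
        # Product of Gaussian experts (equal weights): precisions add, so a
        # single sharp (overconfident) expert dominates the joint mean.
        prec = 1.0 / vars_
        var = 1.0 / prec.sum(axis=0)
        return var * (prec * mus).sum(axis=0), var

    def gpoe(mus, vars_, alphas):
        # Generalised PoE: per-expert weights can down-weight a miscalibrated expert.
        prec = alphas / vars_
        var = 1.0 / prec.sum(axis=0)
        return var * (prec * mus).sum(axis=0), var

    def moe_sample(mus, vars_, rng):
        # Mixture of experts: pick an expert uniformly, then sample from it, so
        # density is spread over all experts rather than concentrated on consensus.
        m = rng.integers(len(mus))
        return rng.normal(mus[m], np.sqrt(vars_[m]))

    mus = np.array([[0.0], [3.0]])     # expert 2 has a shifted mean...
    vars_ = np.array([[1.0], [0.01]])  # ...and is overconfident (tiny variance)

    print(poe(mus, vars_)[0])                               # ~2.97: biased towards the sharp expert
    print(gpoe(mus, vars_, np.array([[0.99], [0.01]]))[0])  # ~1.51: down-weighting tempers the bias
    print(moe_sample(mus, vars_, np.random.default_rng(0))) # a draw from the mixture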




Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have addressed the most important concerns in their rebuttal and there is consensus that the paper merits publication at MICCAI.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This is a very interesting paper that leverages multi-modal data for outlier detection and is solidly evaluated on large-scale datasets. It tackles an important research question: how do we incorporate all sources of information (including medical imaging) to generate insights for clinical research? The paper is well written. It is a strong paper for MICCAI.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Reviewers had some concerns and suggestions for improvement regarding the paper. However, overall, the comments are positive, indicating that the paper is clear, well-written, and of interest to the neuroimaging community. The reviewers acknowledge the novelty and appropriate experimental validation of the presented work. The rebuttal has adequately addressed reviewers’ concerns. With the necessary revisions incorporated, the paper is well-positioned for publication at MICCAI.


