
Authors

Chiara Mauri, Stefano Cerri, Oula Puonti, Mark Mühlau, Koen Van Leemput

Abstract

Recent years have seen growing interest in methods for predicting a variable of interest, such as a subject’s age, from individual brain scans. Although the field has focused strongly on nonlinear discriminative methods using deep learning, here we explore whether linear generative techniques can be used as practical alternatives that are easier to tune, train and interpret. The models we propose consist of (1) a causal forward model expressing the effect of variables of interest on brain morphology, and (2) a latent variable noise model, based on factor analysis, that is quick to learn and invert. In experiments estimating individuals’ age and gender from the UK Biobank dataset, we demonstrate competitive prediction performance even when the number of training subjects is in the thousands, which is the typical scenario in many potential applications. The method is easy to use, as it has only a single hyperparameter, and it directly estimates interpretable spatial maps of the underlying structural changes that are driving the predictions.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_43

SharedIt: https://rdcu.be/cVVpY

Link to the code repository

N/A

Link to the dataset(s)

https://www.ukbiobank.ac.uk


Reviews

Review #2

  • Please describe the contribution of the paper

    The paper presents a lightweight generative model. It is not a deep learning model; consequently, its main advantages are its linearity and invertibility. The model is illustrated on the prediction of age and gender from brain scans.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of the paper is its originality (a new derivation of an algorithm). The fact that it is not a deep-learning-based strategy is quite refreshing (and even brave). Another advantage is that there is only one hyperparameter that needs to be tuned and/or set experimentally.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    As for weaknesses, I would highlight the difficulty of reproducing the method. The authors do not mention making the code available, and the mathematical derivation might be quite heavy for unfamiliar audiences.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I believe the method will be difficult to reproduce for audiences who are unfamiliar or uncomfortable with the required mathematics (since, as far as I am aware, no code will be made available).

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    It is not clear to me why age is used as covariate for gender classification while no other variables are employed when doing age prediction. I would also have liked to see the training times of the SFCN and RVoxM methods reported, as well as the age range of the UK Biobank subjects.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I have chosen a strong accept, since I believe this to be a good paper with only minor weaknesses. It is well written, the method is novel, and it is illustrated on a useful and important application.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The main contribution is a demonstration that an extremely simple Gaussian model for image-based prediction of exogenous variables may perform as well as or better than far more complex peers that have significant disadvantages (a larger number of free parameters, lower interpretability), especially when the training set is relatively small.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength is conceptual. The broad trend is toward ever-more-complex computational machines for this type of task, without considering the disadvantages. The conceptual trap, for computer scientists, is that cheap and simple alternatives like the proposed one are not “novel,” but novel doesn’t necessarily mean better in any and all ways.

    The attempt to compare the simple method against linear/nonlinear generative/discriminative alternatives is also a strength, with some limitations. This appears to be a good-faith effort to replicate prior work.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The weakness in the evaluation is that the leading method, SFCN, was not reimplemented and tested using the current method’s computational pipeline; this paper just re-prints previously published performance numbers. Differences in the random partitioning of training and test sets, as well as differences in preprocessing pipelines, could have accounted for the reported performance differences.

    Another weakness is that some claims in the paper are not well supported. Citing publications in which deep learners were applied successfully to huge image sets, the paper asserts that deep learners need such large training sets to perform well. However, these papers do not necessarily show that deep learners perform poorly on small training sets.

    In the evaluation, we see the performance of the re-implemented methods, but not whether that performance is similar to what prior publications reported for those methods. In other words, it is not clear whether the authors’ reimplementations of these methods achieved the highest performance possible.

    The argument that the proposed method is more readily interpretable than deep learners is not very convincing either. It is possible to apply various schemes to “trace” the influence of each voxel through the machinery of the neural network, and thus get a quantitative sense of whether higher/lower intensity at that voxel tends to be associated with higher/lower age.

    The argument that discriminative learners have more free parameters than generative ones is suspect. Surely it is possible to assemble complex generative regressors and simple discriminative ones; people just choose not to. In my view, the real point of this paper is not generative vs. discriminative; it is complexity vs. simplicity.
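    One of the “tracing” schemes mentioned above, a plain input-gradient (vanilla saliency) map, can be sketched for a toy one-hidden-layer network. This is a minimal illustration under assumed names and sizes: the network and the functions `predict` and `input_gradient` are hypothetical and unrelated to the methods in the paper. The analytic gradient of the prediction with respect to each voxel gives the sign and size of that voxel's local influence.

```python
import numpy as np

def predict(x, W, c, v):
    """Toy regressor f(x) = v^T tanh(W x + c) on a flattened image x."""
    return v @ np.tanh(W @ x + c)

def input_gradient(x, W, c, v):
    """Analytic df/dx: per-voxel local influence on the prediction."""
    h = np.tanh(W @ x + c)
    return W.T @ (v * (1.0 - h**2))   # chain rule through tanh

rng = np.random.default_rng(0)
D, H = 8, 4                           # hypothetical voxel / hidden sizes
W = rng.standard_normal((H, D))
c = rng.standard_normal(H)
v = rng.standard_normal(H)
x = rng.standard_normal(D)            # one hypothetical input image
saliency = input_gradient(x, W, c, v)
```

    Such a map is local: it describes how the prediction changes around the particular input x, not how a variable such as age changes the image.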

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors go to great pains to tell us, in considerable detail, how they implemented and evaluated the competing methods. In that sense, reproducibility is high. The data set is open source.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    See above for my comments.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The main reason to accept is that it is a refreshing change of pace from the ever-more-complicated deep learners that dominate the landscape. Carefully and thoughtfully evaluating a simple-minded alternative and its advantages is a unique contribution to the field.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    The paper proposes an alternative to nonlinear discriminative methods for predicting variables of interest, such as a subject’s brain age, from images. The alternative is formulated as a generative model that expresses the subject’s image as a linear function of the subject’s covariates (such as age, gender, etc.). This model is then used as a likelihood, and a posterior over the variable of interest is constructed for discriminative analysis. The generative model is learned by maximum likelihood, using closed-form solutions. To make the noise covariance matrix (which is underconstrained) tractable, the noise is modeled using factor analysis with a lower-dimensional latent noise vector. These new variables are estimated with Expectation-Maximization. Experiments are performed on the UK Biobank dataset.
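    The pipeline summarized above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: age is the only covariate, images are flattened voxel vectors, age has a Gaussian prior, and the function names, random initialization, and fixed number of EM iterations are hypothetical choices. Because every voxel shares the same regressors, ordinary least squares gives the maximum likelihood weights; factor analysis is then fitted to the residuals with EM, and prediction applies the inverse noise covariance through the Woodbury identity so that only a K × K system is solved.

```python
import numpy as np

def fit_linear_generative(X, ages, n_factors, n_iter=100, seed=0):
    """Fit x_n = b + age_n * w + noise, noise ~ N(0, V V^T + diag(psi)).

    X: (N, D) training images as flattened voxel vectors; ages: (N,).
    Returns b, w (D,), factor loadings V (D, K) and diagonal variances psi (D,).
    """
    N, D = X.shape
    Z = np.column_stack([np.ones(N), ages])          # design matrix (N, 2)
    # Same regressors for every voxel, so OLS equals the ML weight estimate
    # regardless of the noise covariance.
    B, *_ = np.linalg.lstsq(Z, X, rcond=None)        # (2, D): rows b and w
    R = X - Z @ B                                    # zero-mean residuals

    rng = np.random.default_rng(seed)
    V = 0.1 * rng.standard_normal((D, n_factors))
    psi = R.var(axis=0) + 1e-6
    for _ in range(n_iter):
        # E-step: Gaussian posterior of the latent noise factors h_n.
        VtPinv = V.T / psi                           # V^T Psi^{-1}, (K, D)
        G = np.linalg.inv(np.eye(n_factors) + VtPinv @ V)  # posterior cov
        Eh = R @ VtPinv.T @ G                        # posterior means (N, K)
        S_hh = N * G + Eh.T @ Eh                     # sum of E[h h^T]
        S_rh = R.T @ Eh                              # sum of r_n E[h_n]^T
        # M-step: update loadings, then diagonal variances.
        V = S_rh @ np.linalg.inv(S_hh)
        psi = np.maximum((R**2).mean(axis=0) - (V * S_rh).sum(axis=1) / N, 1e-6)
    return B[0], B[1], V, psi

def predict_age(x, b, w, V, psi, prior_mean, prior_var):
    """Gaussian posterior (mean, variance) of age for one image x."""
    K = V.shape[1]
    M = np.eye(K) + (V.T / psi) @ V                  # K x K system matrix

    def C_inv(u):
        # Woodbury: (V V^T + Psi)^{-1} u without forming a D x D matrix.
        Pu = u / psi
        return Pu - (V / psi[:, None]) @ np.linalg.solve(M, V.T @ Pu)

    Ciw = C_inv(w)
    precision = w @ Ciw + 1.0 / prior_var
    mean = (Ciw @ (x - b) + prior_mean / prior_var) / precision
    return mean, 1.0 / precision
```

    In this sketch the number of latent factors `n_factors` is the only setting to tune, which is one plausible reading of the single hyperparameter mentioned in the abstract.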

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method proposes a more tractable and lightweight generative model that is linear and admits closed-form solutions for the weight matrix (as is common with most linear models). The method performs very well on smaller datasets, but this could also be attributed to model capacity (which is not directly explored in the paper). Early on, the paper discusses the problems of deep learning methods: issues with smaller training datasets, difficulty in model interpretation, and the extensive tuning required to make them work, which can also lead to brittle performance.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper proposes a causal forward model expressing the effects of variables of interest on brain morphology. However, the model is linear. Although a linear model is easy to interpret, the work does not motivate why a linear model is sufficient to capture all the nonlinear variation in structure and morphology given only variables like age and gender. Extensive research on deep learning (for both generative and discriminative models) has aimed to find the right “features” to generate/extract in order to faithfully capture the conditional probability distribution (here, of the brain image given the other variables).

    Moreover, deep models also lend themselves to explainability and interpretability ([1, 2, 3]), and it is incorrect to say that discriminative models are harder to interpret. The interpretability of the proposed “causal model” is also not substantiated with experiments showing how the proposed model differs from, or improves on, other baselines in explaining different factors. For example, Figure 1 seems unnecessary, and it is not clear to me what exactly the figure is trying to convey. A comparison with the baselines is required in the experiments to support the claim that the proposed method is a causal model that lends itself to better interpretation than other methods (e.g., VAE, SFCN, or RVoxM).

    The paper has very little technical novelty. Linear generative models have been proposed before in the literature, and the derivation of the posterior follows from the basic generalized linear model formulation. The major drawback of the linear model is also its lack of flexibility. Since the model uses only up to two covariates (age and gender), the given generative formulation does not capture, for example, multimodal behavior or nonlinear deviations.

    The comparison with SFCN is also slightly unfair. For the same input (affine T1), SFCN performs consistently better than the proposed method for all training set sizes. The tradeoff between accuracy and some other factor (training time, hyperparameter tuning) is not shown to justify the use of the proposed method rather than SFCN.
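    The flexibility concern can be made concrete with a forward model that stays linear in its parameters but not in age, such as one with a quadratic age term. The following is a hypothetical illustration, not the paper's model: it assumes a diagonal noise covariance for brevity and a flat prior over an age grid. Because the posterior over age is then no longer Gaussian, it predicts by evaluating the log-likelihood on that grid. Centering and scaling age (`center`, `scale`, both assumed settings) keeps the design matrix well conditioned.

```python
import numpy as np

def design(ages, center, scale):
    """Design matrix [1, t, t^2] with t = (age - center) / scale."""
    t = (np.asarray(ages, dtype=float) - center) / scale
    return np.column_stack([np.ones_like(t), t, t**2])

def fit_quadratic(X, ages, center, scale):
    """ML fit of x = b + t * w1 + t^2 * w2 + noise with diagonal noise."""
    Z = design(ages, center, scale)
    B, *_ = np.linalg.lstsq(Z, X, rcond=None)         # (3, D)
    noise_var = (X - Z @ B).var(axis=0) + 1e-8        # per-voxel variance
    return B, noise_var

def predict_age_grid(x, B, noise_var, center, scale, grid):
    """Age on `grid` maximizing the likelihood (flat prior over the grid)."""
    means = design(grid, center, scale) @ B           # (G, D) expected images
    # The log-determinant term does not depend on age, so it is dropped.
    loglik = -0.5 * (((x - means) ** 2) / noise_var).sum(axis=1)
    return grid[np.argmax(loglik)]
```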

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Most implementation details are provided in the paper. The methods are easy to re-implement, although no code is provided in the supplementary material.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The paper is generally very well written and easy to follow, and I did not notice any major breaks in flow. However, the paper initially places heavy emphasis on an “explainable” generative model and on terms like “causal forward model” that are not used again in the rest of the paper. Deep learning models also follow a “causal structure”; the only difference is the model/hypothesis space, which has implications for learning capacity and optimization. Moreover, I think the proposed Bayesian method is overly simple. Given only up to two raw covariates (age and gender), the generative model is simply a linear combination of the two variables with additive noise, which does not seem to have sufficient model capacity. The statement “… naive Bayesian classifiers can empirically outperform more powerful methods when the training size is limited …”, which is also the premise of the proposed method, is well known: naive Bayesian classifiers have low model capacity, so they are less likely to overfit and can outperform high-capacity models (such as deep networks) when data are scarce. However, the paper then states “…even when the number of training subjects is the thousands, our lightweight linear generative method yields prediction performance that is competitive with state-of-the-art nonlinear…”. It is therefore not clear which regime the proposed method targets: the low-data or the high-data regime. The experiments, however, do not support the claim, as SFCN outperforms the proposed method at every training set size.

    On page 1, the survey referred to in the phrase “In a recent survey on single-subject …” is not cited. Please cite it. In the subsection “Discriminative methods are hard to interpret”, the explainability of attention models is questioned, but the proposed method does not itself solve or address this issue.

    [1] (LIME) Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. “‘Why should I trust you?’: Explaining the predictions of any classifier.” Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
    [2] (Shapley sampling values) Strumbelj, Erik, and Igor Kononenko. “Explaining prediction models and individual predictions with feature contributions.” Knowledge and Information Systems 41.3 (2014): 647-665.
    [3] (Grad-CAM) Selvaraju, Ramprasaath R., et al. “Grad-CAM: Visual explanations from deep networks via gradient-based localization.” Proceedings of the IEEE International Conference on Computer Vision. 2017.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a simple framework for estimating age and gender from brain images, using a linear generative model as the likelihood function to derive a posterior over the covariate given the image. However, most methods (including those based on deep learning) are derived from a similar formulation: the likelihood term becomes a “loss function” and the prior term becomes a “regularization term”. Most Bayesian methods also use a similar framework and then optimize the parameters. The proposed method is neither well motivated nor technically novel. Although both linear and nonlinear benchmarks are used, and the paper reports better predictive performance for small training sets, SFCN performs consistently better than the proposed method. Moreover, the claims about a causal forward model are not substantiated with experiments showing whether the proposed method provides any advantage over the other baselines in interpretability and explainability.
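    The correspondence described above (likelihood as loss, prior as regularizer) can be checked numerically in the simplest case. Assuming a linear model with Gaussian noise of variance `sigma2` and an independent zero-mean Gaussian prior of variance `tau2` on each weight (both hypothetical settings), the posterior mean, which is also the MAP estimate, equals the ridge regression solution with penalty `sigma2 / tau2`.

```python
import numpy as np

def ridge(X, y, lam):
    """Minimize ||y - X beta||^2 + lam * ||beta||^2 (loss + regularizer)."""
    P = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(P), X.T @ y)

def gaussian_posterior_mean(X, y, sigma2, tau2):
    """Posterior mean of beta for y ~ N(X beta, sigma2 I), beta ~ N(0, tau2 I)."""
    P = X.shape[1]
    precision = X.T @ X / sigma2 + np.eye(P) / tau2
    return np.linalg.solve(precision, X.T @ y / sigma2)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ rng.standard_normal(5) + 0.3 * rng.standard_normal(100)
sigma2, tau2 = 0.09, 2.0
beta_map = gaussian_posterior_mean(X, y, sigma2, tau2)
beta_ridge = ridge(X, y, sigma2 / tau2)
```

    This sketch is in the discriminative direction (target given inputs); the paper's model instead places the likelihood on the image given the covariates.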

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
    • Non-deep-learning method for explanation
    • You need to rewrite the derivation section of the paper and make it more accessible to the MICCAI audience. The paper should be accessible if the goal is to make it useful
    • The authors MUST release the code for the reproducibility purpose
    • “why age is used as covariate for gender classification while no other variables are employed when doing age prediction”: this needs to be addressed and justified
    • What is the point of Fig. 1, and what is its take-home message? If it is unnecessary, please remove it
    • Consider removing the derivation of the GLM from your paper; it is textbook material
    • What is your response to the comment regarding the “lack of flexibility” for “multimodality and non-linear effect”?
    • Authors MUST address these comments:
      • “The tradeoff between accuracy and some other factor (training time, hyperparameter tuning) is not shown to justify the use of the proposed method rather than SFCN.”
      • “whether the authors’ reimplementations of these methods achieved the highest performance possible”
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5




Author Feedback

Main points:

  • Both R3 and R4 seem to have misunderstood the distinction between generative and discriminative weight maps (going as far as suggesting the removal of Figure 1 from the paper, which we believe is a key figure). Our claim is not that discriminative weight maps cannot be computed (we already cited several references in the introduction), but rather that they are very tricky to interpret and can be highly misleading. A subsequent journal paper will try to be even more explicit about this key distinction/contribution.

  • R4 seems to think that our method can only model linear effects, which is not the case. As already mentioned in the discussion, adding nonlinearities to the causal forward model is straightforward: additional age prediction experiments on a different dataset with a much larger age span than the UK Biobank show that adding, e.g., a quadratic age term can indeed be beneficial (these experiments are not included in the paper). We plan to report on this in a subsequent journal publication.
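    The distinction between discriminative and generative weight maps drawn in the first point can be illustrated with a classic two-voxel toy example (a hypothetical setup, not taken from the paper). Voxel 1 carries the target signal plus a nuisance component, and voxel 2 carries only the nuisance. The best linear decoder puts a large negative weight on voxel 2 to cancel the nuisance, even though voxel 2 is unrelated to the target, whereas a forward (generative) map, obtained by regressing each voxel on the target, assigns voxel 2 an effect near zero. The activation-pattern transformation of Haufe et al. (2014), Cov(X) w / Var(X w), recovers essentially the same forward map from the decoder weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
y = rng.standard_normal(n)                 # target, e.g. standardized age
d = rng.standard_normal(n)                 # nuisance shared by both voxels
X = np.column_stack([y + d + 0.1 * rng.standard_normal(n),  # voxel 1
                     d])                                    # voxel 2: no signal

# Discriminative map: least-squares decoder y ~ X w.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Generative (forward) map: regress each voxel on the target.
forward = X.T @ y / (y @ y)

# Activation pattern computed from the decoder weights.
y_hat = X @ w
pattern = np.cov(X, rowvar=False) @ w / np.var(y_hat, ddof=1)
```

    Here the decoder weight on voxel 2 is close to -1, while both the forward map and the activation pattern are close to 0 for that voxel.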

Other things:

  • As requested, we will add the training times of SFCN, RVoxM and the VAE method in the final manuscript.
  • We did not train SFCN ourselves as the training code was not available to us (even after communicating with the authors privately). However, preprocessing is identical to ours and the test set is quite large, so the comparisons should be fair.
  • The inclusion of age in the gender prediction experiment was due to time constraints before the MICCAI deadline: these were the results we had readily available at submission time. We have since re-run the experiment without including age, and the results are very similar, likely because the method can simply include the unknown age as an extra latent variable in the model.
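    Treating unknown age as an extra latent variable can be sketched for a linear Gaussian forward model. Under the hypothetical assumptions that x = b + age · w_age + gender · w_gender + noise, with gender coded 0/1, noise ~ N(0, C), and a Gaussian prior on age, integrating age out keeps p(x | gender) Gaussian, with the covariance inflated along w_age. This is an illustrative sketch with assumed names, not the code used in the paper.

```python
import numpy as np

def gender_log_odds(x, b, w_age, w_gender, C, age_mean, age_var, prior_1=0.5):
    """log p(gender=1 | x) - log p(gender=0 | x), with age integrated out.

    Assumes x = b + age * w_age + gender * w_gender + noise, gender in {0, 1}
    (hypothetical coding), noise ~ N(0, C), age ~ N(age_mean, age_var).
    """
    # Marginalizing the Gaussian age inflates the covariance along w_age.
    C_marg = C + age_var * np.outer(w_age, w_age)

    def log_lik(g):
        r = x - (b + age_mean * w_age + g * w_gender)
        # Both classes share C_marg, so the log-determinant term cancels.
        return -0.5 * r @ np.linalg.solve(C_marg, r)

    return log_lik(1) - log_lik(0) + np.log(prior_1) - np.log(1.0 - prior_1)
```

    A positive log-odds value assigns gender 1; because age is averaged over rather than supplied, the classifier needs no age input at test time.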


