
Authors

Rudy Rizzo, Martyna Dziadosz, Sreenath P. Kyathanahally, Mauricio Reyes, Roland Kreis

Abstract

Magnetic Resonance Spectroscopy (MRS) and Spectroscopic Imaging (MRSI) are non-invasive techniques to map tissue contents of many metabolites in situ in humans. Quantification is traditionally done via model fitting (MF), and Cramér-Rao Lower Bounds (CRLBs) are used as a measure of fitting uncertainty. Signal-to-noise is limited by clinical time constraints, and MF can be very time-consuming in MRSI with thousands of spectra. Deep Learning (DL) has introduced the possibility of speeding up quantitation while reportedly preserving accuracy and precision. However, questions arise about how to assess quantification uncertainties in the case of DL. In this work, an optimal-performance DL architecture that takes spectrograms as input and maps out absolute concentrations of metabolites referenced to water content was used to investigate this in detail. Distributions of predictions and Monte-Carlo dropout were used to investigate data- and model-related uncertainties, exploiting ground-truth knowledge in a synthetic setup mimicking realistic brain spectra with metabolic composition varying uniformly from healthy to pathological cases. Bias and CRLBs from MF were then compared to DL-related uncertainties. It is confirmed that DL is a dataset-biased technique: accuracy and precision of predictions scale with metabolite SNR, but the results hint at bias and increased uncertainty at the edges of the explored parameter space (i.e., for very high and very low concentrations), even at infinite SNR (noiseless training and testing). Moreover, training with uniform datasets, even when augmented with critical cases, proved insufficient to prevent biases. This is dangerous in a clinical context, which requires the algorithm to be unbiased also for concentrations far from the norm, since exactly these concentrations may correspond to pathology, the target of the diagnostic investigation.
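As a rough illustration of the Monte-Carlo dropout procedure mentioned in the abstract (a hypothetical sketch on a toy network, not the authors' implementation): dropout is kept active at inference time, repeated stochastic forward passes are collected, and the spread of the resulting prediction distribution serves as an estimate of model-related (epistemic) uncertainty.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer network with fixed random weights (hypothetical stand-in
# for a trained spectrogram-to-concentration network).
W1 = rng.normal(size=(16, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1));  b2 = np.zeros(1)

def forward(x, p_drop=0.2, mc_dropout=True):
    """One forward pass; with mc_dropout=True, dropout stays ON at inference."""
    h = np.maximum(x @ W1 + b1, 0.0)          # ReLU hidden layer
    if mc_dropout:
        mask = rng.random(h.shape) > p_drop   # random unit dropout
        h = h * mask / (1.0 - p_drop)         # inverted-dropout scaling
    return (h @ W2 + b2).item()

x = rng.normal(size=16)                        # one toy input
samples = np.array([forward(x) for _ in range(200)])
mean, epistemic_std = samples.mean(), samples.std()
```

The sample standard deviation over the 200 stochastic passes is the Monte-Carlo dropout uncertainty estimate; a plain deterministic pass (`mc_dropout=False`) would return the same value every time.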

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16452-1_68

SharedIt: https://rdcu.be/cVVqp

Link to the code repository

https://github.com/bellarude/MRSdeeplearning

Link to the dataset(s)

https://github.com/bellarude/MRS_detasets


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper does a systematic comparison between CNNs and traditional model fitting for MR spectroscopy quantification, and identifies major concerns about the CNN approach.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written and raises awareness of a previously-unknown issue for the application of deep learning methods to MR spectroscopy.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The investigation was somewhat limited in scope – however, I do not perceive this as a major limitation, as the authors have clearly identified this limitation in the paper, and have provided a roadmap for further investigations.

    The quantification network seems to be based on generic (non-spectroscopic) references [25,26], but it is not clear how these methods compare to deep learning methods designed specifically for spectroscopy like those in [10,11,12]. It would have been nice to see this same evaluation on additional methods, since not all deep learning methods are expected to behave in the same way.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Code and data are not included, but the paper is reasonably detailed and relies on simulations, so it should be possible to approximately replicate the study.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    This work gives a nice warning about the potential pitfalls of relying completely on deep learning for MR spectroscopy. There may be value in discussing the link between this observation and similar observations that have been made in the context of MR imaging:

    Antun et al, On instabilities of deep learning in image reconstruction and the potential costs of AI. PNAS 2020

    Chan et al, Local Perturbation Responses and Checkerboard Tests: Characterization tools for nonlinear MRI methods. MRM 2021.

There is very likely a similar underlying principle.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper sheds light on a previously-unrecognized major problem with one of the popular new approaches for MR spectroscopy quantification. This has the potential to be very impactful.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This is a simulation study highlighting bias in the estimation of model parameters and uncertainties when using neural networks with MR spectroscopy data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A solid simulation study nicely executed with clear hypotheses, results and conclusions.

    This is timely work extending recent demonstrations in other quantitative imaging scenarios that significant bias can arise when using regression type methods, e.g. CNNs as here, to estimate model parameters from high dimensional data. The approach is appealing, as it can be much faster than traditional model fitting and potentially avoid problems like local minima. However, care is required to select an appropriate training set and even then it appears difficult to avoid bias towards the mean particularly for rare cases.
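The "bias towards the mean" this review describes can be illustrated with a deliberately simple example (my own, not from the paper): ordinary least squares stands in for any regression-type estimator trained on noisy observations of a uniformly distributed parameter. Predictions at the edges of the range are systematically pulled toward the population mean.

```python
import numpy as np

rng = np.random.default_rng(1)

# True "concentrations" drawn uniformly, observed through additive noise
y = rng.uniform(0.0, 10.0, size=5000)
x = y + rng.normal(0.0, 2.0, size=5000)

# Least-squares fit of y from x (stand-in for a regression network)
A = np.vstack([x, np.ones_like(x)]).T
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]

# Predictions at the edges of the range are pulled toward the mean:
pred_low  = slope * 0.0 + intercept   # low end: overestimated
pred_high = slope * 10.0 + intercept  # high end: underestimated
```

Because the fitted slope is below 1 (attenuation), the low end of the range is overestimated and the high end underestimated, mirroring the edge bias the paper reports for rare, far-from-the-norm cases.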

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The application area is quite niche - model-based spectroscopic MRI - but nevertheless of interest to some MICCAI attendees and the problems and conclusions highlighted are more broadly important (they have been shown before in diffusion MRI - ref [9] in the paper - but the confirmation in a different application is valuable).

    The study is simulation only. A demonstration of how the issue might manifest using real data sets would have added a lot to the paper.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Seems fine.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    A few specific comments:

    • Regarding estimation of the different types of uncertainty, one approach the authors don’t consider is to train the network specifically to output an estimate of uncertainty cf. (Tanno et al NeuroImage 2021). The authors might consider that when extending this work.

    • Figure 7 is a little hard to decipher. It could be reorganised to make clearer what is experimental result and what is ground truth.
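The first suggestion above (training the network to output its own uncertainty estimate, cf. Tanno et al.) is commonly realised by having the network predict both a value and a log-variance, trained with a Gaussian negative log-likelihood. A minimal, hypothetical NumPy sketch of that loss (my illustration, not code from the paper under review):

```python
import numpy as np

def gaussian_nll(y_true, y_pred, log_var):
    """Heteroscedastic Gaussian negative log-likelihood.

    The network predicts both y_pred and log_var per sample; minimising
    this loss teaches the log_var output to track data-related
    (aleatoric) uncertainty: large where data are ambiguous, small
    where they are clean.
    """
    y_true, y_pred, log_var = map(np.asarray, (y_true, y_pred, log_var))
    return np.mean(0.5 * (np.exp(-log_var) * (y_true - y_pred) ** 2 + log_var))

# Toy check: for a fixed squared error, the loss is minimised when the
# predicted variance matches that squared error.
err2 = 0.25
losses = {lv: gaussian_nll([1.0], [0.5], [lv]) for lv in (-3.0, np.log(err2), 3.0)}
```

The exponential term penalises overconfidence (variance too small for the actual error), while the bare `log_var` term penalises blanket large variances, so the calibrated variance is the minimiser.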

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It’s a solid piece of work, albeit a little niche and simulation only.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The main contribution of this paper is to conduct a synthetic study of the robustness, in terms of bias and variance (due to aleatoric and epistemic causes), of a CNN trained for metabolite quantification for MR spectroscopy. The main conclusion is that the CNN displays significant bias and variance (in both noisy and noiseless data) which depends on the parameter value; predictions for parameter values near the bounds of the generated data display more bias/variance. In contrast, they find that traditional model fitting shows no parameter value dependence with respect to bias/variance of estimates.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of this work is that it provides a valuable counterpoint to previous studies on deep learning quantification for metabolite quantification, showing the possible pitfalls of deep learning in comparison to traditional model fitting algorithms.

    The analysis of bias and variance were extensive.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weakness of this paper is the lack of an example on actual MRS data (whether in-vivo or a phantom). I am aware that this is a synthetic study, but just a small real-world example of the DL failing in a pathological case would have greatly strengthened the paper.

    Furthermore, one of the papers cited as an example of deep learning quantification [10] also conducts an uncertainty analysis, and concludes favorably on the deep learning method. I think it is necessary to address this/discuss the potential contradiction.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I am not sure how difficult it would be to reproduce the synthetic dataset used for training; however, the training of the networks (given the data) seems to have been described in sufficient detail for reproducibility (although the learning rate is not specified, I assume this means Adam with the default learning rate).

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    As perhaps a naive question (but one which could be addressed in the paper), what if one simply simulates an extremely large range of concentrations such that one is sure that these concentrations will never be reached. Would this perhaps reduce the variance but increase the bias? It would be interesting to see the model retrained with different data where the parameter bounds are increased.

    As mentioned, the study is restricted to a single CNN architecture which was found optimal after a hyperparameter search. If expanded to a journal paper, I think it would also be worthwhile to reproduce the architectures of previously published studies. Furthermore, I think it is worth discussing whether a CNN is the optimal architecture for inference from spectrograms (see e.g. https://towardsdatascience.com/whats-wrong-with-spectrograms-and-cnns-for-audio-processing-311377d7ccd, which applies to sound spectrograms, but some of the criticisms are transferable). It is possible that an architecture more tailored to the data may alleviate some of the problems noted in the paper.

    In general, the DPI/resolution of the figures is quite low, which detracts somewhat from the paper when reading/zooming in. For the future, I would suggest remaking the figures at a higher resolution.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I found the work to be novel and detailed in the analysis, although limited in scope (single architecture, synthetic dataset). I think it is a good fit for MICCAI and a valuable contribution for addressing potential weaknesses in deep learning approaches compared to traditional methods.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper presents a systematic comparison between CNNs and traditional model fitting for MR spectroscopy quantification and identifies major concerns about the CNN approach.

    An overall interesting contribution that can be valuable for the MICCAI audience. Reviewers mainly question the choice of CNN architecture (a CNN on spectrograms, and only a single CNN method). I would add some concern about using inference-time dropout as the only method for uncertainty estimation, as this method has raised some concerns in the past. Further, it would be interesting to see the influence of using classical vs. deep methods on real clinical data. Finally, while the paper can be of interest to the MICCAI community and represents a thorough evaluation through a simulation study, the novelty itself is quite limited; it might be a better fit for an MRI physics journal. Based on reviewer agreement, the paper can be of interest to the MICCAI community, yet can benefit from some fine-tuning.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4




Author Feedback

Comment 1: the study is restricted to a single optimally designed (i.e., via hyper-parameterization) but generic (i.e., non-spectroscopic) CNN architecture [25,26]. The comparison with architectures designed for spectroscopic datasets [10,11,12] or the investigation of methods more tailored to the data are suggested as extensions. Response 1: the considered network has been extensively compared with state-of-the-art CNNs (i.e., ResNet, U-Net, InceptionNet, EfficientNet) tuned and trained on spectroscopic datasets, as well as with networks formerly designed for spectroscopic datasets, such as the one in reference [10]. This analysis is reported in a ready-to-submit full paper for the journal Magnetic Resonance in Medicine. The network considered in this submission is the best-performing network among those tested in that paper. Given the word limit and journal exclusivity, this comparison was not disclosed in the submission.

Comment 2: code and data are not included. Response 2: The camera-ready paper will report links to a GitHub repository where datasets and code can be downloaded.

Comment 3: one of the papers cited as an example of deep learning quantification [10] also conducts an uncertainty analysis, and concludes favorably on the deep learning method. It is necessary to address/discuss this potential contradiction. Response 3: we think a proper comparison is difficult to make given that:

  • The approach used in [10] for the assessment of measurement uncertainty is heuristic and developed specifically for the case where the Signal-to-Background Ratio (SBR) of metabolites is considered and available. Since our method predicts concentrations directly, SBR is not available. The methods we used come from general CNN applications with a theory-oriented and formal approach.
  • The simulated spectra we use mimic human brain tissue composition and acquisition conditions, while ref [10] refers to rat spectra recorded at a different magnetic field strength with quite distinct properties.
  • In this study, a dataset specifically designed to investigate very low-SNR spectra and pathological spectra is used. These clinically relevant conditions were not addressed in ref [10], which would compromise a 1:1 comparison.

Therefore, to compare the two techniques, we would need access to the dataset of ref [10] to adapt and train our model with, or the exact architecture and uncertainty estimation method used in ref [10] to feed our data into, which, to the best of the authors' knowledge, are not available.

Comment 4: in general, the DPI/resolution of the figures is quite low. Response 4: figures in the camera-ready paper will be provided at higher resolution.

Comment 5: the whole study is limited to simulated data. Extension to phantom or in-vivo data would be a complementary, interesting investigation. Response 5: The aim of the study was to theoretically demonstrate possible intrinsic bias of a top-performing DL architecture in quantifying spectral metabolites. That requires the ground-truth knowledge that only an in-silico setup can provide. Extension to phantom or in-vivo data is foreseen for future projects.


