
Authors

Duilio Deangeli, Emmanuel Iarussi, Juan Pablo Princich, Mariana Bendersky, Ignacio Larrabide, José Ignacio Orlando

Abstract

Although normal homologous brain structures are approximately symmetrical by definition, they also have shape differences due to e.g. natural ageing. On the other hand, neurodegenerative conditions induce their own changes in this asymmetry, making them more pronounced or altering their location. Identifying when these alterations are due to a pathological deterioration is still challenging. Current clinical tools rely either on subjective evaluations, basic volume measurements or disease-specific deep learning models. This paper introduces a novel method to learn normal asymmetry patterns in homologous brain structures based on anomaly detection and representation learning. Our framework uses a Siamese architecture to map 3D segmentations of the left and right hemispherical sides of a brain structure to a normal asymmetry embedding space, learned using a support vector data description objective. Trained on healthy samples only, it can quantify deviations from normal asymmetry patterns in unseen samples by measuring the distance of their embeddings to the center of the learned normal space. We demonstrate on public and in-house datasets that our method can accurately characterize normal asymmetries and detect pathological alterations due to Alzheimer’s disease and hippocampal sclerosis, even though no diseased cases were accessed for training. Our source code is available at https://github.com/duiliod/DeepNORHA.
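
For readers skimming the reviews below, a minimal sketch of the pipeline described in the abstract may help: a shared (Siamese) 3D encoder embeds the left and (mirrored) right segmentations, the two embeddings are fused into a single asymmetry vector, and a one-class Deep SVDD loss pulls healthy samples towards a fixed center, so that the distance to that center serves as a deviation-from-normal score. This is not the authors' released code; the backbone layers, the subtraction-based fusion, and the center initialization are illustrative assumptions.

    # Minimal sketch (assumptions noted above), not the authors' implementation.
    import torch
    import torch.nn as nn

    class SiameseAsymmetryEncoder(nn.Module):
        def __init__(self, embed_dim: int = 128):
            super().__init__()
            # Shared 3D CNN applied to both the left and the (flipped) right segmentation.
            self.backbone = nn.Sequential(
                nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1), nn.Flatten(),
                nn.Linear(32, embed_dim),
            )

        def forward(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
            # Fuse the two embeddings by subtraction so the output encodes asymmetry.
            return self.backbone(left) - self.backbone(right)

    def svdd_loss(embeddings: torch.Tensor, center: torch.Tensor) -> torch.Tensor:
        # One-class Deep SVDD: pull embeddings of healthy samples towards a fixed center.
        return ((embeddings - center) ** 2).sum(dim=1).mean()

    # Toy usage: the deviation score of an unseen sample is the squared distance
    # of its asymmetry embedding to the center of the learned normal space.
    model = SiameseAsymmetryEncoder()
    center = torch.zeros(128)  # in practice initialized from an initial forward pass, not zeros
    left = torch.randn(2, 1, 32, 32, 32)   # toy left-hippocampus segmentations
    right = torch.randn(2, 1, 32, 32, 32)  # toy mirrored right-hippocampus segmentations
    z = model(left, right)
    loss = svdd_loss(z, center)
    scores = ((z - center) ** 2).sum(dim=1)  # deviation-from-normal scores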

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43993-3_8

SharedIt: https://rdcu.be/dnwM9

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #3

  • Please describe the contribution of the paper

    The authors demonstrate that a deep learning-based normative model strategy that takes advantage of hemispheric asymmetry can outperform other state-of-the-art methods on both synthetic and real data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • The authors address a hot topic (i.e. normative modelling with artificial intelligence).
    • The paper is generally well-structured and easy to read.
    • The applications to AD and hippocampal sclerosis are of interest and the results are compelling.
    • The authors train, test, and validate their proposal using data from multiple cohorts.
    • The authors compare their proposal against multiple state-of-the-art methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • The paper’s main contributions are either untrue or not sufficiently substantiated.
    • There is no proper discussion of the literature. Similar works that the authors currently overlook include 10.1016/j.mri.2019.07.003, 10.1093/brain/awab417, 10.3174/ajnr.A5943, and 10.3390/s21030778.
    • Both the experimental design and the reporting have substantial room for improvement (see 10.1101/2021.12.12.21267677).

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Ok.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    • Abstract: the structure of the abstract is rather confusing. For clarity, I propose the following structure: background, problem statement, materials and methods, key results (with quantitative results), and key take-home message.
    • Contributions:
      o “Ours is the first model explicitly designed to learn normal asymmetries in homologous brain structures”. Untrue; see 10.1016/j.mri.2019.07.003, 10.1093/brain/awab417, 10.3174/ajnr.A5943, and 10.3390/s21030778.
      o “Although it is only trained with normal data, it can be used to detect diseased samples by measuring their distance to the normal center, unlike existing methods that capture only disease-specific asymmetries”. Unclear. Is this not what normative modelling is all about (modelling what normal cases look like to find out what is abnormal in pathological ones)? It is also unclear, at this point of the text, what ‘normal center’ means in this context.
      o “Being solely on segmentations, our model is resonator-agnostic and could be applied to data from different sources”. Questionable and unsubstantiated. Would it be accurate to say that it is ‘resonator-agnostic’ given that the quality of the segmentations depends largely on resonators and protocols? What was the resolution of each one of the images? If the resolution is quite low in one of the planes (e.g. 1x1x5), is the performance the same? Adding information regarding the acquisition scanners and protocols would help understand whether this claim is true or not. Including experiments where this is carefully looked at is also necessary to substantiate this claim.
    • The need for the ‘APH’ is unclear to me. Subtraction and no FC layers is the option leading to the best results. If so, why would we use the APH?
    • One-class Deep SVDD: The authors assume S is spherical throughout the manuscript, explicitly in Eq. 1 and in their deviation scores, but do not provide evidence supporting this assumption.
    • Experimental setup: The authors could redefine the experimental setup to substantiate their claims and improve the quality and impact of their paper.
      o Flipping: What happens when we input the same data (e.g. a left segmentation vs. the same left segmentation, flipped)? How do these values deviate from zero? The authors should consider showing experiments in this regard and including a few extreme cases (outliers).
      o Tests on synthetic data: Are the results on synthetic data great because of their synthetic nature, i.e. does the model identify fakes easily? While the approximation is useful for determining whether the model works or not, the synthesis strategy seems rather suboptimal and easily identifiable: (random) elastic deformations are unlikely in real life. The authors should consider better models for generating synthetic atrophy patterns, such as those in 10.1016/j.media.2022.102576, 10.1109/TMI.2006.873221, and 10.3389/fnins.2017.00132. The authors should also consider testing the other way around (i.e. training on synthetic and testing on real data). That way they could really show whether their model learns asymmetry or not. The approximation on its own would be extremely useful as, if feasible, it would show the model can be trained on synthetic data and work on real-life data.
      o Resonator-agnostic: The authors should train their model with data coming from a single dataset and test it on the others (e.g. training only with OASIS and testing on the rest). This would help us see whether the work is indeed resonator-agnostic, but also whether it generalises to other cohorts.
    • Results and discussion:
      o Both results and discussion are rather shallow: what is the impact of this work? What does it mean for the literature? Can we use the proposal right away in other cohorts? Why do we need a sophisticated model for hippocampal sclerosis when volumetric differences achieve 95% AUC scores? What are the limitations of the current proposal? Is the age of the subjects important for performance? What happens when extreme cases regarding age are input into the model (youngest vs. eldest)? Does it make sense to include age as an input of the model?
      o In my opinion, Figure 3 and Table 1 show conflicting outcomes. In Figure 3, the distances for NC and MCI overlap slightly. However, in Table 1, the AUC is as high as 93%. I thus wonder how the AUC was computed. What were the cases of reference? Please include this information in the table.
      o In Section 4.1, the authors report p-values from the Wilcoxon rank-sum test. While this is indeed a test that can be used to test for differences between groups, the authors should consider reporting the U/W statistic too.
      o In Section 4.1, the authors need to report, in addition to the Wilcoxon rank-sum test statistics, AUC, accuracy, sensitivity, specificity, positive predictive value, and negative predictive value. This would improve comparability with other works in the literature.
    • Typos and other minor recommendations:
      o Abstract – Line 1: this paper introduces … (missing s)
      o Introduction – Paragraph 2: Many details here are difficult to understand unless the reader has read the whole paper. Consider editing for clarity.
      o Introduction – Line 3: difficult to characterize -> difficult-to-characterize
      o Introduction – Line 6: to changes in asymmetry (remove “changes in”)
      o Introduction – Line 9: subjective experience and knowledge (this is quite the claim; while the assessment may indeed be subjective, the experience and knowledge may not; it might be better to edit this phrase)
      o Introduction – Line 10: basic -> standard/traditional (?)
      o Introduction – Line 12: beyond mere differences in size (‘mere’ is possibly too strong and negative; maybe consider removing it)
      o Introduction – Lines 16-18: the sentence is not entirely clear (consider modifying for clarity)
      o Introduction – Line 19: In this paper, (missing comma after ‘paper’)
      o Introduction – Line 20: that merges their differences (missing s in merges)
      o Introduction – Line 27: To ensure this embedding to capture variations of normal individuals -> To ensure this embedding learns the heterogeneity in normal individuals
      o Methods 2.2: Equation (1) is missing ‘i’. It is probably ||F_\theta(x_{(i)}) - c||^2; please check (a reconstructed form of the objective is sketched after these comments).
      o Experimental setup (Materials) – Line 27: 4 validation sets -> Four validation sets.
      o Experimental setup: ‘were manually aligned’. What does manually aligned mean in this context? Did someone align all 3243 images to the MNI space manually? Was the registration only affine or was it also non-linear?
      o Experimental setup (Implementation details) – Line 6: organs (additional s)
      o Experimental setup (Implementation details) – Line 14: In all cases, (missing comma)
      o References: not all references are in the correct format or with the correct case (mri -> MRI; alzheimer’s -> Alzheimer’s). Please check.
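
    For context on the Eq. (1) remark above, a reconstructed form of the one-class Deep SVDD objective and the associated deviation score is given below. The notation (including the weight-decay term over the layer weights W^l) follows the standard Deep SVDD formulation and is an assumption, not a quote from the paper:

    \min_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \left\| F_{\theta}\!\left(x_{(i)}\right) - c \right\|^{2}
      + \frac{\lambda}{2} \sum_{\ell=1}^{L} \left\| W^{\ell} \right\|_{F}^{2},
    \qquad
    s(x) = \left\| F_{\theta}(x) - c \right\|^{2}

    Here F_\theta is the trained encoder, c the hypersphere center, and s(x) the deviation-from-normal score assigned to an unseen sample x.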

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please see comments above. The work is interesting and can be easily improved.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    The authors of this manuscript propose a new method to quantify abnormal asymmetry of hippocampus morphology. To do so, they propose to use a Siamese architecture and contrastive learning. Their method enables projecting hippocampus morphology to an asymmetry space where deviation from normal can be determined.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The proposed method is novel and the architecture used seems appropriate for the task.
    • Apart from the claim of contribution (iv), the experiments are appropriate to test each contribution.
    • The method obtained good classification scores for HSR and HSL despite being trained only on synthetic and HC data.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The claim of contribution (iv) should be removed because it is not tested.
    • No reference for the AD classifier is provided.
    • Classification performance of the AD classifier is far lower than state-of-the-art methods, which are above 90% AUC.
    • It is not clear how much the FC with 512 outputs is better than no FC. Did the authors perform a statistical test? It might be that the difference operation makes the last FC layer useless.
    • No demographics of the data used are provided.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • The authors stated that the method will be freely available only after acceptance of the manuscript.
    • Most data used in this study are freely available online, which would enable reproducing most of the experiments.
    • No demographics of the data used are provided.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • The claim of contribution (iv) should be removed because it is not tested. Although I agree with the idea that the method should be less impacted by the type of scanner, there are no experiments that prove it. Moreover, since the method is trained solely on segmentations, it might not be robust to different segmentation methods.
    • Figure 3: NC (OASIS) seems to be as far from the hypersphere center (C) as the HSR cohort, but also more spread out, and highly merged with AD (OASIS) and MCI (ADNI). The authors stated that “MCI subjects from ADNI are scattered similarly to NC samples from OASIS, which is consistent with their distances”; this does not sound right to me, as MCI from ADNI are far from NC from ADNI. It seems more indicative of a bias due to the source dataset.
    • It is not clear what method has been used for the AD classifier and how this method has been trained. Classification performance seems off; state-of-the-art methods obtain above 90% AUC.
    • No demographics of the data used in this study are provided; can age and/or sex affect the results?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is novel and the results outperform current asymmetry methods.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper addresses the task of normative modeling of asymmetric brain structures for the detection of abnormalities. Both reviewers agree that the paper is methodologically and structurally appealing, even though there is still room for improvement regarding the depth of the discussion of results and related work. Overall, I recommend acceptance of the paper, while advising the authors to address the issues regarding the claimed contributions and methodological details raised by the reviewers in the camera-ready version.




Author Feedback

N/A


