
Authors

Injun Choi, Guorong Wu, Won Hwa Kim

Abstract

Brain connectomes are heavily studied to characterize early symptoms of various neurodegenerative diseases such as Alzheimer’s Disease (AD). As the connectomes over different brain regions are naturally represented as a graph, variants of Graph Neural Networks (GNNs) have been developed to identify topological patterns for early disease diagnosis. However, existing GNNs rely heavily on the fixed local structure given by an initial graph, as they aggregate information from the direct neighborhood of each node. Such an approach overlooks useful information from more distant nodes, and multiple layers of node aggregation have to be stacked across the entire graph, which leads to an over-smoothing issue. In this regard, we propose a flexible model that learns adaptive scales of neighborhood for individual nodes of a graph to incorporate broader information from an appropriate range. Leveraging an adaptive diffusion kernel, the proposed model identifies desirable scales for each node for feature aggregation, which leads to better prediction of diagnostic labels of brain networks. Empirical results show that our method outperforms well-structured baselines on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study for classifying various stages towards AD based on the brain connectome and relevant node-wise features from neuroimages.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16431-6_36

SharedIt: https://rdcu.be/cVD6S

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    A novel dynamic aggregation mechanism for graph convolutional networks, exploiting the heat kernel equation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The comparison with existing solutions is very meticulous and the idea is original, based on a very good literature review. The story of heat kernels and GNNs is reported in a compelling manner.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Writing can be improved

    • The proposed method is interesting, though it seems to be just an incremental improvement on GraphHeat (Xu et al. 2019)

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The dataset is a well known public dataset, the data selection and use is clear. Crucial details on the preprocessing are missing. Please clarify the PET normalization step.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The major concern is the novelty. Although the contribution is elaborate and interesting, it seems to be a small improvement on the GraphHeat method.

    Please specify which amyloid tracer was used (there are several in the ADNI dataset), and which ADNI group. All of them?

    The results using the cortical thickness are a bit puzzling. From Fig. 3 it seems you only used AD patients at baseline (correct?), yet the results in Table 2 are relatively good for the models, even if not as good as when using other features. This is surprising, as at baseline in the ADNI dataset AD/Control/MCI are not so different except for the cingulum, hippocampus and some other areas. Older works on voxel-based morphometry and the ADNI dataset had a lot of difficulty with this. It is quite impressive that it works so well even with SVM. So, is the cortical thickness discriminant or not?

    The normalization with the cerebellum for amyloid-PET is known, though depending on how it is done it makes a lot of difference. Subtracting the mean value? The max value? A Z-score between healthy subjects and each AD subject?

    Moreover, it is surprising that degree as a feature performs so badly with SVM. There are other studies, even ones using t-tests and simple features such as degree, centrality, etc., which say the opposite (e.g. Elsheikh et al. Front. Hum. Neurosci. 2021). Can you please check how you use the degree? As an average degree? Kept as local features?

    Minor: The style of the paper is sometimes clumsy and there are many typos, mainly due to hurry or distraction, as well as overly long sentences (e.g. page 2 “…Graph Diffusion Convolution (GDC) [14], however,…”). Furthermore, Figure 2 has an imprecision in the left image (scale s1 is not as wide as s2, although the caption says the opposite). Some symbols are used for different things (i.e., the index I is used both for the convolution layer in Eq. 8 and for the sample Y in Eq. 7). The performance listed in Table 2 is not particularly sharp (in some cases the SVM is even better, as in the “All Imaging Features” section, or the “GDC” is generally very close), etc.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty of the idea is acceptable but not outstanding; the technical realization of the work is scrupulous, but the writing was not done with the same attention. The final results are tepid (they might be due to some tuning, or not really significantly different from GraphHeat). It might be a more interesting case for generative problems (e.g. Graph-GANs rather than just classification).

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    In this paper, the authors propose a flexible GCN model that learns adaptive scales of neighborhood for individual nodes of a graph to incorporate broader information from an appropriate range. Extensive experiments on various datasets show that the proposed method outperforms the state-of-the-art methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The paper focuses on how to aggregate information for brain networks adaptively. In this paper, the authors propose a flexible GCN model that learns adaptive scales of neighborhood for individual nodes of a graph to incorporate broader information from an appropriate range. The paper is well written and the problem is clearly motivated. (2) The authors also derive gradient-based learning on the local receptive field of nodes using a diffusion kernel. Extensive experiments on various datasets show that the proposed method outperforms the state-of-the-art methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) In the paper, the authors emphasize as a contribution that the proposed model aggregates information adaptively. How is this adaptivity reflected in the proposed method? What does the s in Eq. (6) mean? It seems that the only contribution of the paper is to make the scalars trainable. My concern is whether this operation can achieve the node-wise adaptive scales. (2) I do not understand why you give the derivative of the loss. As far as I can tell, the loss in the paper can be optimized by backpropagation. Eq. 10 to Eq. 15 are the formulation of backpropagation. Do there exist some constraints that make it difficult to perform backpropagation, such as discrete-domain optimization? I do not find such constraints. Compared with [1], the only difference in the paper is to make the scale s in Eq. 6 learnable. [1] Xu, B., Shen, H., Cao, Q., Cen, K., Cheng, X.: Graph convolutional networks using heat kernel for semi-supervised learning. In: International Joint Conference on Artificial Intelligence. Macao, China (2019)

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    According to the reproducibility checklist and information from the paper, the results should be reproducible after the potential acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Please refer to the strengths and weaknesses of the paper.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method part. I will raise the score if the author can address my concerns.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors have nicely addressed my major concerns and I therefore recommend acceptance of the paper. The paper derives an analytic derivative of the cross-entropy w.r.t. the scale, which is a major contribution of the work.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a flexible model that learns adaptive scales of the neighborhood for individual nodes of a graph to incorporate broader information from the appropriate range. The authors derive a parameterized formulation to perform gradient-based learning on the local receptive field of nodes using a diffusion kernel.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper proposes the adaptive range of individual nodes. This idea is consistent with the property of the brain network that each ROI has different biological topological properties, thus different local receptive fields should be provided to understand the subnetwork structure.
    2. The training on the scale is well derived and the paper is well organized.
    3. Some interpretations such as key ROIs are provided with visualization.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. No need to provide detailed formulas of graph convolution or GNNs in the preliminaries.
    2. This paper considers only one dataset. More extensive datasets that involve different modalities should be added into consideration.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This paper does not provide code. The only dataset used in the experiments, ADNI, has restricted availability.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Same as the weaknesses.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty of the adaptive scales for GNN aggregation. Also, this idea is generic and not restricted to brain networks.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This work presents a dynamic aggregation mechanism in graph neural nets for brain network analysis. The problem is well-motivated. Several concerns were raised by the reviewers, including the writing quality, novelty, the clarity of presenting the method, and justification of the improvement in the experiment. The authors may want to read the reviewers’ comments point by point and address the comments in rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7




Author Feedback

We thank the reviewers for their constructive and positive comments. We will answer all concerns in the reviews and address the novelty of our work one more time. The text will be polished and important discussions will be included in the revised text if the paper is accepted, and the code will be made public soon for reproducibility.

R1/R2-Q1) Incremental from GraphHeat / Adaptivity in the method?: While the scale in GraphHeat is a hyperparameter that a user has to choose, our model “learns” the optimal scale in a data-driven manner with backpropagation. As the kernel is a separate component from the MLP, it was challenging to optimize it with gradient descent. We derived an analytic derivative of cross-entropy w.r.t. the scale, which is a major contribution of our work. The derivative in Eq 15 informs the direction in the scale space to search for a critical point where the loss (i.e., train error) is minimized. This is a partial derivative at each node i, so it yields adaptive node-wise optimal scales as seen in Fig 3 in the paper. To achieve this with GraphHeat, a user must manually tune each scale for each node, which may take a lifetime, and GCN cannot deal with this issue by its nature as each layer affects all the nodes at once. We believe this is a significant improvement, as R2 and R3 agree, and we would appreciate it if R1 acknowledges our novelty.

R1-Q2) Details on preprocessing?: The subjects (including all ADNI groups) with multiple visits were selected, as the dataset was originally curated for a longitudinal structural connectome study. PIB Aβ45 was used as an Amyloid tracer. Measures were taken by regional averages and normalized by the concentration measure at the cerebellum to compute SUVR in a clinic streamline. Processing was kept consistent across all subjects for fair comparisons of subjects and ML models.

R1-Q3) Discrepancy between Fig 3 and Table 2?: We are sorry for not being clearer. We are comparing all 5 stages of AD at once, and the quantitative and qualitative results are shown in Tab 2 and Fig 3. Fig 3 shows “trained scales” demonstrating which ROIs “behave independently” for classifying all the 5 classes. Perhaps they are the most discriminant, but we are not arguing it and a more detailed study is required. As seen in Tab 2, cortical thickness is definitely discriminative, as Acc with SVM achieves 72% (and 87% with our model), but pathology biomarkers are better overall.

R1-Q4) Degree not effective?: Degree (the sum of edge weights connected to each node) is effective at the prodromal AD stage (structural damage emerges in MRI). A random guess for 5 classes is 20% and the majority class in our data has 33%. Using only the degree already achieves 55% Acc with high Specificity, and together with the brain network yields 67% (GDC) and 70% (Ours). It is just less sensitive than PET biomarkers, as structural alteration is often not distinct in the early stages of AD.

R2-Q5) Why take derivatives of the loss?: The heat kernel in Eq 6 models a diffusion process w.r.t. time (i.e., scale s), which defines the local neighborhood of a location (via the distance between node p and q). It is common practice to compute gradients of the loss w.r.t. model parameters via the chain rule (Rumelhart et al, Nature 1986), which is essential for gradient descent. Eq 15 is a direct derivative where we only require the scale to be positive; we believe there is a better way to do this optimization and our work will serve as a baseline.

R3-Q6) Too detailed preliminary / Reproducibility: We agree with the reviewer on these weaknesses. The preliminary section was verbose to keep our paper self-contained, but we will cut it short and discuss the contents in this rebuttal. For generalizability, we do have results for commonly used graph data for GCNs that achieve SOTA results, but due to the nature of applications for MICCAI, we only provide limited results. We will start working on richer details and results in a journal if this work is accepted by MICCAI.
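
As an illustration of the node-wise scale idea discussed under R1/R2-Q1 and R2-Q5, the following is a minimal sketch and not the authors' implementation. It assumes the heat kernel is the matrix exponential exp(-sL) of the graph Laplacian L = D - A, that node p uses its own scale s_p for row p of the kernel, and that a truncated Taylor series is accurate enough for a small graph. All function names and the toy path graph are hypothetical. It shows only the kernel-side derivative d/ds exp(-sL) = -exp(-sL) L, which would be chained with the loss gradient; it does not reproduce Eq 15 of the paper.

```python
def laplacian(adj):
    """Combinatorial Laplacian L = D - A of a symmetric weighted adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def vec_mat(v, m):
    """Row vector times matrix, using plain Python lists."""
    return [sum(v[k] * m[k][j] for k in range(len(m))) for j in range(len(m[0]))]


def heat_kernel_row(lap, p, s, terms=60):
    """Row p of exp(-s * L), via a truncated Taylor series (assumed adequate
    for small graphs and moderate scales)."""
    term = [1.0 if j == p else 0.0 for j in range(len(lap))]  # unit vector e_p
    row = term[:]
    for k in range(1, terms):
        # term_k = term_{k-1} @ (-s L) / k, i.e. e_p^T (-s L)^k / k!
        term = [-s * x / k for x in vec_mat(term, lap)]
        row = [r + t for r, t in zip(row, term)]
    return row


def heat_kernel_row_grad(lap, p, s):
    """Derivative of row p w.r.t. the scale s: -(row @ L)."""
    return [-x for x in vec_mat(heat_kernel_row(lap, p, s), lap)]


def aggregate(adj, scales, feats):
    """Aggregate node features with one scale per node (row p uses scales[p])."""
    lap = laplacian(adj)
    return [vec_mat(heat_kernel_row(lap, p, s), feats)
            for p, s in enumerate(scales)]


# Toy example (hypothetical): a 4-node path graph with unit edge weights.
path_adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
node_scales = [0.1, 0.5, 1.0, 2.0]   # small scale = narrow neighborhood
node_feats = [[1.0], [0.0], [0.0], [0.0]]
print(aggregate(path_adj, node_scales, node_feats))
```

In this sketch a small s_p keeps row p concentrated on node p itself, while a larger s_p spreads the weights over more distant nodes; a gradient step on each s_p (kept positive) would then adjust each node's range separately, in contrast to one global scale chosen by hand.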




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Thanks for the authors’ response. The concerns on comparison with GraphHeat, missing details, and results improvement are well-addressed.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The strength is that a novel dynamic aggregation mechanism is proposed for brain network analysis. Most concerns are responded to in the rebuttal. Acceptance is recommended.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work proposes a GCN model to adaptively learn multi-scale discriminative features for classification tasks, here applied to the AD classification problem from connectome data. After the rebuttal, the reviewers recognize the interest of the work and appreciate the methodological contribution with respect to the state-of-the-art, as well as the detailed formulation of the optimization problem by deriving the closed-form gradient of the loss. The work thus represents a valid contribution to the conference.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4


