
Authors

Eunsong Kang, Dawoon Heo, Heung-Il Suk

Abstract

In recent studies, deep learning has shown great potential for exploring topological properties of functional connectivity (FC), e.g., with graph neural networks (GNN), for brain disease diagnosis, e.g., Autism spectrum disorder (ASD). However, many of the existing methods integrate information locally, e.g., among neighboring nodes in a graph, which hinders learning complex patterns of FC globally. In addition, their analysis for discovering imaging biomarkers is confined to providing the most discriminating regions, without consideration of individual variations over the average FC patterns of groups, i.e., patients and normal controls. To address these issues, we propose a unified framework that globally captures properties of inter-network connectivity for classification and provides individual-specific group characteristics for interpretation via prototype learning. In our experiments on the ABIDE dataset, we validated the effectiveness of the proposed framework by comparing to competing topological deep learning methods in the literature. Furthermore, we analyzed individually specified functional mechanisms of ASD for neurological interpretation.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16437-8_32

SharedIt: https://rdcu.be/cVRti

Link to the code repository

https://github.com/ku-milab/PL-FC

Link to the dataset(s)

http://preprocessed-connectomes-project.org/abide/download.html


Reviews

Review #2

  • Please describe the contribution of the paper

    The paper combines prototype learning and topological relational learning to learn high-order inter-network functional connectivity (FC). Empirical results are reported on the ABIDE dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well-written. The design of the model, the problem formulation, and the experimentation are in good standing. The authors demonstrate better results than GNN methods with predefined ROIs and other convolutional methods with and without prototypes.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Lack of ablation study: Are all of the suggested components (inter-network encoder, prototype-based classifier, transformer, etc.) contributing to the high performance of the model? It is not clear how much each component contributes, and whether, e.g., replacing the transformer with a non-attention module would degrade performance.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code is released; no reproducibility concerns. All good.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    We advise the authors to have an editor revise the paper; examples of such improvements would be:

    • Abstract, “by comparing to competing” -> “by comparing with competing”
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work properly reuses well-known neural components to solve a real clinical problem. Even though there might not be novelty in each component of the proposed method, their combination and the way the problem is formulated are deemed novel. The results are promising compared with similar methods. The method is well-explained. As a bonus point, it is worth thanking the authors for their prospective code release.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose a transformer-based deep learning framework for topological relational learning to model higher-order characteristics of inter-network functional connectivity. They combine this with prototype learning to uncover differences between patients and controls while simultaneously modeling individual characteristics. They experiment on ABIDE, examining the diagnosis of ASD vs. controls and the ability to generate neuroscientific explanations for intra- and inter-class variations.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well organized with clear writing and explanation, and is easy to follow.

    2. Adopting prototype learning to generate neuroscientific explanation is an interesting premise and is novel for this application.

    3. The experimental results demonstrate improvements over state-of-the-art baselines for classification

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The authors use 5-fold cross-validation but do not explicitly mention a validation set for hyperparameter tuning. It is unclear how the key hyperparameters for the method and baselines were selected and whether this was done in an unbiased way. I think this is an important point for clarification, since there seem to be at least 4 hyperparameters (\lambda_1, \lambda_2, \lambda_3 for the losses and m for the prototype margin) which seem to dictate generalization.

    2. I found some of the claims of the paper a bit strong and not as well explained:

    a. In Table 1, the authors claim that prototype learning improves performance when comparing BrainNetCNN and BrainNetCNN+P (with prototype learning). The AUC/accuracy values for both are very close (in fact, BrainNetCNN+P has larger error bars), and there seems to be some tradeoff between sensitivity and specificity. I am not sure whether the ‘improvements’ would be statistically significant.

    b. “To be specific, if a summary feature vector of a TD subject is replaced with the ASD prototype (p_c = p_ASD), we can predict the functional degradation of FC as if the subject were suffering from ASD.”

    “It should be noted that our proposed method can generate counterfactual FC patterns for a subject, which can be of great benefit and used to obtain deeper insights into the functional characteristics of a brain in regard to ASD.”

    It is not immediately clear why replacing the prototype of a typical individual with that of the ASD class would necessarily correspond to generating a valid counterfactual. Could the authors please provide further explanation/references on why this is a reasonable assumption to make?

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have provided an anonymized link to the code base. In my opinion, this is a point in favor of the reproducibility.

    However, the range of hyper-parameters considered and the method used to select the best hyper-parameter configuration are still ambiguous.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Additional Points and Clarifications:

    1. In Section 5 (Personalised FC Analysis), the authors base their analysis on randomly selecting two ASD and two control subjects to identify the top 5 ROIs with the largest variation. However, since this sampling was done only once, it is unclear how stable these variations (and thus the selection of the top ROIs) are.

    2. How is the positional encoding matrix e in Eq. 1 calculated?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I found the idea of utilizing prototype learning for generating neuroscientific explanations interesting and possibly novel for functional connectivity analysis. However, I had several concerns about the experiment design (hyper parameter settings, soundness of clinical interpretation), due to which I would recommend a weak reject.

  • Number of papers in your stack

    7

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    This paper proposes a method for analyzing functional connectivity (FC) data for interpretable classification. The proposed neural network uses multi-head attention to learn global inter-network relationships for FC reconstruction. The pretrained model is then finetuned in the classification task, where prototypes of each class in the embedded space are learned. The prototypes can then be used along with an individual’s FC representation to explain inter- and intra-class differences for a subject. The methods were tested on classification of autism spectrum disorder (ASD) vs control subjects using the ABIDE dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper proposes an interesting method for learning global inter-network relationships: considering each ROI the seed of a network, the sequence of FC values for each ROI is analyzed using a transformer encoder architecture (see the sketch after this list).

    2. The classification approach uses prototype learning, thus embedding some interpretability directly into the classification model learning.

    3. I greatly appreciate that the parameter settings for all the models are shared (in the supplementary), enhancing reproducibility and assessment of experimental results.

    4. The experiments use the public ABIDE dataset, also enhancing reproducibility.

    5. The general flow of the paper is well-organized.
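
    As a minimal sketch of the idea in point 1 (all module names and dimensions here are illustrative assumptions, not the authors' released implementation), each ROI-seeded FC row becomes one token of a sequence, a learnable summary token is prepended, and a standard transformer encoder attends across all ROIs:

        import torch
        import torch.nn as nn

        class InterNetworkEncoder(nn.Module):
            """Each ROI-seeded FC row is one token; attention runs across all ROIs."""
            def __init__(self, n_rois=116, d_model=64, n_heads=4, n_layers=2):
                super().__init__()
                self.embed = nn.Linear(n_rois, d_model)  # FC row -> token embedding
                self.summary = nn.Parameter(torch.zeros(1, 1, d_model))  # CLS-like z_0
                layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
                self.encoder = nn.TransformerEncoder(layer, n_layers)

            def forward(self, fc):                  # fc: (batch, n_rois, n_rois)
                tokens = self.embed(fc)             # (batch, n_rois, d_model)
                z0 = self.summary.expand(fc.size(0), -1, -1)
                z = self.encoder(torch.cat([z0, tokens], dim=1))
                return z[:, 0], z[:, 1:]            # summary z_0^L, per-ROI z_r^L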

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. While I like the idea of embedding interpretability directly into the classification model, I have concerns about learning just 1 prototype for each of the normal and ASD classes. ASD is known to be extremely heterogeneous as it describes a spectrum of disorders - thus, I feel that having only 1 prototype to represent the whole class may not be the right model. There is also a large range of “normal”. Can the authors comment on/justify this modeling choice?

    2. There are missing details/explanations in the methods/experiments that need to be added/clarified in order to improve the understanding of the paper and results. Detailed comments are given below in question 8.

    3. There is also some missing analysis I feel in the experimental results section that should be included to strengthen the arguments that the proposed method outperforms other approaches. Detailed comments are given in question 8.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducibility is fairly good: based on the checklist/submission, it appears code will be shared upon acceptance. However, some experimental analysis is missing (e.g., parameter sensitivity, statistical significance, when the method fails) that could be added to improve the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Detailed comments are in order of appearance in the paper.

    1. The paper makes use of the self-attention mechanism and multi-headed attention modules. However, the original paper that proposed this structure is never cited and should be included: Vaswani et al., Attention Is All You Need, 2017.

    2. I am not sure what the variable z_r^L represents. I think that z_0^L is the summary vector representing the entire set of FC networks, but where do the z_r come from? This needs to be defined/clarified. Perhaps labeling where these variables appear in Fig. 1 would also be helpful.

    3. For the 5-fold cross-validation setup, are the partitions done subject-wise? This is confusing because the setup is mentioned immediately after discussing data augmentation, so it is not clear whether the split is performed on the augmented data as a whole or by subject. If not by subject, this would greatly inflate classification performance, since the augmented data per subject are highly correlated.

    4. Related to the cross-validation setup, is the pre-training done using the same partitions as the classification training (i.e., same test set is left out the whole time)? While the pre-training does not make use of labels, the same network is being used to learn the classification in step 2, and thus the test data needs to be left out the whole time.

    5. As mentioned in the paper, the training is performed in 2 steps: pretraining of the transformer reconstruction network, then learning of the classification model. How does the performance change when trained in 1 step, end-to-end? As far as I can tell, no additional data is used in the pretraining, so I am wondering about the advantage of pre-training over end-to-end training, given that the classifier in step 2 is itself trained end-to-end. A comparison to 1-stage training would strengthen the case for the 2-stage approach.

    6. The hyperparameter settings, e.g., the lambdas in the loss function for classification learning, are given in the supplementary material. However, how were these parameters chosen? Was any tuning involved, and if so, was a validation set used, or was it based on the test set? Clarification on whether any tuning was performed (for the proposed method and the other methods), and if so how, would be appreciated.

    7. For the hyperparameters such as in the loss function, how sensitive are the results to the choice of parameter settings?

    8. In Table 1, the AUC means are around 0.6-0.7, but the standard deviations (or standard errors? please clarify) are reported in the range of 2-4; are the decimals off, and should these really be 0.02-0.04? The overall means are also reported as proportions, but the standard deviations appear to be in percent. Please check the values.

    9. For the main classification performance results, while some sense of variation is given, there is no statistical significance testing reported - this could help strengthen the case for the proposed method.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes some interesting ideas for analysis of FC data, using multi-headed attention to learn relationships across a sequence of networks seeded by an ROI, and using this representation for prototype-based classification. However, the choice of 1 prototype per class is somewhat questionable, and many details need to be clarified to be able to fully assess the work.

  • Number of papers in your stack

    6

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The study proposes a novel prototype learning method for ASD diagnosis and analysis. The authors may add ablation studies and details of hyperparameter tuning to further improve the paper.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2




Author Feedback

We would like to thank the reviewers for their helpful comments and their positive assessment of our research.

[R meta, #2, #4] Lack of ablation study: In a future extension of this paper, we will report an ablation study of how each component (intra-network encoder, inter-network encoder, and prototype-based classifier), the learning strategy (pre-training vs. end-to-end learning), and the hyperparameters of the loss functions contribute to the performance. Experimentally, the intra-network encoder contributes less to performance, while the prototype-based classifier performs better than an MLP-based classifier. Our key component, the self-attention in the inter-network encoder, is required to calculate the summary vector, which makes it challenging to exclude in an ablation study. Generally, a Transformer is pre-trained on a large dataset and the classifier head is then newly trained on the target dataset for training stability. We tried end-to-end learning; however, the classification performance degraded. It appears more difficult to perform both reconstruction and classification using only a small medical dataset.

[R #3, #4] Cross-validation and hyper-parameter setting: The hyperparameter settings are given in the supplementary material. The hyper-parameters of our method and the competing methods were chosen based on the validation set, which is sampled from the training set and has the same size as the test set. These sets are split in a subject-wise manner; in other words, all dynamic FCs calculated from one subject are assigned to the same partition. The pre-training reconstruction (step 1) and classification (step 2) stages used the same training, validation, and test partitions.
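
For concreteness, a subject-wise split of this kind can be set up with, e.g., scikit-learn's GroupKFold, using each dynamic-FC window's subject ID as the group label (a sketch on toy data, not our actual pipeline):

    import numpy as np
    from sklearn.model_selection import GroupKFold

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 116 * 116))    # toy dynamic-FC features
    y = rng.integers(0, 2, size=500)         # toy ASD/TD labels
    subjects = np.repeat(np.arange(100), 5)  # 5 windows per subject

    # GroupKFold keeps all windows of a subject in one partition, so the
    # correlated augmented samples never leak into the test fold.
    for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=subjects):
        train_subj = np.unique(subjects[train_idx])
        n_val = len(np.unique(subjects[test_idx]))  # validation sized like test
        val_subj = rng.choice(train_subj, size=n_val, replace=False)
        val_mask = np.isin(subjects[train_idx], val_subj)
        val_idx, fit_idx = train_idx[val_mask], train_idx[~val_mask]
        # ... tune hyperparameters on val_idx, fit on fit_idx, report on test_idx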

[R meta, #3, #4] Statistical significance: As suggested by the reviewers, we conducted a Wilcoxon signed-rank test (p<0.05) on accuracy to demonstrate statistical significance. Our proposed method significantly outperforms all competing methods (p=0.03125). We will add the results of the statistical test in the revised version.
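
For reference, with five folds the exact one-sided Wilcoxon signed-rank test reaches its smallest possible p-value, 1/32 = 0.03125, exactly when the proposed method wins on every fold; this matches the value above. A sketch with made-up accuracies (not our reported results):

    from scipy.stats import wilcoxon

    # Illustrative per-fold accuracies (not the paper's results).
    ours     = [0.68, 0.68, 0.72, 0.72, 0.72]
    baseline = [0.66, 0.65, 0.68, 0.67, 0.66]

    # One-sided test: does the proposed method win fold by fold?
    stat, p = wilcoxon(ours, baseline, alternative="greater")
    print(p)  # 0.03125 -- the smallest attainable one-sided p with n = 5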

[R #3] Prototype-based analysis: We assume that z_r includes individual features of each seed-'r'-based network and that z_0 (the summary vector) carries individual class-specific features. The summary vector is a trainable parameter that summarizes the information used for classification, and the prototypes are trained to be representative vectors of each class by computing their similarity to the summary vectors. Therefore, replacing an individual's summary vector with the prototype vector of the opposite class allows the decoder to generate counterfactual functional connectivity. A group-level analysis of such individual counterfactual functional connectivity remains to be done, however, and we leave it to future work.
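
A minimal sketch of this swap, assuming a trained encoder/decoder pair and learned class prototypes (the function and argument names are placeholders, not our released code):

    import torch

    @torch.no_grad()
    def counterfactual_fc(encoder, decoder, prototypes, fc, target_class):
        """Rebuild a subject's FC after swapping the summary vector z_0
        for the prototype of the opposite class (e.g., a TD subject with p_ASD)."""
        z0, z_r = encoder(fc)                         # summary vector + per-ROI tokens
        p_c = prototypes[target_class].expand_as(z0)  # e.g., the ASD prototype
        tokens = torch.cat([p_c.unsqueeze(1), z_r], dim=1)
        return decoder(tokens)                        # counterfactual FC matrix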

[R #4] The number of prototypes for a class: We agree with Reviewer 4's comment on the number of prototypes. As ASD is known to be highly heterogeneous, there might be subgroups within ASD. Our prototypes are trained by increasing the similarity between the summary vector and the prototypes of its class; therefore, an additional linear layer is needed for classification when the number of prototypes exceeds the number of classes. With a post-hoc analysis of prototypes in the same class, we could infer the functional connectivity patterns of subgroups.
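
A sketch of such a similarity-based head (cosine similarity here is an assumption for illustration): with one prototype per class, the class-wise similarities act directly as logits, and the extra linear layer is only needed when protos_per_class > 1.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PrototypeHead(nn.Module):
        """Classify by similarity between the summary vector and class prototypes."""
        def __init__(self, d_model=64, n_classes=2, protos_per_class=1):
            super().__init__()
            n_protos = n_classes * protos_per_class
            self.prototypes = nn.Parameter(torch.randn(n_protos, d_model))
            # With one prototype per class, similarities are the logits themselves;
            # otherwise a linear layer maps the similarities to class logits.
            self.to_logits = (nn.Identity() if protos_per_class == 1
                              else nn.Linear(n_protos, n_classes))

        def forward(self, z0):  # z0: (batch, d_model)
            sim = F.cosine_similarity(z0.unsqueeze(1),
                                      self.prototypes.unsqueeze(0), dim=-1)
            return self.to_logits(sim)  # (batch, n_classes)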

[R #3, #4] Variables and citation: The positional encoding matrix e is the sinusoidal encoding matrix widely used in Transformer models. We will add a citation to the original self-attention paper in the revised version. Also, we will edit the overview figure to label the variables.
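
The sinusoidal scheme mentioned above is the fixed encoding from Vaswani et al. (2017); a standard implementation (assuming an even d_model) is:

    import torch

    def sinusoidal_encoding(n_positions, d_model):
        """e[pos, 2i] = sin(pos / 10000^(2i/d)); e[pos, 2i+1] = cos(same angle)."""
        pos = torch.arange(n_positions, dtype=torch.float32).unsqueeze(1)
        i = torch.arange(0, d_model, 2, dtype=torch.float32)
        angles = pos / (10000.0 ** (i / d_model))  # (n_positions, d_model/2)
        e = torch.zeros(n_positions, d_model)
        e[:, 0::2] = torch.sin(angles)
        e[:, 1::2] = torch.cos(angles)
        return e  # (n_positions, d_model), added to the token embeddings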


