
Authors

Anushree Bannadabhavi, Soojin Lee, Wenlong Deng, Rex Ying, Xiaoxiao Li

Abstract

Autism spectrum disorder (ASD) is a lifelong neurodevelopmental condition that affects social communication and behaviour. Investigating the functional magnetic resonance imaging (fMRI)-based brain functional connectome can aid the understanding and diagnosis of ASD, leading to more effective treatments. The brain is modelled as a network of brain Regions of Interest (ROIs), and these ROIs form communities; knowledge of these communities is crucial for ASD diagnosis. On one hand, Transformer-based models have proven to be highly effective across several tasks, including fMRI connectome analysis, where they learn useful representations of ROIs. On the other hand, existing transformer-based models treat all ROIs equally and overlook the impact of community-specific associations when learning node embeddings. To fill this gap, we propose a novel method, Com-BrainTF, a hierarchical local-global transformer architecture that learns intra- and inter-community-aware node embeddings for the ASD prediction task. Furthermore, we avoid over-parameterization by sharing the local transformer parameters across communities while optimizing unique learnable prompt tokens for each community. Our model outperforms state-of-the-art (SOTA) architectures on the ABIDE dataset and has high interpretability, evident from the attention module.
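The parameter-sharing idea in the abstract (one set of local transformer weights shared across communities, plus a unique learnable prompt token per community) can be sketched as follows. This is a minimal numpy illustration, not the authors' implementation; the community sizes, embedding dimension, and all variable names are assumptions for illustration only:

```python
import numpy as np

def attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over token matrix x of shape (n, d)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
d = 16                                       # embedding dimension (assumed)
communities = {"DMN": 5, "SMN": 4, "L": 2}   # toy ROI counts per community

# One SHARED set of local-transformer weights for all communities ...
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
# ... but a UNIQUE learnable prompt token per community (would receive gradients).
prompts = {name: rng.standard_normal((1, d)) for name in communities}

local_outputs = []
for name, n_roi in communities.items():
    x = rng.standard_normal((n_roi, d))        # toy ROI features for this community
    x = np.concatenate([prompts[name], x])     # prepend the community's prompt token
    local_outputs.append(attention(x, Wq, Wk, Wv))

# The global transformer then fuses all community-level outputs.
global_in = np.concatenate(local_outputs)      # (sum of community sizes + 3, d)
out = attention(global_in, Wq, Wk, Wv)         # weights reused here only for brevity
```

The point of the sketch is that only the per-community prompt rows differ across communities, so adding more (or finer) communities adds one d-dimensional vector each rather than a full transformer's worth of parameters.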

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43993-3_28

SharedIt: https://rdcu.be/dnwNt

Link to the code repository

https://github.com/ubc-tea/Com-BrainTF

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors present a novel transformer architecture to learn ROI embeddings for fMRI analysis. The model leverages known ROI - Functional network associations to first create community-level ROI embeddings which are then fused into global ROI embeddings. The ROI embeddings are then used to train an ASD vs HC classifier that beats the known SOTA classification accuracy on the ABIDE dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Good presentation, results are backed by experiments, qualitative and quantitative comparisons.
    • Authors don’t just focus on classification accuracy. Some thought is given to interpretability.
    • A novel use of known ROI x functional network associations.
    • SOTA classification accuracy. Results are well-supported by qualitative and quantitative comparisons with relevant prior work.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Incremental results. The proposed model is essentially a variation on previously proposed models (comparisons with these models are provided in the paper).
    • Some sort of test (permutation test?) to establish the statistical significance of improvement in classification accuracy over other models is warranted. A different train-test split of the data might not show the same improvement.
    • Some details about the transformer architecture are missing.
    • A discussion of limitations of the proposed model is missing.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Some important details about the transformer architecture used in the proposed model are missing which will hinder reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Answers / explanations to the following would go a long way in improving the manuscript:

    1. What are the ROI - functional network assignments? What is the justification for using these particular assignments? How big or small are the functional networks?
    2. From figure 2 it seems like the Limbic network (L) has very few ROIs and plays no role in ASD classification. What if this particular network is removed from consideration completely?
    3. Please include more details about the transformer architecture that was used. How many encoder/decoder layers did the local and global transformers have? Was it just the one encoder and one decoder layer in both cases?
    4. What type of positional encoding is used?
    5. Is the “personalized prompt” token different from positional encoding? How?
    6. What are the dimensions of the query, key, value parameter (weight) matrices? How are they initialized?
    7. It makes sense that the prompt tokens play an important role at the local community level. But is the global prompt token necessary and/or important?
    8. Specifically, in Table 2 (the input to global transformer part), can you include performance results where only the node features are used as input to the global transformer?
    9. What are the limitations of the proposed model?
    10. Any insights into instances that were misclassified by the proposed model?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • The authors propose a novel idea and produce strong results (SOTA as far as I’m aware). They support their results with sufficient experiments, and qualitative and quantitative comparisons with relevant models. I particularly liked the fact that interpretability of the proposed model is discussed at least to some extent.
    • However, although well-presented, the work is mostly incremental - it builds on a series of papers that have previously proposed transformer models for fMRI analysis and ASD classification. In addition, some important details about the model architecture are missing which would affect reproducibility, unless well documented source code is provided along with the manuscript.
    • Based on these points, I believe this would be a good poster presentation, once the concerns pointed out in the comments are addressed.
  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    7

  • [Post rebuttal] Please justify your decision

    Authors have addressed most of my concerns. I think this will be a very good poster presentation at MICCAI.



Review #2

  • Please describe the contribution of the paper

    This paper proposes a transformer-based architecture that takes into account functional modules. It does so by first grouping ROIs from the same modules and learning separate embeddings using ‘local’ transformers, then learning inter-module relationship via a ‘global’ transformer encoder. The use of attention provides model interpretability and a qualitative study was also conducted based on the attention weights to shed insight on important ROIs for autism.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The existence of functional modules is well known but not explored in many existing works on disease classification - this paper fills a gap in the literature.
    2. Design of the transformer-based architecture - in particular the weight sharing across local transformer encoders - could mean that this architecture can scale well in future analyses that use finer modules. It could allow for interesting studies in future.
    3. Adequate and thorough analysis of results (quantitative and qualitative + ablation)
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The improvement in model performance over existing baselines such as the Brain Network Transformer seems modest (no statistical test results were shown, but differences are likely not statistically significant especially for AUROC), suggesting that incorporating modular information might not actually help as much as we’d think. In light of this, it would have been helpful to attempt experiments on other datasets before looking into model interpretability.

    2. Extending from point 1, it’d be more interesting to see if this architecture can still work well in cases where fewer data samples are available, given that transformers are often seen to be ‘data hungry’ (although there are more data-efficient forms of transformers, the dataset sizes they work with are still larger than the typical size of fMRI datasets). This work used the ABIDE dataset; with 1000 data samples, it is uncommonly large compared to other fMRI datasets. If shown to work on smaller datasets, the impact of this work would have been greater.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No major issues. I don’t seem to be able to find the average runtime despite it being mentioned in the checklist; it would have been useful to have it mentioned somewhere in the paper or readme.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    On top of the above-mentioned points,

    1. In section 3.1, it was mentioned that heads = communities, i.e., 8 heads were used for the transformer encoder. This might seem an intuitive fit for the global transformer encoder, but since the local transformer is only exposed to each matrix (X_1, X_2, … X_k) individually, why would it make sense to use 8 heads for it? It would have been better to run an experiment varying this; perhaps using fewer heads might obviate the constraint of weight sharing across the local transformers.
    2. How was the standard deviation computed - were the experiments repeated over multiple seeds? I don’t seem to be able to find any information about this in the manuscript.

    3. Table 2 - what about the case where only node features were used for the global transformer? Difficult to ascertain the usefulness of the prompt token without this comparison.

    Minor issues

    1. Eqn 2, d_k doesn’t seem to be defined
    2. How exactly was the mapping between ROIs to functional communities done? Does the Craddock atlas provide this?
    3. If a V100 GPU was used, there should be at least 16GB available (instead of the 8GB mentioned in section 3.1)?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I am satisfied with the design of the architecture, but the current results are not quite convincing yet - it’d help a lot to experiment with another dataset. The choice of experiments could have been better too: showing that the architecture works well on datasets smaller than ABIDE would have greater impact, since not many fMRI datasets are as large as it.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    There is indeed novelty in this work (beyond incremental, especially in how they designed the architecture to capture modular information). However, my main concern is that their proposed way of incorporating modular information unfortunately does not clearly show - through the numerical results - that modular information is unambiguously valuable. This raises the question of whether their proposal has captured modular information in the best way. Perhaps having another dataset would have alleviated this concern. To be clear, this is not a dealbreaker - a paper shouldn’t be accepted just because improvements are statistically significant.

    The question about how they arrived at 8 heads for the local transformer does not seem to be addressed in the rebuttal despite its importance, hence the rating stays the same. As mentioned in the rebuttal, 5 seeds is rather few in terms of robustness, and without clarity about how parameters like the number of heads were arrived at, the results are not as convincing as they could have been.



Review #3

  • Please describe the contribution of the paper

    This paper proposes a novel method called Com-BrainTF for the ASD classification task, based on a hierarchical local-global transformer architecture that utilizes community-specific associations when learning node embeddings in fMRI connectomes. Personalized prompt tokens are learned for each community to differentiate the local transformer embedding functions. The proposed architecture was demonstrated to enhance the accuracy of fMRI brain connectome analysis and to have high interpretability.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The proposed Com-BrainTF method efficiently learns and integrates community-aware ROI embeddings in brain connectome analysis by utilizing both ROI-level and community-level information, which is a novel and important aspect of this work.
    2. The use of a hierarchical local-global transformer architecture that improves the accuracy of fMRI brain connectome analysis is a strong aspect of this work.
    3. The authors avoid over-parameterization by sharing the local transformer parameters for different communities but optimizing unique learnable prompt tokens for each community.
    4. The visualization results of the attention module demonstrate the ability of Com-BrainTF to capture functional community patterns that are crucial for ASD vs. HC classification.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. While the authors mention the state-of-the-art model on ABIDE dataset, they do not compare their method with other existing methods in the literature.
    2. As ABIDE is a composite of data from 17 international sites with varying settings, it is recommended to utilize a stratified sampling approach when dividing the dataset into training, validation, and test sets.
    3. The advantages of avoiding over-parameterization (through parameter sharing) are yet to be fully illustrated with respect to other metrics such as computational time and memory usage.
    4. The paper lacks a discussion on related work on utilizing community-based information in fMRI connectome analysis.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Good, the source code is provided to ensure the implementation can be reproduced.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. The authors could provide a more detailed explanation of how their method benefits from incorporating community-specific information, and why this is an improvement over existing methods.
    2. Additionally, it would be helpful if the authors could compare the performance of Com-BrainTF with more advanced architectures such as Graph Attention Networks (GATs) and Graph Convolutional Networks (GCNs) that have also been utilized in fMRI connectome analysis.
    3. The authors should provide more discussion on the clinical implications of the proposed method and its potential use in real-world ASD diagnosis and treatment.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper presents a novel contribution to the field of fMRI connectome analysis and ASD diagnosis. The proposed method is effective in learning and integrating community-aware ROI embeddings for ASD prediction tasks. Based on the strengths of the paper and the potential impact on the field, I recommend weak acceptance of the paper.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    I agree with the reviewers that this is an incremental work. It is unclear what the advantage of the proposed work is compared to existing methods. The authors should address all the questions and comments of the three reviewers.




Author Feedback

We thank all the reviewers for the positive comments, such as “novel, good presentation, adequate analysis.”

Novelty and Contributions [MR1]: We believe our work makes significant strides in two areas: (i) We are the first to integrate community-level and ROI-level information into an individual-level predictive transformer architecture. As R3 noted, it is “a novel contribution to the field”; as R2 noted, “this paper fills a gap in the literature”; and as R1 noted, it is a “novel use of known ROI x functional network associations”. Our motivation is backed by previous neuroscience research that underlines the importance of functional communities in neurological disorders (Sec. 1). Our proposed local-global transformer architecture and novel parameter-sharing strategy significantly differ from existing methods for brain network analysis. (ii) Compared to BrainNetTF (BNT), our Com-BrainTF enhances interpretability (see Fig. 2 and supplementary). As R3 mentioned, our visualization results show Com-BrainTF’s ability to capture communities crucial for ASD prediction, a fact R1 also highlighted. Our work addresses the literature’s shortcomings regarding community information, transformer architectures for ASD prediction, and detailed interpretability results.

Justification for Using Prompt Tokens in the Global Transformer [R1, R2]: We maintain the global prompt token as a key design choice; it is obtained by concatenating the local transformers’ output prompt tokens and processing them through an MLP (Fig. 1, detailed in Sec. 2.2). The global transformer’s prompt token serves three purposes: 1) integrating community information for superior prediction compared to using node features alone (Sec 3.3); 2) being “capable of learning relationships within and between communities” (Sec 3.2, Fig. 2 (3)); and 3) identifying communities important for the prediction task, as represented by the attention scores between the global prompt and each community (Sec 3.2.2, the identified differences in DMN and SMN). Given the aforementioned importance of the global prompt, we did not omit it in the ablation studies. As expected, when we further test the setting using node features only, we observe a 1-5% drop on our metrics.
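The construction described above (concatenate the local transformers’ output prompt tokens, then map them through an MLP to a single global prompt token) can be sketched as follows. This is a hypothetical numpy illustration; the number of communities, dimensions, and hidden width are assumptions, not the paper’s exact configuration:

```python
import numpy as np

rng = np.random.default_rng(1)
k, d = 8, 16          # number of communities and embedding dimension (assumed)

# Output prompt tokens produced by the k local transformers, one per community.
local_prompts = [rng.standard_normal(d) for _ in range(k)]

# Two-layer MLP mapping the concatenation (k*d,) down to one token (d,).
W1 = rng.standard_normal((k * d, 32)) * 0.1   # hidden width 32 is arbitrary
W2 = rng.standard_normal((32, d)) * 0.1

def global_prompt(tokens):
    z = np.concatenate(tokens)      # stack community prompts: shape (k*d,)
    h = np.maximum(z @ W1, 0.0)     # ReLU hidden layer
    return h @ W2                   # global prompt token: shape (d,)

g = global_prompt(local_prompts)
```

Because every community’s output prompt contributes to `g`, attention between `g` and the community tokens downstream can be read as a community-importance score, which is how the interpretability claim above is framed.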

Numerical Improvement [R1]: We conduct extensive quantitative experiments, which show a substantial improvement in model performance compared to SOTA methods, averaged over 5 runs; showing statistical significance may not be meaningful given the limited number of repeats reported. As an example, Com-BrainTF achieves 72.5 (4.4) in accuracy, whereas the best comparison method scores 68.1 (3.1). Notably, we achieve a much smaller standard deviation across runs compared to BNT and FBNETGNN. As R1 noted, our Com-BrainTF enhances not only accuracy but also interpretability.

Reproducibility [R1]: The ROI-functional network assignments follow the Yeo 7-network template (Yeo et al., J Neurophysiol, 2011). We will include this detail in our revision. Also, we shared the link to our source code with detailed configuration in our original submission to ensure reproducibility.

Choice of Dataset [R2]: We chose the ABIDE dataset for three reasons: (i) ABIDE is an open-source dataset, ensuring reproducibility; (ii) ABIDE’s diverse data from 17 sites demonstrate generalizability; (iii) ABIDE has widespread usage in prior ASD research. Small, privately available datasets present reproducibility challenges. One can pretrain on large data and then perform transfer learning on small data. Furthermore, we propose weight sharing in local transformers to reduce model parameters and tackle the ‘data hungry’ issue.

Details Clarification [R2]: As Sec 2.2 mentions, prompt tokens differ from position embeddings: they are learnable 1D vectors concatenated with the functional connectivity matrix inputs. We employed one local transformer encoder layer and one global transformer encoder layer (no decoder). We used a C-Series Virtual GPU for Tesla V100 (8GB).
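The distinction drawn here between a prompt token and a positional encoding can be illustrated with a toy numpy contrast. All shapes and the sinusoidal formula are illustrative assumptions, not the paper’s exact configuration:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 10, 16                    # number of ROIs and feature dim (assumed)
x = rng.standard_normal((n, d))  # row i: ROI i's functional connectivity features

# Positional encoding: a FIXED (non-learned) matrix ADDED to every token,
# leaving the token count unchanged.
pos = np.sin(np.arange(n)[:, None] / 10.0 ** (np.arange(d)[None, :] / d))
x_pos = x + pos                  # shape unchanged: (n, d)

# Prompt token: a LEARNABLE 1-D vector CONCATENATED as an extra token
# (it would receive gradients during training).
prompt = rng.standard_normal((1, d))
x_prompt = np.concatenate([prompt, x])   # one extra row: (n + 1, d)
```

So a positional encoding injects order information into existing tokens, while a prompt token adds a new trainable token whose output slot can later be read out (here, as the community summary).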




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    One of my major concerns is that I think the work is incremental, but the authors did not address this concern directly in their rebuttal. One reviewer mentioned there are no comparison studies even though the authors claimed their work is state-of-the-art. The authors did not address this important question.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposed a transformer architecture that integrates community-level and ROI-level information into an individual-level predictive transformer, and applied it to the ABIDE dataset for ASD classification. In general, it is an interesting study with clear presentation. The authors also provided a detailed rebuttal to the concerns. Although one of the reviewers still has concerns about the effectiveness of modular information as well as other details, a majority of reviewers recognized the merit of this paper for the MICCAI field, and I also recommend acceptance of this paper.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes a novel method called Com-BrainTF for the ASD classification task, based on a hierarchical local-global transformer architecture that utilizes community-specific associations when learning node embeddings in fMRI connectomes. The major strengths include the hierarchical local-global transformer architecture and the visualization results of the attention module. However, I agree with some reviewers that it would have been helpful to run more experiments on the architecture and parameters of the model before looking into model interpretability. Additionally, it would be helpful if the authors could compare the performance of Com-BrainTF with more advanced architectures such as Graph Attention Networks (GATs) and Graph Convolutional Networks (GCNs), which have also been utilized in many fMRI connectome analyses; this would make the results more convincing. Although some weaknesses remain, most of them were addressed by the authors; it is a fair paper whose weaknesses very slightly outweigh its merits.


