
Authors

Mengshen He, Xiangyu Hou, Zhenwei Wang, Zili Kang, Xin Zhang, Ning Qiang, Bao Ge

Abstract

It has been of great interest in the neuroimaging community to discover functional brain networks (FBNs) from task functional magnetic resonance imaging (tfMRI). A variety of methods have been used to model tfMRI sequences, such as recurrent neural networks (RNNs) and autoencoders. However, these models were not designed to incorporate the characteristics of tfMRI sequences, where the same signal value at different time points in an fMRI time series may represent different states and meanings. Inspired by cloze learning methods and the human ability to interpret polysemous words from context, we proposed a self-supervised Multi-head Attention-based Masked Sequence Model (MAMSM), analogous to how the BERT model uses Masked Language Modeling (MLM) and multi-head attention to learn the different meanings of the same word in different sentences. MAMSM masks and encodes tfMRI time series, uses multi-head attention to compute the different meanings corresponding to the same signal value in an fMRI sequence, and obtains context information through MSM pre-training. Furthermore, this work defined a new loss function to extract FBNs according to the task design information of the tfMRI data. The model was applied to the Human Connectome Project (HCP) task fMRI dataset and achieved state-of-the-art performance on brain temporal dynamics: the Pearson correlation coefficient between the learned features and the task design curves exceeded 0.95, and the model extracted additional meaningful networks beyond the known task-related brain networks.
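
The combined objective described in the abstract (reconstruction error plus agreement with the task design curve) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the plain-Python vectors, and the weighting factor `alpha` are all assumptions.

```python
import math

def mse(x, y):
    # Mean squared reconstruction error between two equal-length sequences.
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def cosine_similarity(x, y):
    # Cosine of the angle between two sequences viewed as vectors.
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

def combined_loss(recon, signal, feature, task_curve, alpha=1.0):
    # Hypothetical weighted sum: reconstruction MSE plus a cosine-distance
    # term pulling a learned feature toward the task design curve. The loss
    # is zero only when reconstruction is perfect and the feature points in
    # exactly the same direction as the task design curve.
    return mse(recon, signal) + alpha * (1.0 - cosine_similarity(feature, task_curve))
```

A Pearson correlation above 0.95, as reported, corresponds to a near-zero cosine-distance term once the feature and the task curve are mean-centered, since Pearson correlation is the cosine similarity of mean-centered vectors.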

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16431-6_28

SharedIt: https://rdcu.be/cVD5a

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposed a new deep learning model, the Multi-head Attention-based Masked Sequence Model (MAMSM), to extract functional brain networks from task fMRI data. The authors adopted several state-of-the-art deep learning methods from the natural language processing (NLP) field, such as the attention-based Transformer and the Masked Language Modeling (MLM) used in BERT. Their approach of treating fMRI time series as sequence data, as in NLP, is reasonable, and the application of current deep learning models is interesting. Compared to other methods for extracting functional brain networks, such as sparse dictionary learning and independent component analysis, MAMSM leverages a state-of-the-art deep learning method and shows advanced results in extracting functional brain networks, including some resting-state functional networks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novel application: One main strength of this paper is the novel application of multi-head attention and a masking method to extract features from fMRI time series.
    • Intuitive design: The use of a cosine similarity loss is also notable, because it simply and effectively makes the model track the original task design curve during task fMRI scans.
    • Systematic evaluation: The authors evaluated MAMSM against more than three other models for extracting functional brain networks and compared the extracted temporal and spatial features, respectively. In particular, the comparison to the spatiotemporal attention autoencoder (STAAE) strengthens the results, because STAAE is a recently proposed deep-learning-based model for extracting functional brain networks.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Limited discussion of model design: Because the proposed method is a deep learning model, its hyperparameters are crucial to training. It would therefore help to analyze the effect of each component of MAMSM, such as the contribution of multi-head attention, the masking percentage, and the number of hidden layers in the encoder and decoder of the feature selection layer. It would also be good to explain the training procedure and parameters in detail: it was unclear how the input dimension changed during training and what the exact number of features in the feature selection layer is.
    • Limited evaluation: Since there is no gold-standard method for extracting functional brain networks from task fMRI, the authors compared their results with GLM. However, as mentioned in the introduction, GLM also has limitations in detecting functional brain networks. It would help to briefly describe how functional brain networks were extracted with GLM. Additionally, using only 22 subjects from the HCP data could be viewed as a weakness: HCP provides about 800 subjects’ task fMRI data, yet the authors used a limited sample. A larger sample would make this paper more robust and convincing.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It seems that reproducing the results would be hard because, as the authors filled out in the checklist, they did not provide the code, the software framework and version, or a detailed description of the model parameters and overall code. Also, even though they answered “yes” to “The details of train/validation/test splits,” they did not seem to describe how the data were split. I also could not find information on the statistical significance of the reported Pearson’s correlation coefficients.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    • Major comments
      o For future work, it would be interesting to extend the current 2D-based approach to 3D using a state-of-the-art 3D-CNN model. As the authors note in the conclusion, applying spatial attention would make MAMSM more complete; in the same vein, an extension to 3D could help.
      o It would help to describe the procedures of the other methods used to obtain FBNs, such as GLM and SDL.
      o It could be worthwhile to use neural-network-based techniques to obtain the final functional brain networks (the FBNs’ W) rather than the simple LASSO regression used in other papers. As is well known, LASSO regularization can also be added to a simple neural-network-based regression model.
      o Is the “token embedding” in the paper used only to apply the masking method? Token embedding is commonly used in NLP to embed an input word into a fixed-length vector (e.g., 512), but in MAMSM it seems to denote just the input signal with discrete and continuous masking.
      o Continuing from the above comment, it would be good to clearly explain what the “latent features” from MAMSM are. They appear to be the output of the multi-head attention, but readers may confuse them with the trained attention scores.
    • Minor comments
      o In the first sentence of the Abstract, it would be better to change “brain functional networks (FBNs)” to “functional brain networks (FBNs)”.
      o Fig. 1 needs more detailed descriptions of its components, such as what “CLS” and “SEP” mean in (c), and an indication that “n” is the number of voxels (or vertices) in (a).
      o The further explanation of MAMSM in Section 3.3 (“Reconstruction Loss”) of the results could instead be placed in the methods section.
      o In the conclusion, the word “model” can be dropped from “MAMSM model”, since the final “M” in MAMSM already stands for “model”.
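
On the LASSO point above: the L1 penalty is what makes the spatial maps W sparse, and for an orthonormal design its effect reduces to soft-thresholding the least-squares coefficients. A hedged sketch with made-up numbers, not taken from the paper:

```python
def soft_threshold(w, lam):
    # Proximal operator of the L1 penalty: shrinks each coefficient toward
    # zero and sets small ones exactly to zero, which is what makes the
    # LASSO solution (and hence the FBN maps W) sparse.
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

# For an orthonormal design matrix, the LASSO solution is simply the
# least-squares coefficients passed through soft_threshold element-wise.
coeffs = [2.5, 0.25, -1.5, -0.05]   # hypothetical least-squares weights
sparse = [soft_threshold(w, 0.5) for w in coeffs]
# sparse == [2.0, 0.0, -1.0, 0.0]
```

The same penalty can be attached to the weights of a neural regression layer, which is the reviewer's suggestion of adding LASSO regularization to a network-based model.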

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents an interesting application of state-of-the-art deep learning methods such as the Transformer, together with a systematic evaluation of the results.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    This paper proposed a Multi-head Attention-based Masked Sequence Model (MAMSM), similar to the BERT model, aiming to learn the different states/tasks or meanings of the same signal values at different time points in an fMRI time series. Quantitative evaluation demonstrates that the learned features have better interpretability. The authors also design a novel loss function that combines an MSE loss and a cosine similarity error to extract FBNs. The experimental results demonstrate that the new loss function is more suitable for tfMRI than MSE alone and can be used to extract more meaningful brain networks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    a) This paper extracts functional brain networks from tfMRI based on a pre-trained model for learning latent representations, which is novel. b) The newly proposed loss function fully considers the latent feature distribution and its relationship with the task design curves. c) The idea is easy to follow and implement.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    a) Ablation studies of the proposed loss function are needed. In Table 2, MAMSM likely has higher accuracy because of Loss_cos, which makes the first six features of the encoder output close to the task designs. For SDL, the following method should be included for comparison: Zhao, S., Han, J., Lv, J., Jiang, X., Hu, X., Zhao, Y., … & Liu, T. (2015). Supervised dictionary learning for inferring concurrent brain networks. IEEE Transactions on Medical Imaging, 34(10), 2036-2045.

    b) The writing quality should be improved with necessary details. For example, why mask approximately 10% of the tokens? Why add continuous masks in addition to random discrete masks? What is the ratio between random discrete and continuous masks?
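
To make the question concrete, here is one plausible way discrete and continuous masks could be combined on a time series. This is an assumed illustration, not the authors' scheme: the ratio, span length, and mask token are guesses.

```python
import random

def mask_sequence(seq, mask_ratio=0.1, span_len=3, mask_token="[MASK]", seed=0):
    # Illustrative only: scatter ~mask_ratio single-point ("discrete") masks
    # and add one contiguous run of span_len points (a "continuous" mask).
    rng = random.Random(seed)
    masked = list(seq)
    n = len(seq)
    n_discrete = max(1, int(n * mask_ratio))
    # Continuous mask: one contiguous span of time points.
    start = rng.randrange(0, n - span_len + 1)
    span = set(range(start, start + span_len))
    # Discrete masks: scattered single positions outside the span.
    candidates = [i for i in range(n) if i not in span]
    discrete = set(rng.sample(candidates, n_discrete))
    for i in span | discrete:
        masked[i] = mask_token
    return masked, sorted(span | discrete)

seq = list(range(50))
masked, positions = mask_sequence(seq)  # 5 discrete + 3 continuous positions
```

The intuition for span masks, as in span-based MLM variants, is that a single masked point can often be interpolated from its neighbors, whereas a masked span forces the model to use longer-range context.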

    c) The recent HCP S1200 release has more than 1000 subjects, and even the HCP Q1 release has 68 subjects. Why are only 22 subjects included in this study? How are the training/validation/testing datasets split among these subjects?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I think this work is easy to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    a) Include ablation studies as mentioned in Q5.
    b) Include more subjects from the HCP S1200 release.
    c) More evaluations could be done on other tasks such as LANGUAGE and WM.
    d) The authors did not mention the value of K used in Eq. 5. Considering that the new loss function is one of the main contributions, it would be better to add ablation studies with different values of K.
    e) The MAMSM model has three Transformer encoder layers, each with six attention heads; how were these numbers determined?
    f) In Section 3.1, what are the attention scores? There is no explanation when they first appear.
    g) What is the meaning of tokens in Section 2.2.3? Is a token a scalar value at an fMRI time point, or a vector?
    h) In Section 2.2.3, how can the original value (which I take to be a scalar) be replaced by [mask]?
    i) What is the difference between discrete and continuous masks? What is the meaning of CLS, M, and SEP in the token embedding in Fig. 1?
    j) The optimization function in Eq. (6) is written with an L2 norm, but all variables are matrices. The authors need to clarify the distinction between the L2 norm and the Frobenius norm.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is novel and effective. The writing quality and experiments could be improved.

  • Number of papers in your stack

    2

  • What is the ranking of this paper in your review stack?

    7

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    7

  • [Post rebuttal] Please justify your decision

    All my concerns have been addressed in the rebuttal.



Review #4

  • Please describe the contribution of the paper

    This work proposes a self-supervised Multi-head Attention-based Masked Sequence Model (MAMSM) as an embedding technique to identify task-evoked brain networks and temporal features.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Contributions: 1). It is very interesting to introduce Natural Language Processing (NLP) techniques for revealing the time series and task-evoked brain networks from task-based fMRI.

    2). The methodological validation is presented with the other algorithms, using the identified task-evoked brain networks, time series, and intrinsic brain networks.

    3). This paper is written well.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1). Validation: It would be biased to validate a supervised method against unsupervised methods such as SDL and ICA.

    2). Discovery limits: The authors employ an autoencoder (a deep neural network) with embedding techniques to reveal brain networks. The reviewers are very curious about the hierarchical structures identified by the proposed method.

    3). Mistakes in the mathematical formulas: In Eq. (6), given that all variables are matrices, the Frobenius norm should be used.

    4). Arbitrary comparison: In Fig. 4, the backgrounds of the brain images are not consistent. For instance, the GLM results are mapped onto T1-weighted images, but the other results use different background images.

    5). Artifacts: Some artifacts are reported in Fig. 5(b).
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility of this paper is limited.

    This work does not release the source code, although it uses a public dataset.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Major Concerns:

    1). Validations

    The method proposed in this work is very novel, but the comparisons could be biased. For instance, the authors compare the proposed method with SDL and ICA, which are canonical unsupervised methods. To further validate the proposed method, the authors are asked to compare its performance against other supervised methods.

    2). Limited contributions: In this work, the authors provide only spatial similarity as the main standard for evaluating performance. Given the “(d) further training” stage in Fig. 1, the reviewers are very curious about the hierarchical organization of brain networks. However, the results section reports only conventional task-evoked brain networks, without any hierarchical structures, compared with peer methods. Here is a reference for future validation: Sahoo, D., Satterthwaite, T.D., Davatzikos, C.: Hierarchical extraction of functional connectivity components in the human brain using resting-state fMRI. IEEE Transactions on Medical Imaging (2020). https://doi.org/10.1109/TMI.2020.3042873.

    3). Artifacts in the identified brain networks: In Fig. 5, the authors present the other detected brain networks compared to ICA. The reviewers are concerned that some of these networks may be artifacts. For instance, in Fig. 5(b), the identified network in the second column and second row seems to be an artifact that occupies a large area of white matter. Moreover, the figures in Fig. 4 appear to be generated inconsistently: the GLM results are mapped onto T1-weighted images, but the other results use different background images.

    Minor Concerns:

    1). Mathematical formula issues

    In Eq. (4), the authors need to provide more details about the cosine term.

    In Eq. (6), the authors denote all variables as matrices, but the L2 norm is used in the optimization function. The Frobenius norm should be used instead.
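
For reference, the Frobenius norm requested here is defined for an m-by-n matrix A as:

```latex
\|A\|_F \;=\; \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2}
        \;=\; \sqrt{\operatorname{tr}\left(A^{\top} A\right)}
```

It equals the Euclidean (L2) norm of the vectorized matrix, which is why the two are often written interchangeably; for matrices, however, “L2 norm” conventionally denotes the spectral norm, hence the requested clarification.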

    2). Quantitative comparison issues: In Figure 4, according to the color bar, the intensity of the GLM results is larger than that of the proposed MAMSM. The reviewers are therefore curious whether the spatial-overlap metric can account for both spatial and intensity differences.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    2

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. Poor validation. Again, it would be biased to validate a supervised model against unsupervised models.

    2. Some artifacts are reported as Identified Brain Networks. In Fig 5(b), the identified network at the 2nd row and 2nd column is an artifact.

    3. The qualitative comparison is inconsistent. For instance, in Fig. 4, the results are not mapped onto the same background images.

  • Number of papers in your stack

    3

  • What is the ranking of this paper in your review stack?

    5

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    I agree that this paper proposed an interesting method, and applying the NLP approaches to fMRI data is reasonable. I would suggest the authors comprehensively address the concerns of all three reviewers.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5




Author Feedback

We are very thankful to all the reviewers’ constructive comments and their appreciation of our work.

Our responses to the reviewers’ common comments are itemized as follows.

1) Regarding the biased comparisons: Reviewer 2 and Reviewer 4 think it is biased to compare with unsupervised methods such as SDL and suggest we compare with other supervised methods. As shown in Fig. 1(c), pre-training in MAMSM is self-supervised learning; compared at this stage, SDL and our method are both unsupervised, so the comparison is fair. This paper compares the Pearson’s correlations of the latent features between SDL and our method: the latent feature (the “Average” result shown in Fig. 1) obtained by our pre-training achieves a better effect than that of SDL, as can be seen by combining Table 1 and Table 2. Specifically, the Pearson’s correlations between the Average result and the task design curves (3rd row of Table 1) are much larger than those between the SDL-learned features and the task design curves (2nd row of Table 2), except for the cue task. In addition, to further improve the extracted features, we use prior knowledge (the task designs) to guide MAMSM to learn the known patterns in a semi-supervised-like way, but the remaining patterns are still obtained by unsupervised learning; these are the other FBNs in Section 3.2.2.

2) Reviewer 1 and Reviewer 2 point out that only 22 subjects are used in this paper. For MAMSM, one training run extracts the features of a single subject; for another subject, the model is retrained. That is, each brain is learned separately to obtain its own features, so the model does not need more subjects to improve itself. We used 22 individuals simply to validate the reproducibility of the model and to obtain group-averaged FBNs.

3) As Reviewer 1 and Reviewer 2 suggest, hyperparameter tuning is crucial. The tuning experiments were not included in the paper due to the length limitation.

Our responses to the reviewers’ specific comments are itemized as follows.

For Reviewer #1:
1) As mentioned in the paper, MAMSM learns features of fMRI signals in a self-supervised manner like BERT [18] in NLP, so there is only a training set, with no test or validation sets.
2) Thanks for the reviewer’s constructive suggestion; combining spatial attention would further improve the performance of the model, but it is not the focus of our current work.
3) Each value in the fMRI time series is regarded as a word in a sentence and vectorized in the token embedding step. So signals at every time point in the sequence are token-embedded, not only the continuous and discrete masks.

For Reviewer #2:
1) Regarding ablation studies: although we did not compare FBNs with other supervised methods, we did run ablation experiments in terms of Pearson’s correlation and loss. For Pearson’s correlation, the actual ablation comparison is described in response 1) to the common comments. For the loss, Table 4 shows that using only Loss_mse yields a larger cos-error than adding Loss_cos to Loss_mse (Loss_task), i.e., adding Loss_cos is better.
2) The original signal value at an fMRI time point is a scalar. We embed it into a vector in the token embedding step to express richer semantics.
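
The scalar-to-vector token embedding described in 2) might be sketched as follows; the quantization into bins and all sizes here are assumptions for illustration, not the authors' actual embedding.

```python
import random

def build_embedding_table(n_bins=16, dim=8, seed=0):
    # Hypothetical lookup table: one (would-be learnable) vector per bin.
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bins)]

def embed_signal(signal, table, lo=-1.0, hi=1.0):
    # Quantize each scalar fMRI value into a bin, then look up its vector,
    # turning a length-T sequence of scalars into a T x dim matrix.
    n_bins = len(table)
    vectors = []
    for v in signal:
        t = (min(max(v, lo), hi) - lo) / (hi - lo)  # clip, normalize to [0, 1]
        idx = min(int(t * n_bins), n_bins - 1)
        vectors.append(table[idx])
    return vectors

table = build_embedding_table()
emb = embed_signal([-0.9, 0.0, 0.9], table)  # three scalars -> three 8-d vectors
```

In this picture, two time points with the same scalar value share the same input vector, and it is the subsequent multi-head attention over context that assigns them different meanings.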

For Reviewer #4:
1) Regarding the hierarchical organization of brain networks: the purpose of this paper is to infer single-layer FBNs, so we used only one encoder layer and one decoder layer in the feature selection module. In follow-up work, we will explore the application of MAMSM to hierarchical brain networks.
2) As for the artifacts in the identified brain networks, it is normal for activated regions to lie in white matter; please refer to [Gore, J. C., Functional MRI and resting state connectivity in white matter - a mini-review]. We regret using different backgrounds in Fig. 4.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I think this is a well-presented paper and the authors have thoroughly addressed the major concerns of the reviewers.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors fully addressed the questions from the reviewers, and there is a consensus about the novelty of the proposed work.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Reviewers agreed that the proposed multi-head attention-based masked sequence model provides a novel and interesting approach to the discovery of brain functional networks that is relevant for the MICCAI audience. Initial reviews included some concerns about validation, but authors convincingly argue in their rebuttal that those were based on misunderstanding the self-supervised nature of their approach.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5


