
Authors

Xinyi Zeng, Pinxian Zeng, Cheng Tang, Peng Wang, Binyu Yan, Yan Wang

Abstract

3D Spatially Aligned Multi-modal MRI Brain Tumor Segmentation (SAMM-BTS) is a crucial task for clinical diagnosis. While Transformer-based models have shown outstanding success in this field due to their ability to model global features using the self-attention mechanism, they still face two challenges. Firstly, due to the high computational complexity and deficiencies in modeling local features, the traditional self-attention mechanism is ill-suited for SAMM-BTS tasks that require modeling both global and local volumetric features within an acceptable computation overhead. Secondly, existing models only stack spatially aligned multi-modal data on the channel dimension, without any processing for such multi-channel data in the model’s internal design. To address these challenges, we propose a Transformer-based model for the SAMM-BTS task, namely DBTrans, with dual-branch architectures for both the encoder and decoder. Specifically, the encoder implements two parallel feature extraction branches, including a local branch based on Shifted Window Self-attention and a global branch based on Shuffle Window Cross-attention to capture both local and global information with linear computational complexity. Besides, we add an extra global branch based on Shifted Window Cross-attention to the decoder, introducing the key and value matrices from the corresponding encoder block, allowing the segmented target to access a more complete context during up-sampling. Furthermore, the above dual-branch designs in the encoder and decoder are both integrated with improved channel attention mechanisms to fully explore the contribution of features at different channels. Experimental results demonstrate the superiority of our DBTrans model in both qualitative and quantitative measures. Codes will be released at https://github.com/Aru321/DBTrans.
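
To make the local versus global window idea in the abstract concrete, here is a minimal one-dimensional sketch. It is an illustration only, not the DBTrans implementation: the function names are hypothetical, the real model operates on 3D volumes, and the strided grouping used for "shuffle" windows is an assumption about how such windows are commonly formed.

```python
def local_windows(tokens, window):
    """Group adjacent tokens; each window covers one contiguous neighbourhood."""
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]


def shifted_windows(tokens, window):
    """Cyclically shift by half a window before partitioning, so that
    tokens near a previous window border end up in the same window."""
    shift = window // 2
    rolled = tokens[shift:] + tokens[:shift]
    return local_windows(rolled, window)


def shuffle_windows(tokens, window):
    """Assumed shuffle scheme: sample tokens at a fixed stride so that each
    window spans the whole sequence (requires len(tokens) % window == 0)."""
    groups = len(tokens) // window
    return [tokens[g::groups] for g in range(groups)]


def attention_pairs(windows):
    """Number of query-key pairs when attention is computed inside each window."""
    return sum(len(w) ** 2 for w in windows)


tokens = list(range(8))          # hypothetical token indices
local = local_windows(tokens, 4)
shifted = shifted_windows(tokens, 4)
shuffled = shuffle_windows(tokens, 4)
```

With a fixed window size, the number of query-key pairs grows linearly with the number of tokens (here 32 pairs for either partition, against 64 for full attention over all 8 tokens), which is the sense in which windowed attention has linear complexity.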

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43901-8_48

SharedIt: https://rdcu.be/dnwDW

Link to the code repository

https://github.com/Aru321/DBTrans

Link to the dataset(s)

https://www.med.upenn.edu/cbica/brats2021/


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a new transformer-based model for 3D multi-modal MR brain tumor segmentation based on dual-branch architectures for both the encoder and decoder parts that combine two attention mechanisms. The goal of the proposed method is to capture both local and global features from the images with linear computational complexity and to use them to achieve more accurate brain tumor segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper presents an innovative transformer-based model for 3D multi-modal MR brain tumor segmentation. The model implements two parallel feature extraction branches to capture global and local image information. Despite its relatively high complexity, the method is well described. The proposed dual-branch design in the encoder and decoder, integrated with improved channel attention mechanisms, enables the extraction of features at different channels. Although a more robust statistical analysis is necessary to fully assess the results, the proposed method appears to achieve similar or better performance when compared to state-of-the-art methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weakness of this paper is that the average results were presented without showing the corresponding variances, and no statistical hypothesis test was used to determine the significance of the differences between the proposed and comparable methods using the Average Dice metric and Hausdorff distance. Due to the complexity of the model and the relatively small sample size (even with data augmentation), it would be advisable to provide the learning loss curves to give a better understanding of the behavior of the proposed method during training and validation, and to assess whether the reported results are stable and reproducible. Additionally, it would be useful to perform a more thorough analysis of the results by including statistical tests and reporting confidence intervals or p-values, which would allow for a more robust comparison with the baseline methods.
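
As an illustration of the kind of paired analysis requested here, the following sketch compares per-case Dice scores of two methods on the same test cases. The scores are made up for the example and do not come from the paper; the function names are hypothetical. It uses only the Python standard library: a paired t statistic, plus an exact sign-flip permutation test as a distribution-free way to obtain a p-value. In practice a routine such as `scipy.stats.ttest_rel` would typically be used to get the t-test p-value.

```python
import itertools
import math
import statistics


def paired_t_statistic(a, b):
    """t statistic of the per-case differences a[i] - b[i] (n - 1 degrees of freedom)."""
    diffs = [x - y for x, y in zip(a, b)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)              # sample standard deviation
    return mean / (sd / math.sqrt(len(diffs)))


def sign_flip_p_value(a, b):
    """Exact two-sided paired permutation test: under the null hypothesis the
    sign of each per-case difference is arbitrary, so enumerate all 2**n
    sign assignments and count those at least as extreme as the observed one."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical per-case Dice scores for two methods on the same five cases.
dice_proposed = [0.90, 0.85, 0.88, 0.92, 0.80]
dice_baseline = [0.88, 0.84, 0.85, 0.90, 0.79]

t_stat = paired_t_statistic(dice_proposed, dice_baseline)
p_perm = sign_flip_p_value(dice_proposed, dice_baseline)
```

With these made-up scores the t statistic is about 4.81 and the sign-flip p-value is 2/32 = 0.0625, which shows why a handful of cases rarely suffices for significance even when one method wins on every case.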

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Due to the inherent complexity of the proposed method, I would recommend providing the source code. Otherwise, it may be difficult to implement it from scratch.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Page 4 - Equations 1, 3 and 6 are too overloaded and need space between lines.

    Page 8 - Although Fig. 2 shows qualitative segmentation results of one MRI case, I believe the claim in the following sentence is too strong “Fig.2 also shows the qualitative segmentation results on the test samples of test set patients, which further proves the feasibility and superiority of our DBTrans model.”

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes an innovative method to explore local and global multi-modal MR image information for the segmentation of brain tumors. The authors provided a good description of the proposed method but did not include a more rigorous statistical analysis of the data. The training and validation loss curves, which can considerably help to indicate how well the model is fitting the training data, are missing.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    They propose a Transformer-based model for the SAMM-BTS task, namely DBTrans, with dual-branch architectures for both the encoder and decoder. Also they enhance the fusion effect of the two window-attention mechanisms as well as the multi-modal information from a global perspective.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1- Given the high computational complexity and deficiencies in modeling local features, the proposed model achieved good results. (major strength)

    2- An ablation study was done to verify the contribution of each module. They observed that the dual-branch designs achieve higher performance while also reducing the number of parameters required, so the explanation of the performance differences is acceptable.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Proof reading by a native speaker will greatly help the paper.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility of this paper is good. The description of the experiment is good.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    -More recent methods should be included in the comparison experiments.

    -The figure captions are written poorly. For example, “The overall framework of the proposed DBTrans” in Fig. 1 is not very helpful.

    -The figure showing segmentation accuracy could be improved for better visualization.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Well organized and clearly written.

    The effectiveness of the proposed method has been well supported by experiments.

  • Reviewer confidence

    Somewhat confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    This paper proposes a novel encoder-decoder model for multi-modal medical image segmentation. Both encoder and decoder consist of two branches with different attention mechanisms. These mechanisms greatly enhance the ability of both local and global feature extraction.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper presents a new segmentation model with self- and cross-attention mechanisms, which will explicitly show which modality plays a decisive role in multi-modal MRI brain tumor segmentation. If this is true, the new method will greatly promote progress in the field of brain tumor MR image segmentation. Unfortunately, there are no experiments or conclusions in the paper indicating which modality actually determines the segmentation results.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. There are confusing descriptions with wrong technical terms being used, such as the Shifted Window-based Multi-head Cross Attention (Shifted-W-MSA) and Shifted Window-based Multi-head Cross Attention (Shuffle-W-MCA) in the first paragraph on page 3.
    2. The authors say they provide specific processing for data with channel stacking to explicitly show which modality plays a decisive role in SAMM-BTS. However, I have not found which experiment supports this point.
    3. Considering the high demand for boundary segmentation accuracy in medical image segmentation, the new method has not shown significantly superior performance in Hausdorff Distance compared to SOTA methods.
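
For readers unfamiliar with the two metrics discussed in these reviews, the sketch below computes a Dice score and a 95th-percentile Hausdorff distance on tiny voxel sets. It is a simplified illustration under stated assumptions, not the evaluation code used in the paper: the function names are hypothetical, distances are taken between all foreground voxels rather than surface voxels only, both sets are assumed non-empty for the distance, and the percentile uses the nearest-rank convention (other conventions exist and give slightly different values).

```python
import math


def dice(pred, gt):
    """Dice overlap between two sets of foreground voxel coordinates."""
    if not pred and not gt:
        return 1.0
    return 2 * len(pred & gt) / (len(pred) + len(gt))


def directed_distances(src, dst):
    """Sorted distance from each voxel in src to its nearest voxel in dst."""
    return sorted(min(math.dist(p, q) for q in dst) for p in src)


def percentile_nearest_rank(sorted_vals, pct):
    """Nearest-rank percentile of an ascending list (one of several conventions)."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_vals)))
    return sorted_vals[rank - 1]


def hd95(pred, gt):
    """Symmetric 95th-percentile Hausdorff distance; using 100 instead of 95
    gives the classic (maximum) Hausdorff distance."""
    return max(
        percentile_nearest_rank(directed_distances(pred, gt), 95),
        percentile_nearest_rank(directed_distances(gt, pred), 95),
    )


# Hypothetical prediction and ground truth, as sets of (z, y, x) voxel indices.
pred = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
gt = {(0, 0, 0), (0, 0, 1), (0, 0, 3)}

dice_score = dice(pred, gt)
hd95_value = hd95(pred, gt)
```

Dice rewards volumetric overlap, while the Hausdorff distance is driven by the worst boundary errors, which is why a method can improve one without a comparable gain in the other.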
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility of this paper is not very good unless the author clearly explains how to handle the relationships between modalities when splitting the embeddings.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The paper presents a novel dual-branch encoder-decoder model for multi-modal MRI brain tumor segmentation with self- and cross-attention mechanisms. These mechanisms greatly enhance the ability of both local and global feature extraction. The work is very meaningful and interesting; however, the following factors affect the readability of the manuscript and the effectiveness of the new method:

    1. The description in the first paragraph on page 3 (line 5 - line 9) is confusing. Should “Cross” in line 5 be “Self”, since the mechanism is abbreviated as “Shifted-W-MSA”? And should the “Shifted” in line 6 be changed to “Shuffle”?

    2. The authors say they provide specific processing for data with channel stacking to explicitly show which modality plays a decisive role in SAMM-BTS. However, I have not found which experiment supports this point. It is necessary for the authors to describe clearly how these multi-modal data are organized. When splitting the embedding e(i), is there common modal data between e(i1) and e(i2)? If so, what is the difference between the new method and others which just stack the multi-modal inputs in the channel dimension? If not, what is the fundamental difference between these two types of organization of multi-modal data, since the two branches take the same Q? Finally, the authors are encouraged to compare the model’s performance under these two data organization schemes.

    3. Considering the high demand for boundary segmentation accuracy in medical image segmentation, the new method has not shown significantly superior performance in Hausdorff Distance compared to VT-Unet-B. Therefore, the authors need to verify the advantages of the new method in other aspects. For example, although the new method has fewer parameters than VT-Unet-B, the dual branches mean it cannot be guaranteed that the model is easier to train than VT-Unet-B. If so, can the weak performance advantage offset the additional training difficulty?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My recommendation is weak reject. The authors have not provided experiments that can support the contribution of the new method.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    In the rebuttal, the authors clarify that the channel-attention-based dual-branch fusion modules enable the model to implicitly weigh the importance of different modalities in every stage, which conflicts with the description that the model “explicitly shows which modality plays a decisive role”.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This work has received mixed ratings. Please prepare a rebuttal addressing the reviewers’ comments. The main points to address are: no statistical hypothesis tests or variances, a lack of learning loss curves, the need for proofreading by native speakers, adding more recent methods to the comparison, poor figure captions, visualization of the segmentation accuracy, and no experiments to support the contributions of this method.




Author Feedback

Thanks to all the reviewers (R1, R3, R4, Meta-R) for the acknowledgment of our contributions and their constructive comments for further clarification.

Q1: Lack of latest compared methods and no significant performance improvements (R1, R3, R4 & Meta-R).
A1: Thanks for the suggestions. As suggested, we further add two more SOTA methods on BraTS 2021, i.e., Swin-Unet (CVPR 2021, 86.73% mean Dice Score (mDSC) and 8.03 95% Hausdorff Distance (95%HD)) and NestedFormer (MICCAI 2022, 87.88% mDSC and 7.63 95%HD). As observed, our method still outperforms them by an average of 2.38% mDSC and 0.45 95%HD, further demonstrating its superiority. Moreover, to verify the significance of our improvements, we calculate the variances of all results and conduct statistical tests (i.e., paired t-test). The results show that p-values on Dice and 95%HD are less than 0.05 in most comparison cases, indicating that the improvements are statistically significant. All the results will be given in the final paper.

Q2: Confusion about the process and contribution of multi-modal data (R4 & Meta-R).
A2: Sorry for the confusion. We would like to clarify that our method and others both actually stack modalities at the input, but differ in the subsequent processing of the mixed modalities. Specifically, methods like TransBTS and UNETR just stack modalities and pass them through a sub-network, which treats each modality equally along the channel dimension and may ignore the contribution of different modalities. In contrast, our method applies channel-attention-based dual-branch fusion modules in every encoding and decoding stage, enabling the model to implicitly weigh the importance of different modalities in every stage. Therefore, we would like to clarify that exploring the contribution of different modalities is mainly undertaken by the channel-attention-based fusion module, and has nothing to do with the embedding splitting operation. Furthermore, the performance gain of our method over compared methods without further processing for the mixed modality, as well as the boost brought by the fusion module in the ablation study, supports that our design for multi-modal data processing does work. To further avoid confusion, we will revise the statements in the introduction and provide more experimental analysis of the contribution of the four modalities in our final paper.

Q3: Stability/feasibility of training and reproducibility of results (R1 & R4 & Meta-R).
A3: Thanks for the comments. We have analyzed the learning loss curves and find that the training and validation losses drop rapidly in the first 10-20 epochs and then stabilize at a relatively low value, while the performance steadily improves. Besides, our method has relatively few parameters among all compared methods, only 4M more than the second-best VT-Unet-B. These two aspects show that our method is easy to train in terms of both training stability and efficiency. Moreover, as we follow the common practice and report the average metrics over 3 runs, we find that the results are largely consistent among different runs, which demonstrates the high result reproducibility of our model. In our final version, we will provide the learning loss curves and our code link.

Q4: Poor figure captions and visualization of segmentation accuracy (R3 & Meta-R).
A4: Sorry for these problems. We will modify the caption of Fig. 1 by providing a more detailed description of the overall framework and modify Fig. 2 by adding captions of different tumor regions.

Q5: Confusion about the inconsistent abbreviations of “Shifted” and “Shuffle” (R4).
A5: Sorry for the confusion. These terms should be corrected as “Shifted Window-based Multi-head Self Attention (Shifted-W-MSA) and Shuffle Window-based Multi-head Cross Attention (Shuffle-W-MCA)”.

Lastly, we’ll invite some native speakers to proofread the paper for better understanding and show more qualitative segmentation results to consolidate our statements in the experiment section.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Two reviewers are in favor of acceptance. After checking the rebuttal, I find that it has addressed many reviewer concerns. I think this work can be accepted.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper proposed a heavy network, a dual-branch ViT, for multi-modal brain tumor segmentation. The reviewers are mixed on this paper and mentioned some concerns and issues during reviewing. The rebuttal has partially addressed the questions raised by the reviewers. I’m quite mixed on this paper. On one side, the proposed idea is somewhat interesting and makes sense. On the other side, the model is quite heavy, which may affect the inference speed and also the requirement of the training samples. The improvement w.r.t. Hausdorff distance is not that large.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors addressed my concerns. They further elaborated the multimodal data structure, added new comparison methods, and added more comparison indicators to increase the credibility of the work. I think the article merits a weak accept.


