
Authors

Zhaohu Xing, Lequan Yu, Liang Wan, Tong Han, Lei Zhu

Abstract

Multi-modal MR imaging is routinely used in clinical practice to diagnose and investigate brain tumors by providing rich complementary information. Previous multi-modal MRI segmentation methods usually perform modal fusion by concatenating multi-modal MRIs at an early/middle stage of the network, which hardly explores non-linear dependencies between modalities. In this work, we propose a novel Nested Modality-Aware Transformer (NestedFormer) to explicitly explore the intra-modality and inter-modality relationships of multi-modal MRIs for brain tumor segmentation. Built on the transformer-based multi-encoder and single-decoder structure, we perform nested multi-modal fusion for high-level representations of different modalities and apply modality-sensitive gating (MSG) at lower scales for more effective skip connections. Specifically, the multi-modal fusion is conducted in our proposed Nested Modality-aware Feature Aggregation (NMaFA) module, which enhances long-term dependencies within individual modalities via a tri-orientated spatial-attention transformer, and further complements key contextual information among modalities via a cross-modality attention transformer. Extensive experiments on BraTS2020 benchmark and a private meningiomas segmentation (MeniSeg) dataset show that the NestedFormer clearly outperforms the state-of-the-arts. The code is available at https://github.com/920232796/NestedFormer.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16443-9_14

SharedIt: https://rdcu.be/cVRys

Link to the code repository

https://github.com/920232796/NestedFormer

Link to the dataset(s)

https://www.med.upenn.edu/cbica/brats2020/data.html


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper presents a transformer network for multi-modal medical data such as brain MRI. The proposed Nested Modality-Aware Transformer (NestedFormer) is able to handle the multi-modal information in the dataset. This is validated by experimental results on the BraTS and MeniSeg datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well written.
    2. The results are presented on two different brain segmentation datasets.
    3. The proposed NestedFormer is discussed in detail.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. There should be a section or subsection discussing the key highlights of the proposed architecture that make it better than other SOTA methods.
    2. The authors should present some thoughts on extending the architecture to other medical datasets where multiple modalities are available.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The work is reproducible

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. There should be a section or subsection discussing the key highlights of the proposed architecture that make it better than other SOTA methods.
    2. The authors should present some thoughts on extending the architecture to other medical datasets where multiple modalities are available.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    In general, medical data carry a large amount of information in the form of multiple modalities, and exploiting these modalities together is a current area of research. The paper presents a new mmFormer architecture which can tackle this issue well.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    This paper proposes a model, termed NestedFormer, to fuse multi-modal information based on the transformer architecture. The multi-modal information is first embedded separately using Global Poolformer. The core module of NestedFormer, called Nested Modality-aware Feature Aggregation (NMaFA), is proposed to fuse long-range dependencies of different modalities. A modality-sensitive gating (MSG) module is proposed to utilize modality-aware low-resolution features by decomposing in three orientations.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper is overall well written and easy to follow.
    2. The usage of transformer is relatively novel and could be of interest to many readers.
    3. NestedFormer provides a new, alternative approach to multi-modality fusion, which is still an active research problem, and looks promising.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. If I understand correctly, the design of NMaFA is similar to channel and spatial attention networks, which are widely adopted in the literature, but the authors do not mention this. Perhaps the authors can further justify the differences.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    1. The method proposes quite a few unique blocks in NestedFormer, making it hard to implement. The authors could consider providing the model source code to help reproduce the results.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. I suggest providing an nnU-Net baseline, which can be run out of the box, to help the reader assess the relative performance gains.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, I think the method is novel. The method and experiments are well demonstrated. Although the method is relatively complicated, a clear ablation study makes it easy to follow.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    6

  • [Post rebuttal] Please justify your decision

    I read through the authors’ response. Here are my thoughts:

    1. The authors describe the differences between their method and attention-based methods. I find the exact differences a little hard to understand, but it would be great if the authors could provide this explanation in the final version.
    2. I think the nnU-Net baseline makes the results look more solid.
    3. I went through the code in the supplementary material, which contains the basic blocks. I would suggest that the authors provide the whole training/test code to facilitate reproduction.

    Overall, I think this paper explores novel ideas on traditional tasks, showing promising results and insights. I would retain my assessment and recommend acceptance.



Review #4

  • Please describe the contribution of the paper

    In this paper, the authors propose NestedFormer that combines U-Net and Transformer for brain tumor segmentation. The effectiveness of NestedFormer is demonstrated through performance evaluation experiments using the BraTS2020 dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Proposal of a new combination of UNet and Transformer.
    • Comparison with the latest methods such as TransBTS and Unetr.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • There are some parts where explanations are insufficient.
    • There is insufficient discussion of the experimental results.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    If one reads this paper carefully, it is possible to implement NestedFormer. It is possible to perform experiments with BraTS2020, but not with MeniSeg.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. For the encoder, which you call the feature embedding layer, why not use 3D conv? 3D conv is easier to understand in terms of feature extraction by the encoder.

    2. For reproducibility, the version of the library used should be specified.

    3. Why did the authors not perform cross-validation in their experiments with BraTS2020? The dataset may have been divided randomly, but any bias in the split could prevent a correct evaluation.

    4. The results in Table 2 show that NestedFormer has a larger HD95 value. There is no discussion of why the results are better in terms of Dice but worse in terms of HD95.

    5. Why is it that only limited combinations were evaluated in the ablation study in Table 3?

    6. The font size in Figure 2 is small and the resolution is low. Figure 3 caption is incorrect. Figures and tables must be placed on or after the page on which they are referred to.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My judgment is borderline, but the paper needs to be revised, so I decided to give it a weak reject.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    I have carefully checked the authors’ rebuttal. This rebuttal gave me some understanding, although I still have concerns about the split of the dataset. Based on the above, I rate this paper as weak accept.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper presents an architecture named NestedFormer that combines U-Net and Transformer for brain tumor segmentation. The way of combining UNet and Transformer is novel and its effectiveness is demonstrated by its application on the BraTS2020 dataset. The comparative study is of high quality because it uses the most recent methods such as TransBTS and Unetr. To improve the clarity of the article, some critical points that authors should address in a rebuttal are listed below:

    • The paper needs a discussion on the difference between a channel and spatial attention network and the proposed NMaFA module.
    • The authors should give more discussion on the results of tables 2 and 3, in particular why only limited combinations were evaluated.
    • Authors should explain the main strengths of the proposed architecture that make it better than other SOTA methods.

    The authors are also invited to make the corrections indicated by the reviewers.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3




Author Feedback

We appreciate the favorable comments on our important problem setting (R3), the novelty of our method (AC, R3), the high-quality comparative study (AC, R4), and the clear writing (R1, R3). Below, we clarify the main issues raised by the reviewers.

AC Q1: Difference between our framework and channel-spatial attention networks

A channel-spatial attention network reweights feature maps channel-wise and spatial-wise. In our NMaFA, however, the cross-modality attention computes the global relation among different modalities to achieve inter-modality fusion, while our spatial attention computes the long-range correlation between different patches in space within each modality; this differs from channel-spatial attention. Moreover, our NMaFA relies on the transformer mechanism, and the two transformers are fused in a nested form rather than in serial (Khanh et al. [Applied Sciences 2020]) or in parallel (Mou L et al. [MICCAI 2019]), as in most channel-spatial attention networks.
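
To make the idea of attention across modalities (as opposed to reweighting channels) more concrete, the following is a minimal sketch of scaled dot-product attention over one token per modality at a single spatial location. It is not the NMaFA module from the paper: the function names, the identity query/key/value projections, and the feature values are all assumptions chosen for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_modality_attention(tokens):
    """Scaled dot-product attention across modality tokens.

    tokens: one feature vector per modality, all taken from the same
    spatial location. Identity query/key/value projections are an
    assumption made to keep the sketch short.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    weights, fused = [], []
    for query in tokens:
        # Similarity of this modality's token to every modality's token.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted mix of all modality tokens.
        fused.append([sum(w * value[i] for w, value in zip(row, tokens)) for i in range(dim)])
    return fused, weights

# Hypothetical 4-dimensional features for four MRI modalities at one voxel.
tokens = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.8, 0.2],
]
fused, weights = cross_modality_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` describes how strongly one modality attends to the others, so the fused feature of a modality depends on the content of all modalities rather than on a fixed per-channel scale.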

Q2: Limited combinations in ablation study (Table 3); our HD95 in Table 2 is not the best

  • We apologize that Table 3 may lead to some misunderstanding. Here, CNN, PB, and GPB are encoder backbones, and only one can be used in each setting. We first compare CNN and GPB, showing that our improved GPB works better. Taking GPB as the backbone, we then study the fusion modules T_tsa, T_cma, and MSG, showing that each module brings a certain improvement. We also test PB as the backbone while using all fusion modules, and GPB still gets better results.
  • HD95 measures the distance between two sets of points and is more sensitive than Dice. A higher HD95 also appears in previous SOTA work (e.g., Wang et al. [MICCAI 2021]). Hence, we use Dice as the main metric and HD95 as the reference.
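
The remark that HD95 is more sensitive than Dice can be illustrated with a toy 2D example. The sketch below makes several simplifying assumptions that are not taken from the paper: distances are computed over all foreground pixels rather than surface voxels, the 95th percentile uses the nearest-rank rule, and the symmetric HD95 is taken as the larger of the two directed values (toolkits differ on these details). The masks are made up.

```python
import math

def dice(pred, truth):
    """Dice coefficient between two sets of foreground pixel coordinates."""
    return 2 * len(pred & truth) / (len(pred) + len(truth))

def directed_p95(source, target):
    """95th percentile (nearest-rank) of distances from source points to target."""
    dists = sorted(
        min(math.hypot(sx - tx, sy - ty) for tx, ty in target)
        for sx, sy in source
    )
    rank = -(-95 * len(dists) // 100)  # ceil(0.95 * n) with integer arithmetic
    return dists[rank - 1]

def hd95(pred, truth):
    """Symmetric HD95: the larger of the two directed 95th percentiles."""
    return max(directed_p95(pred, truth), directed_p95(truth, pred))

# Ground truth: a 10x10 square. Prediction: the same square plus a small
# false-positive blob of 8 pixels far away from it.
truth = {(x, y) for x in range(10) for y in range(10)}
stray = {(x, y) for x in range(30, 34) for y in range(2)}
pred = truth | stray

print(f"Dice = {dice(pred, truth):.3f}")  # Dice = 0.962
print(f"HD95 = {hd95(pred, truth):.1f}")  # HD95 = 22.0
```

The small distant blob barely lowers Dice, yet it dominates HD95 because it makes up more than 5% of the predicted pixels. A method can therefore rank better on Dice and worse on HD95 at the same time.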

Q3: Main strengths of our architecture compared with SOTA

  • Fusion strategy. Many SOTA methods fuse multimodal images at the input level and cannot fully mine the multimodal information. Instead, we fuse multimodal features by explicitly considering both single-modality spatial coherence and cross-modality coherence; and design nested transformers to establish the coherence in the long-range, resulting in more effective feature representation.
  • Feature selection. For the skip-connections, we design a novel MSG module to dynamically select modality-sensitive features, thus improving the feature reuse effect.
  • Feature encoder. GlobalPoolformer is developed to model the global dependencies.

R1: Please also refer to AC Q3.

Q4: Extension to other multi-modal medical data

Our framework is modality-agnostic and can be extended to other multimodal medical data.

R3: Please also refer to AC Q1.

Q5: Source code

We have submitted the source code in the supplementary material, and we will share the code on GitHub.

Q6: The nnUNet baseline

Following the same experimental setting as in Table 2, nnUNet obtains the following results (WT: 0.907, 6.94; TC: 0.848, 5.069; ET: 0.814, 5.851), which are lower than ours.

R4: Please also refer to AC Q2.

Q7: Why not use a 3D Conv encoder

Recent works show that transformers are more conducive to modeling global information than CNNs. In the ablation study, we test 3D Conv, and our proposed GlobalPoolformer is better than 3D Conv (Avg Dice: 0.75 vs. 0.74), so we use GlobalPoolformer as the encoder by default.

Q8: Conduct cross-validation in BraTS2020

The experimental setting of BraTS2020 follows (Larrazabal et al. [MICCAI 2021]) and (Hatamizadeh et al. [arXiv 2022]). Due to the time limit, we performed two-fold cross-validation for several methods:

  • UNETR (WT: 0.876±0.006, 3.822±0.205; TC: 0.793±0.014, 3.916±0.389; ET: 0.735±0.007, 3.407±0.264)
  • TransBTS (WT: 0.877±0.001, 8.287±1.267; TC: 0.790±0.006, 10.039±1.518; ET: 0.718±0.021, 7.520±2.487)
  • NestedFormer (WT: 0.891±0.001, 2.922±0.289; TC: 0.804±0.009, 4.345±0.175; ET: 0.732±0.009, 4.538±0.020)

Our method outperforms the other two methods in WT and TC, and is quite close to the best result in ET.
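
For readers unfamiliar with the protocol, a two-fold cross-validation and its mean ± std summary could be organized roughly as follows. This is a generic sketch, not the authors' pipeline: the case IDs, the seed, the per-fold Dice values, and the use of the population standard deviation are all assumptions, and the rebuttal does not state which standard deviation was reported.

```python
import random
import statistics

def k_fold_splits(case_ids, k=2, seed=0):
    """Shuffle case IDs once and yield (train, validation) lists for each fold."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        # Training set is every fold except the one held out for validation.
        train = [c for j, fold in enumerate(folds) if j != i for c in fold]
        yield train, held_out

def mean_std(scores):
    """Mean and population standard deviation of per-fold scores."""
    return statistics.mean(scores), statistics.pstdev(scores)

cases = [f"case_{i:03d}" for i in range(10)]  # hypothetical case IDs
for train, val in k_fold_splits(cases, k=2):
    print(len(train), len(val))

fold_dice = [0.80, 0.90]  # made-up per-fold Dice values, not from the paper
mean, std = mean_std(fold_dice)
print(f"{mean:.3f} ± {std:.3f}")
```

Because every case is held out exactly once, the summary is less dependent on a single random split, which is the concern raised by R4.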




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presents an architecture named NestedFormer that combines U-Net and Transformer for brain tumor segmentation. The way of combining UNet and Transformer is novel and its effectiveness is demonstrated by its application on the BraTS2020 dataset. The comparative study is of high quality because it uses the most recent methods such as TransBTS and Unetr. The authors’ responses in the rebuttal help clarify main critical points. My proposition is therefore “acceptance”.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Authors addressed most of the reviewers’ concerns, including modified experiments as suggested by R3. The methodology proposed in the paper to integrate multi-modality information across different feature levels and scales together with a spatial attention within a transformer architecture is clearly novel, albeit the exact differences between the proposed approach and attention based methods could be better delineated in the discussion. Results show clear performance improvements over other methods, albeit a discussion of why the method provides better accuracy using DSC but a worsening of accuracy using HD95 measure is warranted. Overall, this is a well-written and excellent manuscript.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work initially received fairly positive comments, with mostly clarification concerns raised during the review process. After carefully reading the reviewers’ comments and the rebuttal, I think that the authors satisfactorily addressed most of these comments. Furthermore, I read the paper and I side with the general perception that the technical contributions and supporting empirical validation are sufficient for acceptance at MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    1


