
Authors

Zizhao Sun, Huiqin Jiang, Ling Ma, Zhan Yu, Hongwei Xu

Abstract

Most existing multi-view mammographic image analysis methods adopt a simple fusion strategy, feature concatenation, which is widely used in feature fusion methods. However, concatenation-based methods cannot extract cross-view information very effectively because different views are likely to be unaligned. Recently, many researchers have attempted to introduce attention-mechanism-related methods into the field of multi-view mammography analysis. But these attention-mechanism-based methods still partly rely on convolution, so they cannot take full advantage of the attention mechanism. To take full advantage of multi-view information, we propose a novel pure transformer-based multi-view network to solve the problem of mammographic image classification. In our primary network, we use a transformer-based backbone network to extract image features, a “cross view attention block” structure to fuse multi-view information, and a “classification token” to gather all useful information to make the final prediction. In addition, we compare the performance when fusing multi-view information at different stages of the backbone network using a newly designed “(shifted) window based cross view attention block” structure, and compare the results when fusing different views’ information. The results on the DDSM dataset show that our networks can effectively use multi-view information to make judgments and outperform the concatenation- and convolution-based methods.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16437-8_5

SharedIt: https://rdcu.be/cVRsR

Link to the code repository

N/A

Link to the dataset(s)

http://www.eng.usf.edu/cvprg/Mammography/Database.html


Reviews

Review #1

  • Please describe the contribution of the paper
    1. This paper designed a multi-view network based entirely on transformer architecture. The used “cross view attention block” can work better in a pure transformer style.
    2. This paper introduced a learnable “classification token” into the network. This token can gather all useful information to make better prediction.
    3. This paper designed “(Shifted) Window based Cross View Attention Block”. This structure can fuse cross view information anywhere in the network with low computational cost.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper proposes a new method for classification of mammography images.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. Literature research is insufficient. The novelty of the proposed method needs to be explained.
    2. The network process description is not clear enough. How do the three sub-networks evaluate the results as a whole?
    3. In the experimental part, the method in this paper should be compared with the methods mentioned in the introduction to demonstrate the effectiveness of the method.
    4. Figure 2 does not have acceptable image quality.
    5. The English expression of the article is very poor.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method proposed in the article has good reproducibility, and the referenced modules are described in more detail in the article.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    1. The English expression of the article needs to be improved.
    2. The abstract should state the novelty of our method and why it is useful.
    3. The article needs to strengthen the logical ordering, and clearly describe the algorithm process and working principle.
    4. The article needs to add relevant experiments to prove the superiority of this method.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    1. The literature survey is not sufficient to verify the novelty of the method proposed in the article.
    2. The logic of the method part is not clear, and the experimental part is not perfect.
    3. The English expression of the article needs to be improved.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    In the rebuttal letter, the authors address the questions raised reasonably well. They strengthen the description of the method, expand the experimental content, and commit to improving the picture quality and language expression.



Review #2

  • Please describe the contribution of the paper

    This paper introduces several cross-view attention mechanisms to learn representations for multi-view images (i.e., mammographic images). This paper introduces: (1) a “cross-view attention” to aggregate information over multiple views. (2) a learnable “classification token” to make better predictions. (3) a “shifted window-based cross-view attention block” for saving computational cost.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Cross-view attention is interesting.
    2. The proposed model performs the best on the DDSM dataset (Table 2).
    3. This paper provides empirical studies on the fusing stages (Table 1).
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Key details are not explained well. For example, this paper does not provide details (e.g., equations or figures) for the proposed shifted window-based cross-view attention block.
    2. Ablation study. This paper does not provide the experimental results of Swin-T with feature concatenation. Therefore, we are unable to directly compare the proposed cross-view attention with simple concatenation. As a result, we cannot tell how much improvement the proposed cross-view attention brings.
    3. On page 5, there are some issues with the fonts. For example, Q, K, V, \alpha, \beta in the last paragraph.
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducibility looks good.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    In general, the idea is interesting. However, more detailed descriptions should be provided for the shifted window based cross view attention block. More ablation studies should be provided, such as (1) the experimental results for Swin-T (feature concatenation) should be provided; (2) shifted cross-view attention vs cross-view attention.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    My major concerns are (1) the description of the shifted window based cross-view attention block, which is a major component of the model, is not quite clear; (2) without the results of Swin-T (feature concatenation), we are unable to know how much improvement the proposed cross-view attention could bring.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    This paper proposes to use transformer architecture for multi-view mammographic classification, to more effectively use the cross view attention mechanism. A classification token is introduced, and experiments show that the proposed approach outperforms feature concatenation based approaches and CNN based cross view attention approaches, on malignancy classification task.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed pure transformer based multi-view network seems to be novel. Intuitively, it can better utilize cross view attention mechanism, which is verified by the experiments. Previous multi-view approaches rely on hand crafted attention blocks, which are kind of ad hoc.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It would be interesting to compare the performance on other tasks as well, such as lesion detection tasks.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No code is given, but the description should be clear enough to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Adding experiments on other tasks can further strengthen the results.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Using transformer for multi-view mammographic analysis is natural and seems to be novel.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The reviewers give the paper some recognition, but they also raise concerns about the level of detail and novelty of the methods, as well as the experimental validation.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6




Author Feedback

To Reviewer #1 & #2: Thanks for your comments! Q1: Lack of experiments. A1: We have run the experiment that uses the concatenation-based method to fuse multi-view information with the Swin-T backbone. Its results are: 0.834±0.003 (view-wise), 0.743±0.002 (breast-wise), 0.835±0.006 (joint). We can see that using CVA yields a higher AUC than using concatenation. But it still changes our conclusions a little bit. We will add the new experimental results and refine our conclusions in the final version.

To Reviewer #1: Thanks for your comments!
Q1: The English expression of the article is very poor.
A1: We will try our best to improve the English expression in the final version.
Q2: The abstract should state the novelty of our method and why it is useful.
A2: In the current version of the article, the abstract focuses on a brief introduction of our work, and the novelty of our method is emphasized in the related work section. We will improve the abstract in the final version.
Q3: Poor description of network process and working principle.
A3: In fact, the division into three fusion strategies is modeled on reference [1] (a paper of high quality). Due to space constraints, we do not cover all three networks in detail, but we briefly explain them and cite the relevant literature in the first paragraph of Section 3. We think this partitioning helps us better understand which cross-view information is more useful for this classification task.
Q4: The article needs to add relevant experiments to prove the superiority of this method.
A4: In the final version of our paper, we will include the results of the following five types of networks: (1) a network using the concatenation-based method to fuse multi-view information with a ResNet-50 backbone; (2) a network using the cross-view attention mechanism to fuse multi-view information with a ResNet-50 backbone; (3) a network using the concatenation-based method to fuse multi-view information with a Swin-T backbone; (4) a network using the cross-view attention mechanism to fuse multi-view information with a Swin-T backbone; (5) the network in (4) with a classification token. On the basis of these results, we can complete the comparison between concatenation and cross-view attention, the comparison between pure Transformer and non-pure Transformer structures, and the comparison between networks with and without a classification token.
Q5: Poor image quality.
A5: We will replace this poor-quality image with a higher-quality image in the final version.

To Reviewer #2: Thanks for your comments! Q1: Key details for the shifted window-based cross-view attention block. A1: Due to space constraints, we have cut some content. However, we have cited references at the places where content was cut. The (shifted) window based cross view attention block is similar to the (shifted) window based self attention block proposed in reference [2]. If all MSA operations in (S)W-SAB are replaced with MCVA operations, and the two views are divided into windows and the resulting windows are shifted in the same way, the (S)W-SAB becomes the (S)W-CVAB. Q2: Lack of experiment (SW-CVAB vs CVAB). A2: We think your opinion is very valuable, but adding this experiment to this paper would take up a lot of space, and there is not enough space in the paper. After much consideration, we decided to include this experiment in our follow-up study. Thanks again for your comments! Q3: Issues with the fonts. A3: We will fix the problematic fonts in the final version.
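
The construction described in A1 to Reviewer #2 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a single attention head, omits the residual connections, normalization, MLP, relative position bias, and the attention mask that Swin Transformer applies to shifted windows, and all function and parameter names (`window_partition`, `window_merge`, `window_cross_view_attention`, `Wq`, `Wk`, `Wv`, `ws`, `shift`) are hypothetical. The only point it shows is that both views are partitioned and cyclically shifted identically, and that queries come from one view while keys and values come from the other.

```python
import numpy as np


def window_partition(x, ws):
    """Split an (H, W, C) feature map into (num_windows, ws*ws, C) token groups."""
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)


def window_merge(windows, ws, H, W):
    """Inverse of window_partition: rebuild an (H, W, C) feature map."""
    C = windows.shape[-1]
    x = windows.reshape(H // ws, W // ws, ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def window_cross_view_attention(x_a, x_b, Wq, Wk, Wv, ws, shift=0):
    """Tokens of view A attend to tokens of view B inside matching windows.

    Assumption: single head, no shifted-window mask, no position bias.
    """
    H, W, _ = x_a.shape
    if shift:
        # Shift both views in the same way so that windows stay matched.
        x_a = np.roll(x_a, (-shift, -shift), axis=(0, 1))
        x_b = np.roll(x_b, (-shift, -shift), axis=(0, 1))
    win_a = window_partition(x_a, ws)
    win_b = window_partition(x_b, ws)
    q = win_a @ Wq  # queries from view A
    k = win_b @ Wk  # keys from view B
    v = win_b @ Wv  # values from view B
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1]))
    out = window_merge(attn @ v, ws, H, W)
    if shift:
        # Undo the cyclic shift so the output aligns with the input of view A.
        out = np.roll(out, (shift, shift), axis=(0, 1))
    return out
```

Calling the function with `shift=0` corresponds to the plain window variant and a nonzero `shift` to the shifted variant; swapping the roles of `x_a` and `x_b` gives the update for the other view.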

To Reviewer #3: Thanks for your comments! Q1: Experiments on other tasks. A1: In our future work, we will investigate how this method can be applied to other tasks and conduct experiments to discuss its performance on corresponding tasks.

[1] Deep Neural Networks Improve Radiologists’ Performance in Breast Cancer Screening. IEEE Trans. Med. Imaging 39, 1184–1194 (2020).
[2] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In: ICCV (2021).




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The main concerns that the paper is weak in innovation and logical organization of the method remain.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    11



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes to use a transformer architecture for multi-view mammographic classification, to more effectively use the cross view attention mechanism. The authors addressed the comments of reviewers well, including “Lack of experiments”. All reviewers acknowledge the novelty of this paper. In preparing the final version, the authors should add more details of this experiment and of the methods, and proofread the whole paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper presents a multi-view Transformer method for classifying mammography images, which all reviewers agree is a novel method in the field. In the rebuttal, the authors address several key issues raised in the initial review, such as the missing results with Swin-T and the explanation of shifted cross-view attention. In the final version, please carefully address the technical innovation by adding more literature review and discussion of related works, and improve the organization of the methodology, the language, and other minor issues.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    2


