Authors

Yiqing Wang, Zihan Li, Jieru Mei, Zihao Wei, Li Liu, Chen Wang, Shengtian Sang, Alan L. Yuille, Cihang Xie, Yuyin Zhou

Abstract

Recent advancements in large-scale Vision Transformers have made significant strides in improving pre-trained models for medical image segmentation. However, these methods face a notable challenge in acquiring a substantial amount of pre-training data, particularly within the medical field. To address this limitation, we present Masked Multi-view with Swin Transformers (SwinMM), a novel multi-view pipeline for enabling accurate and data-efficient self-supervised medical image analysis. Our strategy harnesses the potential of multi-view information by incorporating two principal components. In the pre-training phase, we deploy a masked multi-view encoder devised to concurrently train masked multi-view observations through a range of diverse proxy tasks. These tasks span image reconstruction, rotation, contrastive learning, and a novel task that employs a mutual learning paradigm. This new task capitalizes on the consistency between predictions from various perspectives, enabling the extraction of hidden multi-view information from 3D medical data. In the fine-tuning stage, a cross-view decoder is developed to aggregate the multi-view information through a cross-attention block. Compared with the previous state-of-the-art self-supervised learning method Swin UNETR, SwinMM demonstrates a notable advantage on several medical image segmentation tasks. It allows for a smooth integration of multi-view information, significantly boosting both the accuracy and data-efficiency of the model. Code and models are available at https://github.com/UCSC-VLAA/SwinMM/.
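
The abstract mentions a proxy task that exploits consistency between predictions from different views. The sketch below is only an illustration of that general idea, not the authors' loss: it assumes the two views have already been mapped back to a common orientation and uses a symmetric KL divergence between per-voxel class distributions. The function names and the choice of symmetric KL are assumptions made for this example.

```python
import math


def softmax(logits):
    """Convert one voxel's class logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mutual_consistency_loss(logits_view_a, logits_view_b):
    """Hypothetical multi-view consistency term.

    Each argument is a list of per-voxel logit lists. Both views are assumed
    to be aligned voxel by voxel. Returns the symmetric KL divergence
    averaged over voxels; it is zero when both views agree exactly.
    """
    total = 0.0
    for la, lb in zip(logits_view_a, logits_view_b):
        pa, pb = softmax(la), softmax(lb)
        total += 0.5 * (kl_divergence(pa, pb) + kl_divergence(pb, pa))
    return total / len(logits_view_a)
```

A term of this kind penalizes disagreement between views without requiring labels, which is why it can be used during self-supervised pre-training.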

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_47

SharedIt: https://rdcu.be/dnwBH

Link to the code repository

https://github.com/UCSC-VLAA/SwinMM/

Link to the dataset(s)

N/A


Reviews

Review #3

  • Please describe the contribution of the paper

    This paper presents a method to segment 3D medical images based on Swin Transformers and masked multi-view data, yielding a comprehensive multi-view pipeline for self-supervised segmentation. The method is evaluated on two public datasets. Although masked networks and multi-view segmentation are not “new” in themselves, the combination of the two is novel.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written and the contributions are clear. The method is reproducible. The approach is interesting and has novel elements. The study is complete and rich.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    In my opinion, the main weakness of this paper is the evaluation. I have some concerns about the results:

    • there is no real statistical analysis
    • Table 3: the authors say that their method outperforms the others. This is true overall, but not when looking at each organ individually.
    • For ACDC: I am surprised by the results. The U-Net clearly fails on this task, raising questions about how it was trained. The results are clearly lower than the official results of the competition. Why? And why is there no comparison with the winning teams? The results on the official website are better than those in the article.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method is evaluated on public datasets and the source code is available (the main README is nearly empty, but the code can already be downloaded).

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • The number of parameters is interesting, but the running time for training and inference is also important. Authors should add them to compare the methods.
    • The authors should add a statistical analysis
    • The authors should improve the result part with the comments made in the weaknesses section
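
    One way the requested statistical analysis could be carried out is a paired test on per-case Dice scores of two methods evaluated on the same test cases. The sketch below is a minimal example under that assumption, using a two-sided sign-flip permutation test; the function name, the number of resamples, and the choice of test are illustrative and are not taken from the paper or the review.

```python
import random


def paired_permutation_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided paired sign-flip permutation test on the mean difference.

    scores_a and scores_b hold per-case scores (e.g. Dice) of two methods
    on the same cases, in the same order. Returns an estimated p-value for
    the null hypothesis that the mean paired difference is zero.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_resamples):
        # Under the null, each paired difference is equally likely to have either sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(flipped) / len(flipped)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)
```

    A test of this kind makes no normality assumption, which suits Dice scores that are bounded and often skewed. With several organs or datasets, a correction for multiple comparisons would also be needed.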
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although I have some doubts about the evaluation and am not totally convinced by it, the authors propose a fair article that can easily be improved to reach MICCAI standards.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper presents an innovative 3D medical segmentation technique called SwinMM, which is based on self-supervision pretraining. One of the unique aspects of this approach is the application of a masked multi-view encoder and a cross-view decoder to integrate multiple perspectives on 3D medical data. SwinMM achieved superior data efficiency and segmentation performance compared to state-of-the-art methods while requiring less computation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Technically sound. The paper introduces a comprehensive approach for utilizing multi-view information in 3D medical segmentation. This is achieved by combining a multi-view encoder, a cross-view attention module, and a multi-view consistency loss into a single unified framework.
    2. Novelty. The paper presents some interesting ideas, such as the use of mutual learning loss in the pretraining stage and multi-view consistency loss in the finetuning stage.
    3. Good experimental results. The paper demonstrated state-of-the-art performance on two benchmark datasets, showing impressive levels of data efficiency, computational efficiency, and model parameter efficiency compared to existing approaches. The experiments were conducted using ample datasets, including pre-training on five public datasets and testing on two additional public datasets.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Lack of ablation study. Firstly, the paper’s key contribution lies in effectively utilizing multi-view information for 3D medical segmentation. However, the study lacks an ablation analysis that evaluates related components such as the multi-view encoder, cross-view attention module, multi-view consistency loss, and mutual learning loss. Therefore, it would be helpful to see a detailed analysis of the effectiveness of these components in achieving the paper’s main objective.

    2. Missing references. To provide comprehensive information, it is recommended to cite and discuss the following two related papers on multi-view contrastive learning for 3D medical applications:

    Mmgl: Multi-Scale Multi-View Global-Local Contrastive Learning For Semi-Supervised Cardiac Image Segmentation, ICIP 2022

    MVCNet: Multiview Contrastive Network for Unsupervised Representation Learning for 3D CT Lesions, IEEE Transactions on Neural Networks and Learning Systems 2022

    3. Paper writing. It is recommended to include markers in the tables of the paper to represent the best or second-best scores. Currently, the absence of these markers can be confusing for readers, particularly when comparing performance across large tables such as Table 2 and Table 3.

    4. Experimental analysis. It would be helpful to provide some failure case analysis for your method or a thorough analysis of situations where integrating multi-view information does not provide benefits. This will help users gain a better understanding of the limitations and applicability of the method.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper’s reproducibility is strong, thanks to the open-sourcing of code and the use of public datasets for all experiments.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    See the strengths and weaknesses discussed above. In particular, it would be beneficial for the authors to include (1) an ablation analysis of the multi-view encoder, cross-view attention module, multi-view consistency loss, and mutual learning loss, and (2) some failure case analysis or a thorough analysis of situations where integrating multi-view information does not provide benefits. These additions would help readers better understand the paper’s contributions and could improve its score.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is technically strong and showcases good experimental results. The introduction of novel loss functions also contributes to its strength. The main weakness lies in the absence of certain ablation studies, but this limitation alone may not be sufficient reason to reject the paper outright.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #4

  • Please describe the contribution of the paper

    The paper presents a novel approach for self-supervised medical image analysis, which leverages masked multi-view encoders during the pre-training step and a cross-view decoder during the fine-tuning step. The proposed method achieves state-of-the-art performance and demonstrates greater cost and data efficiency than existing models.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The authors have conducted a thorough set of experiments to evaluate the effectiveness of their method. They performed comparisons with popular SOTA networks, including supervised and unsupervised-based methods, and provided results on different datasets and metrics. The ablation study and semi-supervised setting analysis also help in understanding the contributions of different components and the data efficiency of the method.
    • The paper is well written, easy to follow, and provides a clear explanation of the proposed method, experimental setup, and results.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Although the ablation study is comprehensive in many aspects, some experiments appear to be incomplete. For example, the authors did not provide a clear rationale for selecting which proxy tasks to exclude from the “Pre-training loss functions” experiments. It would be beneficial to see results for different combinations of proxy tasks, including when only the contrastive learning task is used, to understand the individual contributions of each task.
    • The explanation of the validation strategy, specifically the five-fold cross-validation process, and how the ensembling was performed is unclear. Providing more details on the selection of the best model in each fold and the ensemble technique would help.
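
    For context, one common way to ensemble the models from a five-fold cross-validation is to average their per-voxel class probabilities on a test case and then take the most probable class. The sketch below illustrates only that generic scheme; whether the paper uses it is exactly what this comment asks the authors to clarify, and the function name and nested-list layout are assumptions for the example.

```python
def ensemble_fold_predictions(fold_probs):
    """Average class probabilities across folds, then pick the top class.

    fold_probs[f][v][c] is the probability that the model trained on fold f
    assigns to class c at voxel v. Returns one predicted label per voxel.
    """
    n_folds = len(fold_probs)
    n_voxels = len(fold_probs[0])
    n_classes = len(fold_probs[0][0])
    labels = []
    for v in range(n_voxels):
        mean_probs = [
            sum(fold_probs[f][v][c] for f in range(n_folds)) / n_folds
            for c in range(n_classes)
        ]
        # Index of the largest averaged probability is the ensemble label.
        labels.append(max(range(n_classes), key=mean_probs.__getitem__))
    return labels
```

    Alternatives such as majority voting over per-fold labels, or reporting only the best single fold, can give different numbers, which is why the exact procedure matters for reproducibility.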
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper appears to be reproducible. The link to the code is included.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The proposed method presents a promising approach for self-supervised medical image analysis with its state-of-the-art performance and improved cost and data efficiency. However, addressing the concerns mentioned above, such as clarifying the validation strategy and providing a more complete ablation study, would further strengthen the paper.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The comprehensive set of experiments and clear presentation of the method and results contribute to the overall impact of the work. Despite some areas for improvement, the paper’s strengths make it a valuable addition to the field.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Three reviewers have expressed positive feedback on this paper, acknowledging its technical innovation and experimental richness. Based on their feedback, I have decided to provisionally accept this paper. However, I strongly recommend that the authors take into consideration the shortcomings pointed out by the reviewers, specifically the deficiencies in the ablation study highlighted by Reviewers 2 and 4. I encourage the authors to address these issues in the final version of the paper, which would greatly improve its overall quality.




Author Feedback

N/A


