
Authors

Yiwen Ye, Yutong Xie, Jianpeng Zhang, Ziyang Chen, Yong Xia

Abstract

The universal model emerges as a promising trend for medical image segmentation, paving the way to build a medical imaging large model (MILM). One popular strategy to build universal models is to encode each task as a one-hot vector and generate dynamic convolutional layers at the end of the decoder to extract the target of interest. Although successful, this strategy ignores the correlations among tasks and makes the model ‘aware’ of the ongoing task too late. To address both issues, we propose a prompt-driven Universal Segmentation model (UniSeg) for multi-task medical image segmentation using diverse modalities and domains. We first devise a learnable universal prompt to describe the correlations among all tasks and then convert this prompt and image features into a task-specific prompt, which is fed to the decoder as a part of its input. Thus, we make the model ‘aware’ of the ongoing task early and boost the task-specific training of the whole decoder. Our results indicate that the proposed UniSeg outperforms other universal models and single-task models on 11 upstream tasks. Moreover, UniSeg also beats other pre-trained models on two downstream datasets, providing the community with a high-quality pre-trained model for 3D medical image segmentation. Code and model are available at https://github.com/yeerwen/UniSeg.
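
The data flow described in the abstract (universal prompt plus image features, converted into a task-specific prompt that joins the decoder input) can be sketched roughly as follows. This is an illustrative toy in NumPy, not the authors' implementation: the tensor sizes, the one-channel-per-task layout, the single 1×1×1 channel-mixing step, and the names `universal_prompt`, `fusion_weight`, `task_specific_prompt`, and `decoder_input` are all assumptions made for the example. The released code at the repository above is the authoritative reference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical): 11 tasks, C feature channels, a small 3D grid.
NUM_TASKS, C, D, H, W = 11, 4, 2, 3, 3

# Stand-in for the learnable universal prompt: assumed one channel per task.
universal_prompt = rng.normal(size=(NUM_TASKS, D, H, W))

# Stand-in for a learnable 1x1x1 convolution that mixes prompt and feature
# channels back into NUM_TASKS prompt channels (an assumed fusion step).
fusion_weight = rng.normal(size=(NUM_TASKS, NUM_TASKS + C))


def task_specific_prompt(features, task_id):
    """Fuse the universal prompt with image features and pick one task's prompt."""
    joint = np.concatenate([universal_prompt, features], axis=0)
    all_prompts = np.einsum("oc,cdhw->odhw", fusion_weight, joint)
    return all_prompts[task_id:task_id + 1]  # shape (1, D, H, W)


def decoder_input(features, task_id):
    """Prepend the selected task prompt to the features fed to the decoder."""
    prompt = task_specific_prompt(features, task_id)
    return np.concatenate([prompt, features], axis=0)


features = rng.normal(size=(C, D, H, W))
x = decoder_input(features, task_id=3)
print(x.shape)
```

Because the prompt enters at the decoder input rather than at its final layers, every decoder layer sees task information in this sketch, which is the contrast with the one-hot, dynamic-head strategy that the abstract draws.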

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_49

SharedIt: https://rdcu.be/dnwBJ

Link to the code repository

https://github.com/yeerwen/UniSeg

Link to the dataset(s)

https://competitions.codalab.org/competitions/17094

https://kits19.grand-challenge.org

http://medicaldecathlon.com

https://osf.io/t98fz/

https://liuquande.github.io/SAML/

https://www.synapse.org/#!Synapse:syn25829067/wiki/610863

https://autopet.grand-challenge.org/Description/

https://www.synapse.org/#!Synapse:syn3193805/wiki/217789

https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70229053


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a single model, UniSeg, that can handle multi-organ segmentation across different datasets and modalities. The method is based on nnUnet and proposes a simple design of a universal prompt to adapt among different tasks. Experimental evaluations are conducted on 11 medical image segmentation datasets to demonstrate the performance of a single segmentation model on various tasks. Additional evaluations are conducted by fine-tuning the pre-trained model on two downstream tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The proposed design of a universal prompt is simple and cost-effective.
    • By using a single model, the proposed method is effective in different medical image segmentation tasks.
    • The paper is clearly written and easy to follow.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Some implementation details are unclear. For example, how are the datasets sampled during training? Are all the comparison methods implemented using the same network backbone as UniSeg? If not, how is a fair comparison with previous methods ensured? DoDNet is based on 3D Unet in the original paper, but UniSeg is based on nnUnet.

    • In Table 3, the numbers in green are not easy to understand. What is the meaning of these numbers? Are different methods compared with different baselines?

    • The comparison of different variants of UniSeg lacks details of each variant, especially the multiple-prompts variant. It is unclear how the multiple prompts are implemented and what the differences are between the multiple-prompts variant and UniSeg.

    • The universal models perform less effectively than the single-task model on the two MR datasets, especially for brain tumor segmentation on BraTS21.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Code and model will be released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • The authors should improve the implementation clarity and explain the ablation study in more detail.

    • It would be better if the results in Tables 2 and 3 were explained with more insight. Simply emphasizing that the method improves over the baseline is not very helpful.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes an efficient single model that can perform segmentation on different datasets and modalities. The design is simple and effective.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose a learnable universal prompt that describes the correlations among all tasks, and then convert this prompt and image features into a task-specific prompt, which is fed to the decoder as a part of its input. By doing so, they make the model ‘aware’ of the ongoing task early and boost the task-specific training of the whole decoder.

    Their results show that the proposed UniSeg outperforms other universal models and single-task models on 11 upstream tasks, as well as beating other pre-trained models on two downstream datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The universal prompt is designed to describe the inter-task correlations. This universal prompt is subsequently used to generate task-specific prompts for all the tasks under consideration. (2) To enhance the training of the decoder, the task-related prompt information is combined with the features and used as input to the decoder. This allows the entire decoder to be trained for the task, rather than only the last few layers.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) There is a lack of analysis on the features of task-specific prompts, including their characteristics and inter-relationships.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Thorough implementation details.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    A more comprehensive analysis of the task-specific prompt would significantly enhance the overall quality and depth of the investigation.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The efficacy of UniSeg is demonstrated across multiple tasks, highlighting its potential. However, there exists a paucity of detailed analysis pertaining to the learned features of the task-specific prompt.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    This paper presents a unified model that employs a universal prompt to address 11 segmentation tasks spanning various modalities and domains. The proposed universal prompt is designed to model the potential relationships between different tasks, fostering mutual benefits. Comprehensive experiments demonstrate the effectiveness of the proposed pipeline. Furthermore, the pre-trained model exhibits a high-quality representation capability, which positively impacts downstream unseen tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper proposes a novel prompt-driven universal segmentation model, which can flexibly optimize segmentation tasks from different fields through prompt learning, achieving the goal of universal segmentation. This idea is innovative and has certain inspirations for the research on medical foundation models.
    2. The experiment section is sufficient and convincing. A universal model is trained on 11 segmentation tasks from different tasks and modalities and significantly exceeds other competing SOTA universal methods and single-task methods.
    3. The paper rethinks the value of the universal model from another new perspective, representation learning, and validates its powerful representation ability through downstream transfer learning, which provides new inspiration for thinking about the universal model in the community.
    4. The paper has clear logic and is well written.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Discussions on future work need to be included.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    From the description and the response in the checklist, I believe this work is reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Certain details about some competing methods need to be explicitly stated. For example, what is the backbone of the Universal Model and DoDNet, and why not use deep supervision in Universal Model and DoDNet?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper addresses a promising and important topic in the field, has a reasonable design of the pipeline, and follows a logical progression in each step. The evaluation demonstrates the effectiveness of the proposed method. Therefore, I strongly recommend accepting this paper.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Based on the reviews provided, I highly recommend accepting this paper. The reviewers acknowledge the novel formulation of a universal prompt-driven segmentation model, UniSeg, which effectively addresses multiple segmentation tasks across different datasets and modalities. The proposed universal prompt captures inter-task correlations and boosts task-specific training, resulting in superior performance compared to other state-of-the-art universal and single-task models. The paper is praised for its clear writing and compelling experimental evaluations, which convincingly demonstrate the effectiveness of UniSeg on various medical image segmentation datasets. The reproducibility of the work is also deemed satisfactory, with plans to release code and models. While there are some minor weaknesses, such as the need for improved implementation clarity and further analysis of task-specific prompts, they do not significantly undermine the overall quality and contributions of the paper. Hence, the consensus among the reviewers is to strongly accept this paper for publication. The weaknesses should be carefully addressed in the final version.




Author Feedback

Thanks to both the AC and the reviewers for providing highly insightful comments. We greatly appreciate the feedback, as it will undoubtedly help us improve the quality of our paper.

Q1: Improved implementation clarity. We acknowledge the need for improved implementation clarity and will ensure that the suggested details are included in the final version of the paper. Additionally, we will release the code to facilitate reproducibility and further assist researchers in understanding our proposed approach.

Q2: Analysis of task-specific prompts. During the processing of each sample, we obtain all task-specific prompts, which amounts to eleven prompts for each sample. To analyze the relationships between these prompts, we calculate the cosine similarity between the task-specific prompt of the ongoing task and the prompts of the other tasks. By examining all the test sets, we generate an 11×11 cosine similarity matrix for the task-specific prompts of all tasks. Notably, we find that this matrix can be approximated as a symmetric matrix. This suggests that our UniSeg model has modeled stable relationships between different tasks that are less influenced by the input samples. We hypothesize that optimizing these relationships is a crucial factor contributing to the success of UniSeg’s joint learning approach.
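
The analysis described in Q2 could be reproduced along the following lines. This is a minimal sketch under stated assumptions, not the authors' script: it assumes each test sample yields an array of eleven task-specific prompts plus the index of its ongoing task, that rows of the matrix are averaged over the samples of each task, and that symmetry is judged by the largest gap between the matrix and its transpose. The function names `prompt_similarity_matrix` and `asymmetry` and the random stand-in data are hypothetical.

```python
import numpy as np


def prompt_similarity_matrix(prompts_per_sample, task_ids, num_tasks):
    """Row t: mean cosine similarity between task t's prompt and every task's
    prompt, averaged over the samples whose ongoing task is t."""
    sums = np.zeros((num_tasks, num_tasks))
    counts = np.zeros(num_tasks)
    for prompts, task in zip(prompts_per_sample, task_ids):
        flat = prompts.reshape(num_tasks, -1)
        unit = flat / np.linalg.norm(flat, axis=1, keepdims=True)
        sums[task] += unit @ unit[task]  # cosine of the ongoing task vs. all tasks
        counts[task] += 1
    return sums / np.maximum(counts, 1)[:, None]


def asymmetry(matrix):
    """Largest absolute difference between the matrix and its transpose."""
    return float(np.abs(matrix - matrix.T).max())


# Random stand-in data: 22 samples, 11 prompts each, 8 values per prompt.
rng = np.random.default_rng(0)
num_tasks = 11
prompts_per_sample = [rng.normal(size=(num_tasks, 8)) for _ in range(22)]
task_ids = [i % num_tasks for i in range(22)]

sim = prompt_similarity_matrix(prompts_per_sample, task_ids, num_tasks)
print(sim.shape, asymmetry(sim))
```

Each row is filled only from samples of that row's task, so entries (i, j) and (j, i) come from different inputs. A small asymmetry value would therefore indicate that the task relationships depend little on the individual sample, which is the interpretation the rebuttal offers; the random data here will not show that property.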


