
Authors

Meiyu Li, Kaicong Sun, Yuning Gu, Kai Zhang, Yiqun Sun, Zhenhui Li, Dinggang Shen

Abstract

Early detection and diagnosis of breast cancer using ultrasound images are crucial for timely diagnostic decisions and treatment in clinical applications. However, the similarity between tumors and background, together with severe shadow noise in ultrasound images, makes accurate segmentation of breast tumors challenging. In this paper, we propose a large pre-trained model for breast tumor segmentation, with robust performance when applied to new datasets. Specifically, our model is built upon a UNet backbone with deep supervision for each stage of the decoder. Besides using the Dice score, we also design a discriminator-based loss on each stage of the decoder to penalize the distribution dissimilarity at multiple scales. Our proposed model is validated on a large clinical dataset with more than 10000 cases, and shows significant improvement over other representative models. In addition, we apply our large pre-trained model to two public datasets without fine-tuning, and obtain extremely good results. This indicates great generalizability of our large pre-trained model, as well as robustness to multi-site data. The code is publicly available at https://github.com/limy-ulab/US-SEG.
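The Dice term with deep supervision mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the block-average downsampling of the ground-truth mask, and the equal stage weights are all assumptions made for illustration only.

```python
import numpy as np


def dice_score(pred, target, eps=1e-6):
    """Soft Dice score between a predicted probability map and a mask."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    intersection = np.sum(pred * target)
    return (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)


def downsample_mask(mask, factor):
    """Block-average a 2D mask by an integer factor (assumed resizing scheme).

    Height and width must be divisible by `factor`.
    """
    mask = np.asarray(mask, dtype=np.float64)
    h, w = mask.shape
    return mask.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def deep_supervised_dice_loss(stage_preds, target, weights=None):
    """Weighted average of (1 - Dice) over the outputs of all decoder stages.

    `stage_preds` is a list of 2D probability maps, one per decoder stage,
    each at its own resolution. Equal weights are a hypothetical default.
    """
    target = np.asarray(target, dtype=np.float64)
    if weights is None:
        weights = [1.0] * len(stage_preds)
    total = 0.0
    for pred, w in zip(stage_preds, weights):
        pred = np.asarray(pred, dtype=np.float64)
        # Bring the ground truth to the resolution of this stage's output.
        factor = target.shape[0] // pred.shape[0]
        stage_target = downsample_mask(target, factor) if factor > 1 else target
        total += w * (1.0 - dice_score(pred, stage_target))
    return total / sum(weights)
```

With predictions that match the (downsampled) mask at every stage the loss is close to 0, and with empty predictions it is close to 1.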

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43990-2_9

SharedIt: https://rdcu.be/dnwLj

Link to the code repository

https://github.com/limy-ulab/US-SEG

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a deep learning model for breast tumor segmentation on ultrasound images. The neural network is inspired by U-Net, with the intermediate outputs at each stage of the decoder supervised using GAN-based methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of the paper is the use of large datasets for testing the generalizability of the model. The authors also present a comparison with other state-of-the-art segmentation tools.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I think the equations could be written more carefully; they are difficult to read in their current state. There are some typos in the subscripts in the fourth line on page 3. The GAN formulation could be better explained. I therefore ask the authors to rework Section 2 (Methods) to make sure that the equations are correctly represented.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors mention that the code is available on GitHub, so I will assume the work is reproducible. I think the results will be repeatable on new datasets, as the model was trained on a relatively large dataset.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    I think the paper reads well. I had some difficulty following the equations in Section 2. I also ask the authors to better explain the loss functions. However, I think the authors have done a great job with the dataset and with testing the method against other methods. I also ask the authors to add the GitHub links if they used source code from other implementations for testing. If they implemented the other networks on their own, I hope the GitHub repository will show that.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the results are compelling and would definitely be a candidate for acceptance. I was also pleased to see the ablation study and comparison with other methods. The paper is very thorough with their method and comparison with other methods. I think the use case is also very relevant for clinical adoption.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The authors developed a deep-learning breast tumour segmentation model based on GAN by adopting multi-scale features collected from ultrasound images. The network was trained with 10927 cases collected from multiple Hospitals. The model was evaluated on two additional public datasets without fine-tuning, showing an improvement compared to the state-of-the-art.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The dataset used for training the proposed model is big enough to have an accurate segmentation model for breast cancer using BUSI.
    • The paper is easy to understand.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The technical contribution is limited.
    • Adopting the GAN using a multi-discriminator is not novel and is used a lot in computer vision literature.
    • Using multiple discriminators in a GAN can offer benefits such as improved performance and increased stability. Still, it also introduces additional challenges and complexities that must be carefully considered when designing and training the network, and the authors did not assess these difficulties in the experimental section.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • Detailed information about the methodology is missing, including the size of the input and output of each stage and layer.
    • The data used in the paper for testing is from two well-known public repositories, including the segmented masks and the bounding boxes outlining the regions of interest. However, information about the data used for training is missing, although the number of cases used for model training is big enough.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • The abstract has to be changed by including the contributions of the work.
    • Using multiple discriminators can also make the training process more difficult. It can be challenging to balance the training of each discriminator with the generator and ensure that they all contribute equally to the network’s overall performance. Multiple discriminators can increase the risk of overfitting the network to the training data. If the discriminators are too specialized, they may only learn to recognize specific training data features and not generalize well to new data, opposite to what the authors said in the abstract. The authors should show the change in the loss of each discriminator and the total loss of the GAN network during the training and validation stages.
    • Using multiple discriminators can also reduce the diversity of the generated samples. If the discriminators are too similar, they may all learn to recognize the same features and produce similar output, resulting in less diverse generated samples. Perhaps the authors can use the multi-features extracted by one discriminator to compare the GT and the predicted mask.
    • In the abstract, the authors mentioned that “This indicates great generalizability of our large pre-trained model,..”, but they did not prove that in the experiments.
    • It is essential in any medical method to show that the limitation of the work is very promising, given that when reading the work, we can glimpse a way to get around these limitations.
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The technical contribution is limited. In my view, the modification is heuristic, and the article may need to change the objective being optimized. The clarity of the paper is insufficient.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The rebuttal letter is clear, and the authors answer my doubts. The paper can be accepted, but only after some updating. For example, in Experimental Results, the authors need to start with the ablation study and show each module’s effect on the performance of the proposed model. Then the authors have to summarise the best combination of modules and backbones for the proposed model. Afterwards, a comparison to state-of-the-art methods should be provided. For the experimental results, the authors must include an ablation study to evaluate the effect of the number of discriminators on model performance. The authors must clearly explain the figures and the equation symbols in the Methodology section.



Review #3

  • Please describe the contribution of the paper

    This article presents a large pretrained deep learning model for breast tumor segmentation in ultrasound images. The proposed model is based on a UNet backbone with deep supervision at each stage of the decoder. Additionally, multiple discriminator models are designed to penalize distribution dissimilarity across multiple scales. The model is evaluated on a large clinical dataset with over 10,000 cases and compared with state-of-the-art models. It is also applied to two public datasets without fine-tuning to assess its generalizability and robustness to multi-site data. The experiments show good results on the three datasets, and ablation studies were conducted to demonstrate the inner workings of the method.
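The multi-discriminator idea summarized above can be sketched as follows. The paper's exact GAN formulation is not given on this page, so this assumes a standard binary cross-entropy (non-saturating) objective with one discriminator per decoder stage; all function names are hypothetical, and the discriminator outputs are taken as given probabilities rather than computed by a network.

```python
import numpy as np


def bce(prob, label, eps=1e-7):
    """Mean binary cross-entropy of probabilities against a constant label."""
    prob = np.clip(np.asarray(prob, dtype=np.float64), eps, 1.0 - eps)
    return float(np.mean(-(label * np.log(prob) + (1.0 - label) * np.log(1.0 - prob))))


def discriminator_loss(d_real, d_fake):
    """Loss for one discriminator: ground-truth masks -> 1, predicted masks -> 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def segmenter_adversarial_loss(d_fake_per_stage):
    """Adversarial term for the segmentation network, averaged over stages.

    `d_fake_per_stage` holds, for each decoder stage, the probabilities that
    the stage's discriminator assigns to the predicted masks. The segmenter
    is rewarded when these approach 1. The per-stage values are returned as
    well so that each discriminator's contribution can be logged separately.
    """
    per_stage = [bce(d, 1.0) for d in d_fake_per_stage]
    return float(np.mean(per_stage)), per_stage
```

In a full training objective this adversarial term would typically be added to the Dice loss with a weighting factor, which is again an assumption here rather than something stated on this page.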

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • This paper can be considered one more step towards the first large pretrained models in the medical field.
    • A very large breast ultrasound dataset was used in this study.
    • The incorporation of deep supervision at each stage of the decoder backbone improves the accuracy of segmentation by providing additional guidance and feedback during the training process.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • The scientific novelty of this paper is limited, since the idea of using an adversarial discriminator to boost the segmentation outputs has been extensively studied in the medical field (e.g. https://arxiv.org/pdf/1908.09298.pdf).
    • No analysis was performed on the impact of the 4 discriminator models on the obtained results.
    • The large pretrained model is not evaluated on the BUSI dataset, which is the most common dataset used in the field.
    • The experimental results are insufficient because the authors have not conducted any analysis on normal images.
    • A large pretrained model should perform well on the separation between malignant and benign cases. The authors have not conducted any analysis on malignant/benign cases.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Based on the information provided, nothing to note in this section

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    • It would be interesting to present the inference time for the SOTA models and the proposed model, because choosing the most feasible model is always a trade-off between performance and efficiency.
    • A recent paper tackling this topic did not use the same datasets: “Thomas, C., Byra, M., Marti, R., Yap, M. H., & Zwiggelaar, R. (2023). BUS‐Set: A benchmark for quantitative evaluation of breast ultrasound segmentation networks with public datasets. Medical Physics.” To validate the efficiency of a large pretrained model, one should use all the public datasets and compare the results with recent studies.
    • No experiments were done using simple regularization techniques, such as an L1-norm on the segmentation outputs, in order to prove the impact of the discriminators.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    1) No scientific novelty is presented in this paper. 2) The claim of presenting a large pretrained model for breast ultrasound segmentation lacks sufficient evidence and is not adequately supported by the experiments performed. This model should be evaluated on the well-known datasets available in the field, and this is not the case in the paper.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    3

  • [Post rebuttal] Please justify your decision

    The authors’ comments are clear, but the current state of the paper does not provide enough evidence to support the assertion that a large pretrained model for ultrasound breast segmentation is ready to be explored by the research community or to be used in a clinical setting. Further experimentations and additional evidence are necessary to validate the claims put forth by the authors.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper utilizes a large annotated dataset for pretraining a breast tumor segmentation network, which is an important attempt at developing foundational models for medical image analysis.




Author Feedback

We thank the reviewers and the meta-reviewer for their important comments and constructive suggestions. In rebuttal, we will address the reviewers’ concerns on the novelty of our pre-trained model and provide more reasonable evaluation of its performance.

  1. Scientific novelty of the proposed model: We aim to develop a foundational pre-trained model for breast tumor segmentation using ultrasound images, which is robust enough to be applied to unseen clinical datasets. The main contribution of this work is that we have developed a robust and generalizable segmentation network based on multi-level discriminators, which is trained and validated on a large annotated dataset with more than 10000 cases. The inner ablation studies and comparison results with state-of-the-art methods show superior segmentation performance. More importantly, we applied the pre-trained model on unseen ultrasound data from different sites without fine-tuning and obtained remarkable performance and outstanding generalizability. This demonstrates the potential of directly applying our model in real clinical applications.
  2. Clarification on multi-site data: Our initial intention in this work is to build a robust and generalizable pre-trained segmentation network using a very large annotated dataset. The proposed network is validated on a clinical ultrasound dataset and two public datasets. Experimental results show that our large pre-trained model is general and robust in handling various tumor types and shadow noise in our acquired clinical ultrasound images. This shows its great potential to be applied to real clinical scenarios. That is, the initial intention of this work is achieved. Our ultimate goal is to develop a foundational segmentation model for medical image analysis. Currently, we are still collecting ultrasound images from different sites, including clinical applications and publicly available datasets, to further boost the generalizability and robustness of our large pre-trained model. However, the collection of ultrasound images from different sites demands long-term follow-up, hence it is difficult to expand the scope of medical images in the short term to boost the generalization of the proposed model. In the meantime, we have conducted additional experiments on the BUSI dataset and compared with several other methods. Similarly, our model demonstrates state-of-the-art performance, which further reveals the outstanding robustness of our proposed method. In the future, we will focus on collecting new ultrasound datasets to further verify the generalization of our pre-trained model.
  3. We appreciate the reviewers’ comments. We will provide more implementation details of the proposed model, supply sufficient information about the methodology, rewrite the equations so that they are easier to read, analyze the impact of the different discriminators, and further explain the results of our experimental studies. In addition, to allow our results to be reproduced, a GitHub link to our current network will be provided in our final paper.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    While the reviewers’ requests for experiments on NC cases and for ablation studies investigating the effectiveness of the multi-discriminator design were not (and could not be) addressed by the authors’ feedback, the study is still a suitable contribution to the MICCAI community, especially given the recent advancement of foundational models.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper is interesting, and generating a large-scale dataset for breast US is a relevant clinical task. However, the paper requires too many changes after the rebuttal, including not only clarifications but also additional experiments. Therefore, it is not ready for publication in its current state and needs to be revised for a future submission following the reviewer feedback.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The novelty of the proposed method is limited to modifying the decoder of the U-Net architecture, unless the motivation is sound and the advantage of using the method is explained. Even after reading the rebuttal, the role and the relations of the different discriminators at multiple levels in improving the accuracy and generalizability of the segmentation model are still unclear. The method section is not well written, although the rebuttal tries to clarify it.


