
Authors

Yuming Zhong, Yi Wang

Abstract

Breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) plays an important role in the screening and prognosis assessment of high-risk breast cancer. The segmentation of cancerous regions is essential for the subsequent analysis of breast MRI. To alleviate the annotation effort needed to train segmentation networks, we propose a weakly-supervised strategy using extreme points as annotations for breast cancer segmentation. Without any bells and whistles, our strategy focuses on fully exploiting the learning capability of the routine training procedure, i.e., the train, fine-tune, and retrain process. The network first trains itself on pseudo-masks generated from the extreme points, minimizing a contrastive loss that encourages it to learn more representative features for cancerous voxels. The trained network then fine-tunes itself with a similarity-aware propagation learning (SimPLe) strategy, which leverages the feature similarity between unlabeled and positive voxels to propagate labels. Finally, the network retrains itself on the pseudo-masks generated by the fine-tuned network. The proposed method is evaluated on our collected DCE-MRI dataset containing 206 patients with biopsy-proven breast cancers. Experimental results demonstrate that our method effectively fine-tunes the network using the SimPLe strategy and achieves a mean Dice value of 81%. Our code is publicly available at https://github.com/Abner228/SmileCode.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43901-8_54

SharedIt: https://rdcu.be/dnwD2

Link to the code repository

https://github.com/Abner228/SmileCode

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

The authors present a tumour segmentation scheme for 3D contrast-enhanced MRI using a similarity-based fine-tuning scheme starting from initially estimated pseudo-labels. The fine-tuning refines the segmentation labels in a three-stage process that results in performance equivalent to a fully supervised model. The method is applied to an in-house DCE-MRI dataset with a comparison to a conditional random field-based method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper is interesting, well written, and well motivated. It tackles the difficult problem of tumour segmentation, where size and location can be highly patient-specific. The metrics presented are appropriate and the segmentation performance is good. The technical novelty is not revolutionary and the method is simple, as the name implies, which is a good thing given that the results are good.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The authors mention that “we collected 206 DCE-MRI scans” and then that “We randomly divided the dataset into 21 scans for training and the remaining scans for testing”, yet Table 1 lists one method as “Fully Supervision” (which should be “Fully Supervised”, by the way). I presume that using only 21 scans for training is intended for weak supervision, but do the results in Table 1 rely on only 21 3D images for training? If so, this is not likely to yield very good performance; is that why the weakly-supervised method performs equivalently? Or is this a typo and 21 images were used for testing?
2. The authors have only used an in-house dataset, which is not necessarily a major issue, as DCE-MRI data is very hard to obtain and I’m not familiar with an easily obtainable open alternative. But I do not think a comparison to a single other method (plus the ablation study that comprises the remaining rows) is sufficient for MICCAI. When using an in-house dataset, it is critical to compare against enough other methods to establish baselines, demonstrate the difficulty level of the dataset, and show that your method is reproducible by comparison at a later stage.
3. The segmentation examples shown are all of solitary, oval, obvious tumours. How does the method perform on multiple cores and more spiculated tumours?
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The in-house dataset makes it challenging, but the authors will release their code upon acceptance, which will help greatly.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Adding additional methods for comparison to provide greater quantitative results would be more convincing.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Additional quantitative results would have been more convincing. Willing to adjust score depending on author responses.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

Since the authors have resolved some of the issues, such as providing more information on the use of 21 training patients, adding new comparisons, adding visualizations of challenging cases, and releasing their code, I have increased my rating, conditional on these elements being added to the revised version if the rules allow. I hope that Table 1 will also be better presented in the revised version.



Review #2

  • Please describe the contribution of the paper

The paper proposes a novel weakly-supervised framework for breast cancer segmentation in DCE-MRI. The method uses only six extreme points for training instead of a full annotation. An improved random walker algorithm is used to generate initial pseudo-masks from the points. The method trains and fine-tunes the segmentation network using these pseudo-masks, which are updated with a similarity propagation strategy. The proposed method is evaluated on a dataset of 206 DCE-MRI scans and shows a clear improvement over the baselines while achieving performance similar to fully-supervised training.
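As a rough illustration of this pseudo-mask generation step, the sketch below seeds a plain (non-improved) random walker with six extreme points using scikit-image. The seeding scheme, the margin, and the beta value are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: initial pseudo-mask from six extreme points via a
# plain random walker (the paper uses an *improved* variant; this is only
# the basic scikit-image algorithm for illustration).
import numpy as np
from skimage.segmentation import random_walker

def pseudo_mask_from_extreme_points(volume, extreme_points, margin=3):
    """volume: 3D float array; extreme_points: six (z, y, x) tuples."""
    seeds = np.zeros(volume.shape, dtype=np.uint8)

    # Foreground seeds: the six extreme points (two per axis).
    for z, y, x in extreme_points:
        seeds[z, y, x] = 1

    # Background seeds: everything outside the bounding box spanned by the
    # extreme points, padded by a small margin.
    pts = np.asarray(extreme_points)
    lo = np.maximum(pts.min(axis=0) - margin, 0)
    hi = np.minimum(pts.max(axis=0) + margin, np.asarray(volume.shape) - 1)
    outside = np.ones(volume.shape, dtype=bool)
    outside[lo[0]:hi[0] + 1, lo[1]:hi[1] + 1, lo[2]:hi[2] + 1] = False
    seeds[outside] = 2

    # The walker propagates the seed labels through the intensity field.
    labels = random_walker(volume, seeds, beta=130)
    return labels == 1  # boolean pseudo-mask of the cancerous region
```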

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is well-presented, and the figures provide a clear understanding of the workflow and data used.
    • The paper addresses the challenge of annotating 3D medical images for segmentation, which is a critical issue in clinical settings, using breast tumor segmentation as a showcase to show how the novel weakly-supervised strategy can reduce annotation effort.
    • The introduction of similarity-aware propagation and integration in the fine-tuning loss appears to be innovative and results in clear improvements.
• The evaluation dataset is relatively large, and the quantitative results clearly demonstrate the improvement achieved by the proposed method (> 0.1 Dice).
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • It would be beneficial to explore the generalizability of the proposed framework to other segmentation tasks and modalities and to use larger publicly available datasets for evaluation.
    • The paper presents a well-explained method, but there is a lack of discussion on why the proposed method achieved a clear improvement. The paper should provide a deeper insight into why the proposed method performs better than the baseline methods, and also include an analysis of hyper-parameters, such as alpha, lambda and omega.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The details of the experimental setup are provided, and the data and algorithm are also clearly described.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • In the second paragraph of the Introduction, the authors cite related works on breast tumor segmentation and report Dice scores. However, the experimental setups and datasets used in those works are not mentioned, making the reported scores difficult to interpret.
    • Figure 2 could be improved by adding a zoomed image of the segmentation to provide more clarity.
• The method for manually defining the extreme points is not clearly explained.
• While the numbers in Table 1 indicate performance similar to fully supervised training, statistical testing could be performed to confirm the significance of the results (see the sketch after this list).
    • The paper introduces uncommon hyper-parameters such as alpha, lambda, and omega. Additional ablation studies would be beneficial to provide a more comprehensive analysis of their effect on the results.
    • The authors should explain why they chose to use 21 out of 206 scans for training, and whether the difference between weakly- and fully-supervised training would be larger with more training samples.
    • In addition to using Dice and Jaccard metrics, the authors could also use surface distance to better quantify the difference between the ground truth and prediction.
    • The authors could provide an estimate of how much the annotation effort is reduced using their proposed framework compared to traditional fully supervised training.
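Regarding the statistical testing suggested above, a paired Wilcoxon signed-rank test on per-case Dice scores would be a natural choice (and is what the authors report in their rebuttal). A minimal sketch with placeholder scores, assuming the 206 - 21 = 185-case test split:

```python
# Illustrative only: the Dice arrays below are random placeholders; in
# practice they are the per-case scores of two methods on the same test set.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
dice_weak = rng.normal(0.81, 0.05, size=185).clip(0, 1)   # weakly supervised
dice_full = rng.normal(0.815, 0.05, size=185).clip(0, 1)  # fully supervised

# Paired, non-parametric test of whether the per-case differences are
# symmetric about zero.
stat, p = wilcoxon(dice_weak, dice_full)
print(f"Wilcoxon statistic = {stat:.1f}, p = {p:.4f}")  # p < 0.05 => significant
```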
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper proposes an interesting, “simple” framework for weakly-supervised segmentation. The annotation effort is actually reduced while the performance remains comparable. Overall, this is a well written and clearly presented paper.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The context is clear, and weakly supervised learning is indeed an essential direction for medical images due to the time-consuming annotation process.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The background is clear, and weakly supervised learning is indeed an important direction for medical images given the time-consuming annotation process.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. My major concern is the feasibility of the proposed method for challenging cases. Although the visualization results in the paper show tumors with much higher intensity than the background and clear boundaries, in more malignant cases of breast cancer the tumor shape and boundary can be more complex and unclear. It is therefore questionable whether the proposed method can generate good initial pseudo-masks from just simple points and the random walker algorithm. Additionally, in DCE-MRI the vessels and some normal fibroglandular tissue (so-called background parenchymal enhancement) are also enhanced, which could lead to an over-segmentation problem. The paper does not address how to handle this.
    2. In the Introduction, the authors provide an overview of existing works. Hence, the authors should be aware that existing breast tumor segmentation methods are performed on either (1) the combination of pre-contrast and post-contrast images or (2) the combination of post-contrast images and subtraction images. The combination inputs can provide better tissue contrast to aid in the segmentation task. Therefore, it is unclear why the authors only used post-contrast images as inputs, which lack tissue contrast information.
    3. In Stage 2, the authors used the parameters of Stage 1 as the initialization. I wonder why the same strategy was not used for initialization in Stage 3.
    4. The authors used the fine-tuned pseudo-mask as ground truth to train the final segmentation network in Stage 3, but the segmentation results were better than the fine-tuned pseudo-mask. Does this make sense? If so, it means that traditional annotations in supervised learning do not need to be precise because the networks can learn a better one.
    5. This paper provided an ablation study in Table 1, which showed a significant improvement, and the accuracy of the proposed weakly-supervised methods was close to that of supervised methods. Does this mean that the generated fine-tuned pseudo-mask is similar to the ground truth? If so, why do we need Stage 3? Additionally, in Figure 3, the four masks appear to be nearly identical. Can the authors provide some visualization results of challenging cases?
    6. The cases shown in Figure 4 are easy to segment. The authors should provide visualization results of challenging cases to better demonstrate the effectiveness of their proposed method.
    7. The paper lacks some comparison experiments. As the authors mentioned in the introduction, there are many weakly-supervised segmentation methods, some of which may not have been developed specifically for breast cancer. However, the authors should also compare the segmentation performance of the proposed method with these methods to better illustrate its superiority.
    8. Minor problem: In Figure 1, for the second case, the box’s location in sub-figure (b) does not match that in sub-figure (c).
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    While the implementation details are clear, the authors have stated that the data is private, and they plan to release the code at a later time.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    To demonstrate the effectiveness of the proposed methods, could you please provide some visualization results of challenging cases?

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See P6.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

Strengths: 1) The problem setup and motivation of the proposed method are strong, and the paper is well written and well organized. 2) The introduction of similarity-aware propagation and its integration in the fine-tuning loss appears to be novel.

Weaknesses: 1) The usage of the dataset is not clear; Table 1 in particular is ambiguous. 2) Please also clarify the concerns raised by Reviewer 2. 3) The visual segmentation results are not convincing because the shown tumors are all solitary and obvious.




Author Feedback

We cordially thank you for your time and effort in reviewing the submission. We have carefully studied your suggestions and addressed your main concerns (e.g., more comparison methods, more qualitative and quantitative results, usage of the dataset, etc.).

1) More comparison experiments (R1, R3): As mentioned in the manuscript, most weakly-supervised methods are not an appropriate strategy for lesion segmentation, which was confirmed in our preliminary studies. We have conducted new comparison experiments with three general weakly-supervised strategies: entropy minimization [r1], mean teacher [r2], and bounding box [r3]. Our method consistently outperforms [r1-r3] with respect to all evaluation metrics.

Methods          | Dice [%] | Jaccard [%] | ASD [mm] | 95HD [mm]
pce+crf          | 64.97    | 53.83       | 1.01     | 3.90
pce+ctr          | 69.39    | 57.36       | 0.95     | 3.60
Ent [r1]         | 71.94    | 58.74       | 0.88     | 3.16
MT [r2]          | 65.92    | 53.82       | 1.02     | 3.74
BBox [r3]        | 77.02    | 65.08       | 0.89     | 2.54
pce+crf+SimPLe   | 79.71    | 68.99       | 0.74     | 2.48
pce+ctr+SimPLe   | 81.20    | 70.01       | 0.69     | 2.40
Fully supervised | 81.52    | 72.10       | 0.68     | 2.40

[r1] NeurIPS 2004: Semi-supervised learning by entropy minimization.
[r2] NeurIPS 2017: Mean teachers are better role models..
[r3] MIDL 2020: Bounding boxes for weakly supervised segmentation..

2) More quantitative results (R2): Surface distance: we have provided surface-based metrics, including the average surface distance (ASD) and the 95% Hausdorff distance (95HD); see the results above. Statistical testing: we have conducted Wilcoxon tests, which indicate that although full supervision statistically outperforms our method, our method is significantly better than the other comparison methods on all evaluation metrics. Annotation effort: the average annotation times for extreme points and full masks are 31 s and 95 s per scan, respectively.
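For reference, here is a minimal sketch of how ASD and 95HD can be computed from binary masks with SciPy's distance transform; this is a standard formulation, not necessarily the authors' implementation, and it assumes non-empty masks.

```python
import numpy as np
from scipy import ndimage

def surface_distances(pred, gt, spacing=(1.0, 1.0, 1.0)):
    """Symmetric surface-to-surface distances between two binary 3D masks."""
    # Surface voxels = mask minus its erosion.
    pred_surf = pred ^ ndimage.binary_erosion(pred)
    gt_surf = gt ^ ndimage.binary_erosion(gt)
    # Distance from each voxel to the nearest surface voxel of the other mask.
    dt_gt = ndimage.distance_transform_edt(~gt_surf, sampling=spacing)
    dt_pred = ndimage.distance_transform_edt(~pred_surf, sampling=spacing)
    return np.concatenate([dt_gt[pred_surf], dt_pred[gt_surf]])

def asd_and_hd95(pred, gt, spacing=(1.0, 1.0, 1.0)):
    d = surface_distances(pred.astype(bool), gt.astype(bool), spacing)
    return d.mean(), np.percentile(d, 95)  # ASD [mm], 95HD [mm]
```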

3) More qualitative results (Meta, R1, R3): We have prepared visualization results of challenging cases and will include them in the final version.

4) The usage of the dataset (Meta, R1, R2): In our preliminary experiments, we tried different amounts of training data to investigate the segmentation performance of the fully-supervised network. The results showed that when using 21, 42, or 63 scans for training, the Dice results changed very little (within 0.3%). Therefore, in order to keep more testing data for evaluating the method, we chose to use 21 (10%) of the 206 scans for training. In addition, the fully-supervised network trained on 21 scans achieved 81% Dice, similar to the performance of most related works (~80% Dice).

5) Exploring the generalizability (R2): The code will be released. Our method is simple and easy to apply, and thus can be tested on other tasks.

6) Why the method improves (R2): We leverage voxel-wise label propagation. We measure how close two samples are via cosine similarity and propagate labels to unlabeled voxels. This pushes the network to learn better pseudo-masks.
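A minimal PyTorch sketch of this propagation idea follows. It is a simplification under stated assumptions: it summarizes positive voxels by a mean prototype and uses a hypothetical threshold tau, whereas the paper's SimPLe strategy may compare unlabeled voxels to positive voxels differently.

```python
import torch
import torch.nn.functional as F

def propagate_labels(features, pseudo, tau=0.9):
    """features: (C, D, H, W) voxel embeddings; pseudo: (D, H, W) int64 with
    1 = positive, 0 = negative, -1 = unlabeled. Returns an updated mask."""
    C = features.shape[0]
    feats = F.normalize(features.reshape(C, -1), dim=0)  # unit vectors, (C, N)
    labels = pseudo.reshape(-1).clone()

    # Prototype = mean feature of the currently positive voxels.
    proto = F.normalize(feats[:, labels == 1].mean(dim=1), dim=0)  # (C,)

    # Cosine similarity of every voxel to the positive prototype.
    sim = proto @ feats  # (N,)

    # Propagate: unlabeled voxels that are similar enough become positive.
    labels[(labels == -1) & (sim > tau)] = 1
    return labels.reshape(pseudo.shape)
```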

7) Related works (R2): The experimental setups and datasets of the related works will be added.

8) Background parenchymal enhancement (BPE) (R3): For the task of tumor segmentation, the false-negative and false-positive (FP) issue remains challenging. By using the SimPLe strategy, our method achieves satisfactory tumor identification results, with a sensitivity of 99.46% and 0.94 FP per scan.

9) Only using post-contrast (R3): Although many studies use both post- and pre-contrast images, this might introduce image misalignment and thus extra noise. There are also many studies that only use post-contrast images for tumor segmentation and achieve satisfactory results. This study mainly focuses on the strategy to enhance weakly-supervised segmentation, and we may try using both later.

10) Why stage 3 (R3): At stage 2, we propagate labels to unlabeled voxels by leveraging feature similarity. Most unlabeled voxels are labeled properly, but some BPE is misclassified as foreground. Thus we generate the pseudo-mask only for the unlabeled voxels inside the bounding box and then retrain (see the sketch below).
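A small sketch of this stage-3 restriction, under the assumption that the bounding box comes from the extreme points and that the fine-tuned network's prediction fills in only the unlabeled voxels inside it:

```python
import numpy as np

def stage3_pseudo_mask(pseudo, prediction, bbox):
    """pseudo: (D, H, W) int with 1/0/-1; prediction: (D, H, W) bool from the
    fine-tuned network; bbox: (z0, z1, y0, y1, x0, x1) half-open indices."""
    z0, z1, y0, y1, x0, x1 = bbox
    inside = np.zeros(pseudo.shape, dtype=bool)
    inside[z0:z1, y0:y1, x0:x1] = True

    out = pseudo.copy()
    unlabeled = pseudo == -1
    # Unlabeled voxels inside the box take the network's prediction; anything
    # outside stays background, so enhanced tissue (BPE) far from the tumour
    # cannot leak into the foreground.
    out[unlabeled & inside] = prediction[unlabeled & inside]
    out[unlabeled & ~inside] = 0
    return out
```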




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

Most concerns were addressed by the authors. This paper is borderline. Although it could be improved in the future, it cannot be accepted in its current form, as many important validation experiments are missing.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presents a novel weakly-supervised segmentation method that utilizes point annotations, specifically for breast cancer segmentation in DCE MRI. The fundamental concept involves refining pseudo labels generated from the model trained using sparse point labels based on feature similarity, eliminating the need for fully annotated labels. The reviewers evaluated this paper mostly positively, particularly highlighting its motivation and innovative idea. However, R1 raised concerns regarding insufficient validation, while R3 expressed skepticism about the feasibility of handling challenging cases. In the rebuttal, the authors satisfactorily addressed these issues, resulting in R1 updating the rating from weak reject to weak accept. Considering the assessments of all reviewers, it is evident that the paper holds substantial merits, and therefore, I recommend accepting it.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The paper presents a weakly supervised segmentation method for DCE-MRI of breast-cancer patients. The reviewers raised some concerning issues. The authors addressed most of these concerns; however, in this meta-reviewer's opinion, their answers to some of the questions raised were not entirely convincing (e.g., the comment related to the pre-/post-contrast usage).


