Authors

An Wang, Mengya Xu, Yang Zhang, Mobarakol Islam, Hongliang Ren

Abstract

Fully-supervised polyp segmentation has achieved significant success over the years in advancing the early diagnosis of colorectal cancer. However, label-efficient solutions based on weak supervision such as scribbles remain rarely explored, even though they are highly meaningful and in demand in medical practice, given how expensive and scarce densely annotated polyp data are. In addition, deployment issues such as data shifts and corruption place further demands on model generalization and robustness. To address these concerns, we design a framework of Spatial-Spectral Dual-branch Mutual Teaching and Entropy-guided Pseudo Label Ensemble Learning (S2ME). Concretely, for the first time in weakly-supervised medical image segmentation, we promote the dual-branch co-teaching framework by leveraging the intrinsic complementarity of features extracted from the spatial and spectral domains and encouraging cross-space consistency through collaborative optimization. Furthermore, to produce reliable mixed pseudo labels, which enhance the effectiveness of ensemble learning, we introduce a novel adaptive pixel-wise fusion technique based on entropy guidance from the spatial and spectral branches. Our strategy efficiently mitigates the deleterious effects of uncertainty and noise in pseudo labels and is more effective than previous alternatives. Ultimately, we formulate a holistic optimization objective to learn from the hybrid supervision of scribbles and pseudo labels. Extensive experiments on four public datasets demonstrate the superiority of our method in terms of in-distribution accuracy, out-of-distribution generalization, and robustness, highlighting its promising clinical significance. Our code is available at https://github.com/lofrienger/S2ME.
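To make the entropy-guided pixel-wise fusion idea concrete, here is a minimal NumPy sketch. It assumes each branch outputs a (C, H, W) softmax map and that the per-pixel fusion weight is a softmax over the negative entropies of the two branches. This weighting rule and the function names are illustrative assumptions, not the exact S2ME formulation, which is defined in the paper and the linked repository.

```python
import numpy as np

def pixel_entropy(prob, eps=1e-8):
    """Per-pixel Shannon entropy of a (C, H, W) probability map -> (H, W)."""
    return -np.sum(prob * np.log(prob + eps), axis=0)

def entropy_guided_fusion(p_spatial, p_spectral):
    """Fuse two (C, H, W) softmax maps pixel by pixel into a hard pseudo label.

    Assumption: each branch is weighted by a softmax over its negative
    entropy, so the more confident (lower-entropy) branch dominates.
    """
    h = np.stack([pixel_entropy(p_spatial), pixel_entropy(p_spectral)])  # (2, H, W)
    w = np.exp(-h)
    w = w / w.sum(axis=0, keepdims=True)  # the two weights sum to 1 at every pixel
    fused = w[0] * p_spatial + w[1] * p_spectral  # (H, W) weights broadcast over C
    return fused.argmax(axis=0)  # mixed pseudo label, shape (H, W)
```

At each pixel, whichever branch is more certain contributes more to the mixed pseudo label, which is the intuition behind suppressing uncertain, noisy pseudo labels.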

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43907-0_4

SharedIt: https://rdcu.be/dnwb1

Link to the code repository

https://github.com/lofrienger/S2ME

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

The paper presents a novel method for the weakly-supervised segmentation of polyps in video images. The main innovation of this method lies in the combination of spatial and spectral information in two network branches that are cross-trained using a mutual teaching strategy. A pixel-wise fusion strategy based on entropy uncertainty is also proposed to generate pseudo-labels from the predictions of the different branches. Experiments on four polyp video data sets show the proposed method to outperform recent approaches for weakly-supervised segmentation. The usefulness of the method’s components is also demonstrated in various ablation experiments.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- While previous approaches like Y-Net have also implemented a dual branch architecture to combine spatial and spectral information, the idea of cross-training the two branches using a mutual teaching strategy seems novel (to my knowledge). Moreover, although similar entropy-based fusion techniques have been used in previous works (e.g., [1]), the use of such technique in this particular context is new.
- Experiments are detailed and clearly show the advantage of the proposed method compared to weakly-supervised approaches. Ablation studies are also well designed.
- The paper is well written. Despite the engineering complexity of the method, its presentation is easy to follow.
[1] Wang, Pei, Hui Fu, and Ke Zhang. “A pixel-level entropy-weighted image fusion algorithm based on bidimensional ensemble empirical mode decomposition.” International Journal of Distributed Sensor Networks 14, no. 12 (2018): 1550147718818755.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- No major weakness but some points need to be clarified, pertaining to the reported performance of Fully-CE, choice of comparison approaches and runtime/memory complexity. See comments below.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The paper is detailed enough to reproduce the method and experiments. The authors have also included a link to their code repository (the reviewer has not checked the code).
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
- Eq (3): should it be c = 1…C or c = 0…C−1?
- Eq (7) and (8): Why do you give an equal weight to the two terms in the sum? Is this optimal?
- How did you select the weakly-supervised approaches for comparison experiments? Have they been used for polyp video segmentation before? If previous studies have investigated this problem, how do your results compare to the ones in these studies?
- The reported values for Fully-CE in Table 2 seem low. For instance, the CVC-ClinicDB paper reports an IoU of .82 and the PolypGen paper a Dice between .7 and .8. How do you explain this discrepancy?
- How does the runtime and memory complexity of your method compare to that of other tested approaches?
- In Table 3, did you use the same mutual teaching and fusion strategies for the ME settings (first two rows of the table)?
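For reference on the Eq. (3), (7), and (8) questions, here is a minimal NumPy sketch of an equally weighted cross-entropy plus Dice loss with class indices c = 0…C−1 (the zero-based convention). Whether S2ME sums the two terms or averages them is an assumption here; averaging (0.5 each) and summing differ only by a constant factor.

```python
import numpy as np

def ce_dice_loss(prob, target, eps=1e-6):
    """Equal-weighted cross-entropy + Dice loss (illustrative sketch).

    prob:   (C, H, W) softmax output.
    target: (H, W) integer labels in 0..C-1.
    """
    C = prob.shape[0]
    # One-hot target, looping over classes c = 0..C-1.
    onehot = np.stack([(target == c).astype(float) for c in range(C)])
    ce = -np.mean(np.sum(onehot * np.log(prob + eps), axis=0))
    inter = np.sum(prob * onehot, axis=(1, 2))
    denom = np.sum(prob, axis=(1, 2)) + np.sum(onehot, axis=(1, 2))
    dice = 1.0 - np.mean((2 * inter + eps) / (denom + eps))  # mean soft Dice loss over classes
    return 0.5 * (ce + dice)  # equal weights on the two terms
```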
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The proposed method has novel elements and shows a good performance. Overall the positive points outweigh the negative ones.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #2

Please describe the contribution of the paper

This paper presents a new method for scribble-supervised polyp segmentation by three main contributions, i.e., spatial-spectral cross-domain mutual teaching, entropy-guided pseudo-label ensemble learning, and hybrid loss supervision from scribbles and pseudo labels.
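The cross-domain mutual teaching idea can be sketched as cross pseudo supervision: each branch's hard prediction acts as the target for the other branch. This is a hedged, CPS-style sketch in NumPy; the function names are hypothetical, S2ME's exact loss may differ, and in an autodiff framework the argmax targets would carry no gradient.

```python
import numpy as np

def cross_entropy(prob, target, eps=1e-8):
    """Mean pixel-wise CE of (C, H, W) probabilities against (H, W) integer labels."""
    picked = np.take_along_axis(prob, target[None], axis=0)[0]  # prob of the target class
    return -np.mean(np.log(picked + eps))

def mutual_teaching_loss(p_spatial, p_spectral):
    """Each branch is supervised by the other branch's hard pseudo label."""
    y_spatial = p_spatial.argmax(axis=0)
    y_spectral = p_spectral.argmax(axis=0)
    return cross_entropy(p_spatial, y_spectral) + cross_entropy(p_spectral, y_spatial)
```

When the two branches agree confidently the loss is small; when they disagree, each is pushed toward the other's prediction, which is what encourages cross-space consistency.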
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This paper is well written and easy to follow; the mathematics behind the design is elegant and concisely explained. Source code is available to encourage reproducibility. Sufficient ablation studies are performed.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The interpretability of the results could be further emphasized. For example, in Table 2, the proposed method outperforms even the fully-supervised upper bound. Can the authors provide some interpretation of this result? Also, in Fig. 2, the scribbles do not appear to be shown.

In Eq. 9, two weighting coefficients, lambda_mt and lambda_el, are introduced. While Tab. 5 provides an ablation study on the loss components, it would be interesting to investigate quantitatively how different combinations of lambda values affect the final performance.
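One plausible shape for the Eq. 9 objective, assumed here purely for illustration: a partial cross-entropy computed only on scribble-annotated pixels, plus the mutual-teaching and ensemble-learning terms weighted by lambda_mt and lambda_el. The ignore label 255 and the function names are hypothetical, not taken from the paper.

```python
import numpy as np

IGNORE = 255  # hypothetical label value for pixels without a scribble

def partial_ce(prob, scribble, eps=1e-8):
    """CE over scribble-annotated pixels only; prob is (C, H, W), scribble is (H, W)."""
    mask = scribble != IGNORE
    if not mask.any():
        return 0.0
    safe = np.where(mask, scribble, 0)  # placeholder class index for ignored pixels
    picked = np.take_along_axis(prob, safe[None], axis=0)[0]
    return -np.mean(np.log(picked[mask] + eps))

def total_loss(l_scribble, l_mt, l_el, lambda_mt, lambda_el):
    """Assumed form of the hybrid objective: scribble term plus weighted extra terms."""
    return l_scribble + lambda_mt * l_mt + lambda_el * l_el
```

A sensitivity study of the kind suggested above would retrain the model over a small grid of (lambda_mt, lambda_el) pairs and compare the resulting validation scores.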
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

I have checked that the anonymous source code can be viewed.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

From the paper, it is unclear how the scribbles were generated for each dataset. More detail on the scribbles should be provided, as the quality of weakly-supervised segmentation depends heavily on them.

The segmentation results of the baseline methods, EntMin, GCRF, USTM, CPS, and DMPLS, are reproduced on the datasets with the UNet backbone. However, training details and hyper-parameters are not presented.

It would be interesting to try more sophisticated backbones beyond UNet, such as ViT.

There are some typos or language issues. For example, on page 5, “labels” should be “Labels”.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

6
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper tackles an interesting problem with a novel approach. The method is clearly presented.
Reviewer confidence

Confident but not absolutely certain
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Review #3

Please describe the contribution of the paper
1. This work proposes a spatial-spectral dual-branch structure to explore complementary relations for scribble-supervised medical image analysis.
2. A pixel-level entropy-guided fusion strategy to generate mixed pseudo labels.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. This work is clearly written and well organized.
2. The spatial-spectral dual-branch structure is a new design. The attempt to explore the complementary relations of the spatial-spectral dual branch is interesting.
3. The experiments are well conducted.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. This work is similar in rationale to the previous scribble-supervised segmentation work DMPLS [15]. Both employ a dual-branch setting and fuse the pseudo labels to compute a regularization loss. Given that this pattern has been widely adopted in semi-, weakly-, and scribble-supervised learning, the reviewer cannot identify much novelty in it.
2. The entropy-guided fusion strategy is another common technique, and the methodology is a combination of these techniques. Even though I agree that the spectral branch of the proposed method is different from previous works, the contribution might be marginal.
3. The benefit of the spectral-domain branch lacks in-depth discussion. Why does introducing the spectral branch improve the generalization ability of the proposed network?
4. It might be meaningful to discuss the advantages of dual-branch methods.
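To make the comparison with DMPLS [15] concrete: DMPLS, as described in its paper, mixes the two branches' predictions with a single random weight per training iteration, whereas the fusion described for S2ME is pixel-wise and guided by entropy. A hedged sketch of the DMPLS-style mix (function name hypothetical):

```python
import numpy as np

def dynamic_mix_pseudo_label(p1, p2, rng):
    """DMPLS-style mixing: one random alpha in [0, 1) for the whole image.

    p1, p2: (C, H, W) softmax maps from the two branches.
    rng:    a numpy.random.Generator.
    """
    alpha = rng.uniform(0.0, 1.0)  # global weight, redrawn each iteration
    return (alpha * p1 + (1.0 - alpha) * p2).argmax(axis=0)
```

Because alpha is shared by all pixels, a branch that is uncertain in one region still receives the same weight there as elsewhere; an entropy-guided rule instead adapts the weight pixel by pixel.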
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors released the code to help reproduce the results.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
1. As the main contribution of this work, the spectral branch lacks an in-depth discussion of its benefits. Why does it help to improve the generalization ability of the model? What is the complementary relationship between the spatial and spectral branches?
2. Other state-of-the-art scribble-supervised segmentation works should also be discussed [1,2,3].
3. Examples of the scribbles should be displayed for illustration.
[1] Valvano G, Leo A, Tsaftaris SA. Learning to segment from scribbles using multi-scale adversarial attention gates. IEEE Transactions on Medical Imaging, 2021, 40(8): 1990-2001.
[2] Zhang K, Zhuang X. CycleMix: A holistic strategy for medical image segmentation from scribble supervision. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022: 11656-11665.
[3] Zhang K, Zhuang X. ShapePU: A new PU learning framework regularized by global consistency for scribble supervised cardiac segmentation. Medical Image Computing and Computer Assisted Intervention – MICCAI 2022.
Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

5
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The novelty of the methodology is limited. It would be interesting if the authors could provide an in-depth discussion of the benefits of the spectral branch.
Reviewer confidence

Very confident
[Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

N/A
[Post rebuttal] Please justify your decision

N/A

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper presents a novel method for weakly-supervised segmentation of polyps in video images. The main novelty is in integrating spatial and spectral information in two network branches, which are cross-trained using a mutual teaching strategy. Solid experiments on four polyp datasets show competitive performance. There is a consensus among the reviews as to the novelty of the method and the quality of the experimental evaluation (with good ablation studies). I concur with this. The paper is well written and the code is available for reproducibility.

Author Feedback

We recognize all the reviewers’ expertise in this field and are grateful for the thoroughness with which they evaluated our work. The overall positive feedback on the novelty and experimental settings of our proposed method is inspiring, and we are committed to addressing any remaining concerns to ensure the highest quality of our manuscript. Below are the major points we would like to clarify.

R1:

We have updated Eq (3) so that c = 0…C−1.

For simplicity and to reduce hyperparameter tuning effort, we empirically assigned equal weights to the cross-entropy loss and the Dice loss in Eq (7) and Eq (8). However, we acknowledge that future studies could investigate the effectiveness of unequal weighting in different contexts.

We conducted a comprehensive literature review to identify recent state-of-the-art methods for comparison. The selected methods were not specifically designed for scribble-supervised polyp segmentation, which is underexplored. However, they have shown competitive performance on similar medical image segmentation tasks in previous studies, making them well suited to our experimental setup.

Table 2 reports the zero-shot generalization performance of various methods: the models are trained on the SUN-SEG dataset and then tested on three other unseen datasets with domain shifts. It is therefore reasonable to expect the reported results to differ significantly from those obtained through in-distribution evaluation.
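When comparing numbers reported with different metrics, such as the IoU of .82 cited by Reviewer #1 against Dice scores, it helps that for a single image Dice = 2·IoU/(1 + IoU). A minimal sketch for binary masks (function names hypothetical); note that dataset-averaged mean Dice and mean IoU do not satisfy this identity exactly.

```python
import numpy as np

def dice_iou(pred, gt):
    """Binary Dice and IoU for two masks of equal shape (empty vs. empty counts as 1.0)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    dice = 2 * inter / total if total else 1.0
    iou = inter / union if union else 1.0
    return float(dice), float(iou)

def iou_to_dice(iou):
    """Per-image identity linking the two overlap metrics."""
    return 2 * iou / (1 + iou)
```

For example, an IoU of 0.82 on one image corresponds to a Dice of about 0.90.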

Like other methods that employ a dual-branch architecture, our approach requires additional memory and runtime during training compared to single-branch methods. During inference, however, memory and time consumption is minimized, since only the spatial branch is used.

In Table 3, for a fair comparison, the same mutual teaching and fusion strategies are utilized and the only difference is the models of the dual branches.

R2:

As demonstrated in Table 2, our proposed method outperforms Fully-CE in generalization performance on two unseen domains. We attribute this to the advantages of our approach, which include comprehensive feature extraction and fusion across both the spatial and spectral spaces. This allows the model to recognize a broader range of pattern representations, thereby improving its generalization ability. In addition, scribbles offer a more robust form of supervision than methods that may be vulnerable to disturbances such as noise and outliers, which potentially further boosts the model’s generalization and robustness.

We express our gratitude to Reviewer #2 for providing us with insightful suggestions on how we can extend and enhance our work. We will definitely consider them for exploration in future studies.

R3:

We thank Reviewer #3 for recognizing the value of our proposed method in incorporating the spectral branch. Learning in the frequency domain has become a popular research area in recent years, with researchers and practitioners exploring its applications across various fields. Our experimental results have shown that incorporating the spectral branch improves the generalization ability of the model by capturing complementary information to that captured by the spatial branch. The spectral branch is designed to extract feature maps that capture the spectral characteristics of the polyp, while the spatial branch focuses on the spatial patterns of the polyp. By combining the features extracted from both branches, our method is able to leverage the advantages of both spectral and spatial information for more accurate polyp segmentation. To the best of our knowledge, we are the first to design the spatial-spectral cross-space architecture in weakly-supervised medical image segmentation.
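A common way to obtain a spectral view of an image is the 2-D discrete Fourier transform, split into amplitude and phase. The sketch below shows that transform and its exact inverse in NumPy. It illustrates frequency-domain input in general and is not necessarily the transform S2ME's spectral branch uses (frequency-domain learning also often relies on the DCT); the function names are hypothetical.

```python
import numpy as np

def to_spectral(img):
    """Split a 2-D image into centred log-amplitude and phase spectra."""
    f = np.fft.fftshift(np.fft.fft2(img))  # move the zero frequency to the centre
    return np.log1p(np.abs(f)), np.angle(f)

def from_spectral(log_amp, phase):
    """Invert to_spectral back to the spatial domain."""
    f = np.expm1(log_amp) * np.exp(1j * phase)
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))
```

Low frequencies near the centre of the spectrum encode coarse shape and illumination, while high frequencies encode edges and texture, which is one intuition for why spectral features can complement spatial ones.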

The suggested relevant scribble-supervised segmentation works will be discussed in the final manuscript.


S2ME: Spatial-Spectral Mutual Teaching and Ensemble Learning for Scribble-supervised Polyp Segmentation