
Authors

Yanyu Xu, Menghan Zhou, Yangqin Feng, Xinxing Xu, Huazhu Fu, Rick Siow Mong Goh, Yong Liu

Abstract

Medical image segmentation is a critical task for computer-assisted diagnosis and disease monitoring. However, collecting a large-scale, well-annotated medical dataset is time-consuming and requires domain knowledge. Reducing the number of annotations poses two challenges: obtaining sufficient supervision and generating high-quality pseudo labels. To address these, we propose a universal framework for annotation-efficient medical segmentation, which is capable of handling both scribble-supervised and point-supervised segmentation. Our approach includes an auxiliary reconstruction branch that provides more supervision and backpropagates sufficient gradients for learning visual representations. In addition, a novel pseudo label generation branch uses a vector quantization (VQ) bank to store texture-oriented and global features for generating pseudo labels. To boost model training, we generate high-quality pseudo labels by mixing the segmentation prediction with pseudo labels from the VQ bank. Experiments on the ACDC MRI segmentation dataset demonstrate the effectiveness of the proposed method. We obtain comparable performance (0.86 vs. 0.87 DSC score).

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_60

SharedIt: https://rdcu.be/dnwBU

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a new weakly supervised method for medical image segmentation. The authors use points or scribbles for supervision. They propose to use the VQ bank to store texture-oriented and global features for generating pseudo labels (PLs). In addition, they mix the segmentation prediction with PLs to generate high-quality PLs. Experiments demonstrate the effectiveness of the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Overall, the paper is well organized and easy to follow.

    2. The paper addresses a problem in weakly supervised segmentation. The method for generating high-quality pseudo labels is sound.

    3. The experimental results show that a lot of annotation cost is reduced.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Some details are not well presented, especially in the method section: some claims are not well explained and some descriptions of the method are unclear. Please see my detailed comments below.

    2. Authors should clearly demonstrate the novelty of the paper. The core method of the paper is based on VQ, and the loss functions are not new.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Overall, the method is sound and can be re-implemented. But more implementation details on the method should be provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. Novelty. The authors need to clearly demonstrate the novelty of their paper.

    First, the core method of the paper is based on vector quantization, which is not a new technique. Therefore, the authors should emphasize the advantages of their method over the existing ones and show how their proposed method is different from the previous methods (i.e., which part is new).

    Second, similarly, the pCE loss function is not new, and the authors should explain how their modifications or additions to this technique have resulted in improved performance.

    2. Method. 2.1 The authors need to provide a more detailed explanation of the motivation for adding the reconstruction task and how it can help improve visual representation. They should also explain how the reconstruction feature maps and segmentation feature maps are generated and what the relationship between them is.

    2.2 Fig. 1 (b) claims a significant annotation reduction (99%). How is the 99% measured? In Fig. 1 (c), how are the feature maps generated? What is the relationship between the reconstruction feature maps and the segmentation feature maps? How should we interpret them?

    2.3 In the “memory” paragraph on page 5, the authors need to explain how the size of the bank is determined and whether it is large enough. Specifically, they need to explain why they chose n=512 and e_i ∈ R^{1×64}.

    2.4 After the model is well trained, what does the memory bank look like? Can the authors show some examples of the content in the memory bank?

    3. Experiment. 3.1 The authors need to provide statistical analysis to demonstrate the significance of the performance improvement over the strong baselines, since some improvements are marginal.

    3.2 The difference in visual comparison is not obvious. The authors need to improve the visual comparison by highlighting the difference between their proposed method and the baseline methods more clearly.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper’s organization is good, but there is room for improvement in the method section’s description. The proposed method is sound, and there is a performance improvement compared to the state-of-the-art (SOTA) methods.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    Authors address most of my concerns. I hope authors can prepare the revised manuscript carefully and address all the concerns of reviewers.



Review #2

  • Please describe the contribution of the paper

    The paper proposes a new weakly supervised method for medical image segmentation from scribbles and points supervision. The main hypothesis is that features learnt by segmentation and reconstruction tasks are similar. Hence, an auxiliary reconstruction task is added to the segmentation model to improve the segmentation performance. A vector-quantized memory bank is used to store learnt features and is used for generating pseudo labels, which are then used to supervise the segmentation branch.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The idea of using VQ memory bank in the reconstruction task for weakly supervised segmentation is novel and interesting.
    • The paper is well-motivated – given the need for accurate segmentation and the difficulty in obtaining pixel-wise annotations.
    • Ablation analysis and extensive experiments are performed to validate the proposed method. Ablation analysis demonstrates the importance of the proposed components.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The presentation and writing need to be improved. Some sections are not clear and need to be rewritten, for instance the description of the VQ memory bank. Notations are also sometimes inconsistent and confusing. Some examples are given in the detailed comments below.
    • Figure 1 is also not clear. What do the small blue, red, and green dots indicate? What does the figure tell the reader? I am not able to draw any conclusions from that figure. I am also not convinced that the figure tells us that segmentation and reconstruction features are similar, which is a main hypothesis of the paper.
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provide the implementation details and hyperparameters used in the experiments. They are also willing to publicly share the code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • Please write the full form of each abbreviation the first time you use it. I could not find what VQ stands for. I am assuming it is vector quantized.
    • Figure 1: in addition to the comments above, the caption says there are examples of dense, scribble and point annotations in subfigure a. However, the dense annotation is not in that subfigure, and I cannot see the point annotation example. Caption, b: it says the performance is 0.86 vs 0.87; I think it is 0.86 vs 0.898 (the full supervision score).
    • Section 2, first paragraph: “as illustrated in Figure 2”. I think the authors mean Figure 1.
    • Segmentation branch paragraph: the notation is not clear. S is the annotation set, and the set of labeled pixels in S is w_s. Then the number of pixels should be w_s, not S.
    • L_{recon} equation: what is F?
    • Computed feature maps are used to update the memory bank, and it is mentioned that each vector in the memory bank \in R^{1\times64}, does that mean that the feature map F_{recon} is a feature map of the same size? This part is unclear and needs better explanation.
    • f_j is a spatial location, e_k (1x64) is a feature map previously stored in the memory bank. How do the authors compute the distance between those? Do they mean computing the distance between f_j and e_k_j, where e_k_j is the j spatial location in the feature map k? That part needs clarification. Mention the size of F_{recon}.
    • Pseudo Label Table Update Stage: “we need to assign [1, 0, 0]” but the figure says 0.7,0.1,0.2. Please clarify.
    • \alpha is reported to be a random scalar. What does random mean: uniformly sampled from [0,1]?
    • L_{pl} equation: What is PL? The authors probably mean y^* instead.
    • eqn 1: What is PLS? I think the authors mean PL.
    • Please mention what is the default initialization used for the model.
    • How were the scribble annotations simulated?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is not well-written and some important details are missing or unclear, so some parts of the method are not easily understood by the reader. The idea is interesting and the experiments are extensive. However, the paper needs to be rewritten and the presentation needs to be improved.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    I thank the authors for the rebuttal. I changed my score to weak reject in light of the explanation of the novelty in the paper. However, I still have major concerns about the presentation and cannot fully follow the methods, hence the weak rejection. If the paper is accepted, I recommend improving the presentation of the method, including the notation and equations. Some parts of the methods are still not clear to me. For example, in the L_{recon} equation, do the authors use y_seg (as in the paper) or y_recon (as in the rebuttal)? The updates of the VQ bank are also difficult to understand, at least for me, in the current version. I think this part can be improved for the reader's understanding, in addition to the other comments in the reviews regarding figures and tables.



Review #3

  • Please describe the contribution of the paper

    The paper presents a label-efficient (weakly supervised) framework handling both scribble- and point-based weak labels. Reconstruction branch is used as auxiliary task. A novel method for high-quality pseudo-label generation is proposed, which mixes the segmentation prediction with the stored features from the vector quantization memory bank. Good performance is demonstrated on ACDC dataset with very sparse annotations (points).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper clearly lists the two challenges for label-efficient learning: more supervisions and better pseudo-labels, which provide strong motivation for the paper.

    • The proposed method can handle both scribble and point labels, and shows good results even with much sparser point labels.

    • Utilizing the reconstruction features for the VQ memory bank to generate pseudo labels is a novel method in the label-efficient medical segmentation task.

    • Randomness is added to multiple parts of the proposed method, which increases its robustness, e.g. randomly generating points as labels and randomly selecting alpha in the final pseudo label generation.

    • The experiments compare the proposed method with both leading WSL and SSL methods, and also provide mean and std, which makes the model performance more convincing and promising.

    • Enough ablation studies show the effectiveness of each block and different amounts of labeled pixels.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Using reconstruction as an auxiliary task in label-efficient learning is not a novel idea. Moreover, although the authors mention being inspired by self-supervised learning, other recent self-supervised auxiliary tasks such as contrastive learning, which might provide even more supervision, are neither considered nor compared in the paper.

    • Only one small dataset is used for evaluation, which is not enough to show the generalizability of the proposed method.

    • In Table 1, the best score of each column is not highlighted (in bold or color), which makes the results hard to follow.

    • The Dice scores of scribble setting have very limited improvements compared with other leading weakly supervised methods.

    • It’s not clear how the augmented reconstruction feature F’_recon is used in the VQ memory bank. I’m assuming it’s used for the VQ loss, but the notation f’ for F’_recon and f in L_VQ is not consistent.

    • The notation for the pseudo label loss is not consistent, i.e. in Eq. 1 it’s written as L_PLS(PL, y1); however, in the definition, it’s written as L_pl(PL, y1).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The dataset is public. The author does not provide code and model. The hyperparameters are provided in main paper for training. I think the paper is reproducible, but might be a bit hard due to the complex method.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • The paper uses the reconstruction branch to update the VQ memory bank. Considering that the pseudo label from the VQ bank is mainly based on feature clustering (finding the nearest vector), it might be better to consider using contrastive learning as an auxiliary task, which could provide better latent feature similarity within each class.

    • It’s better to evaluate the framework on another dataset for organ/tumor segmentation, in order to show the generalizability of the method.

    • Please highlight the best scores in Table 1.

    • Please make the notation consistent throughout the paper.

    • The proposed method can readily be transferred to a 3D segmentation model. Since a 3D model normally has better and more robust performance and is convenient to use in practice, I would suggest the authors consider evaluating the results in both 2D and 3D settings.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Based on the method novelty and the comprehensive experiments, I would like to recommend the paper as accept. The marginal improvement on Dice score is the main drawback.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes a new weakly supervised method for image segmentation from scribble and point supervision. The core idea is to employ an auxiliary reconstruction branch to improve the segmentation performance. The reviewers commented that the motivation and the idea of using a VQ memory bank are good. However, there are several unclear points that should be clarified in the rebuttal, including some missing/unclear details (R1, R2) and the novelty of employing the auxiliary task and VQ (R1, R3).




Author Feedback

We thank the reviewers for the high-quality reviews. We will carefully revise our submission, correct typos, and add more details and discussion in the revision.

Q1 Novelty (R1, R3) The novelty is at the task & training framework level. We propose a unified framework for scribble/point-supervised segmentation tasks with high performance. It results from two novel training designs, an auxiliary reconstruction branch and pseudo labels generated from a VQ bank:

Auxiliary reconstruction branch: it provides informative supervision for learning visual representations. It is motivated by the observation that similar patterns exist between reconstruction and segmentation feature distances for each class at the global level. The result improves from 0.25 to 0.54 by adding only the reconstruction branch, without the VQ bank.

Pseudo Label Generation: We employ the Vector Quantization (VQ) memory bank to store global features for generating high-quality pseudo labels. While VQ is commonly used for feature extraction, token generation and memory banks, its use for pseudo label generation sets our method apart from previous ones. With point supervision, our method scores higher than the variant without VQ (0.83 vs. 0.54). Because segmentation information is limited, we combine the segmentation predictions with the pseudo labels from the VQ bank to generate the final pseudo labels, which improves the result from 0.834 to 0.847. The pCE loss is not our novel part; the key improvement lies in the use of pseudo labels from the VQ bank.
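The combination of segmentation predictions with VQ-bank pseudo labels described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the rebuttal only states that \alpha is uniformly sampled from [0,1], so the convex combination followed by an argmax, as well as the function name `mix_pseudo_label` and its arguments, are assumptions made here for illustration.

```python
import random


def mix_pseudo_label(seg_probs, vq_probs, alpha=None, rng=random):
    """Mix two per-pixel class distributions into one pseudo label.

    seg_probs: class probabilities from the segmentation branch.
    vq_probs:  class probabilities looked up from the VQ bank.
    alpha:     mixing weight; drawn uniformly from [0, 1] when None.
    The convex combination is an assumed form, not taken from the paper.
    """
    if len(seg_probs) != len(vq_probs):
        raise ValueError("both inputs must cover the same classes")
    if alpha is None:
        alpha = rng.uniform(0.0, 1.0)
    mixed = [alpha * s + (1.0 - alpha) * v for s, v in zip(seg_probs, vq_probs)]
    # Hard pseudo label: the class with the largest mixed probability.
    label = max(range(len(mixed)), key=mixed.__getitem__)
    return mixed, label


# Toy example for a single pixel and three classes.
mixed, label = mix_pseudo_label([0.6, 0.3, 0.1], [0.2, 0.7, 0.1], alpha=0.5)
```

Because both inputs are probability vectors and the weights sum to one, the mixed vector is again a probability vector for any \alpha in [0,1].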

Q2 Motivation for reconstruction & Fig. 1 (R1, R2, R3) 1) The reconstruction & seg features are extracted from the last conv layers of their respective branches. The small blue, red, and green dots in Fig. 1 (c) are the annotated points. We will correct the labeling of the feature maps, which should read feature distance maps. Fig. 1 (c) shows that there are similar patterns between the recon and seg feature distance maps for each class at the global level. Black indicates smaller distances. The feature distances in the seg maps appear cleaner than those in the recon maps. This suggests that seg features carry task-specific information, while recon features can be seen as a broader set that includes seg information. 2) In the ablation study, the result improves from 0.25 to 0.54 when the recon branch is added without the VQ bank. 3) We will correct the mistakes in the Fig. 1 caption & enlarge the point annotations. 4) The annotation reduction (99%) is the number of annotated points divided by the total number of pixels.
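As a worked reading of item 4), the sketch below computes a reduction figure from a pixel count. It assumes that "reduction" means one minus the annotated fraction; the function name and the pixel counts are hypothetical and are not values reported by the authors.

```python
def annotation_reduction(num_annotated, num_pixels):
    """Fraction of pixels that need no annotation.

    Assumed definition: 1 - (annotated pixels / total pixels).
    """
    if num_pixels <= 0:
        raise ValueError("num_pixels must be positive")
    if not 0 <= num_annotated <= num_pixels:
        raise ValueError("num_annotated must lie in [0, num_pixels]")
    return 1.0 - num_annotated / num_pixels


# Hypothetical example: 655 annotated pixels in a 256 x 256 image.
reduction = annotation_reduction(655, 256 * 256)
```

Under this assumed definition, a reduction of at least 99% corresponds to annotating at most 1% of the pixels.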

Q3 VQ memory bank. (R1, R2, R3) 1) L_{recon} = |x - y_recon|^2. f_j (1x64) is a feature vector from F_{recon} (32x64x256x256), and e_k (1x64) is a feature vector in the VQ bank. The L2 distance is computed between f_j and e_k. The augmented recon features are used for the recon loss and backpropagate gradients to update the VQ bank via L_{VQ}. \alpha is uniformly sampled from [0,1]. 2) [1,0,0] is the probability in the current step and [0.7,0.1,0.2] is the stored probability after applying an Exponential Moving Average (EMA); the decay is 0.9 in our implementation. 3) The size & embedding dimension of the VQ bank are 512 & 64, the default setting in VQVAE. The model results for bank sizes 64, 256, and 512 are 0.865, 0.866, and 0.866. We find that 20-24 vectors commonly cover the clusters of 95% of the features, so the bank size can be set to 64 to save memory. The pseudo label table looks like [0.0, 0.12, 0.08, 0.8] for one vector.
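The lookup and table update described in items 1) and 2) can be sketched in plain Python. This is a toy illustration under stated assumptions: the names `nearest_code` and `ema_update` are hypothetical, the EMA is written in its standard form with the stated factor 0.9, the bank holds three 2-dimensional vectors rather than 512 vectors of dimension 64, and the gradient update of the bank via L_{VQ} is not shown.

```python
def squared_l2(u, v):
    """Squared L2 distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_code(feature, bank):
    """Index k of the bank vector e_k closest to the feature f_j."""
    return min(range(len(bank)), key=lambda k: squared_l2(feature, bank[k]))


def ema_update(stored, current, decay=0.9):
    """Standard EMA: decay * stored + (1 - decay) * current (assumed form)."""
    return [decay * s + (1.0 - decay) * c for s, c in zip(stored, current)]


# Toy bank with three code vectors and a uniform pseudo label table.
bank = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
label_table = [[1 / 3, 1 / 3, 1 / 3] for _ in bank]

# A feature at an annotated pixel of class 0 gets a one-hot target.
feature = [0.9, 1.2]
one_hot = [1.0, 0.0, 0.0]

k = nearest_code(feature, bank)
label_table[k] = ema_update(label_table[k], one_hot)
```

At unlabeled pixels, the stored row `label_table[k]` of the nearest vector would then serve as the pseudo label distribution; since the EMA is a convex combination of two probability vectors, each row stays normalized.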

Q4 Experimental details (R1, R2, R3) 1) We will use red boxes to highlight differences in Fig. 4 & bold to highlight the best scores in Table 1. 2) We used the default initialization of PyTorch (1.8.0). 3) Scribble annotations are simulated with ITK-SNAP. 4) The mean & standard deviation over 3 runs are 0.881 & 0.01, which is higher than the 0.872 of WSL4MI.

Q5 Self-supervised learning way (R3) We use local pixel-wise contrastive learning [1] to replace reconstruction & keep the rest the same. The result is 0.85 for point supervision.

Q6 Extension (R3) We use the Hippocampus dataset in [1] for the 3D task; owing to limited time, we report only the scribble result, which is 0.72. [1] Semi-supervised Contrastive Learning for Label-Efficient Medical




Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    In the rebuttal, the authors addressed the majority of the concerns raised by the reviewers well, particularly those pertaining to the novelty of the work and the clarity of the method descriptions. R1 expressed satisfaction with the authors’ response, while R2 upgraded their rating from reject to weak reject. Although R2 still maintains some concerns regarding the presentation of the method, the overall merits of the paper outweigh its weaknesses. Consequently, based on careful consideration, I recommend accepting this paper.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper presents a novel approach for weakly supervised medical image segmentation using scribbles and points as supervision. Reviewers commend the paper’s well-organized structure, the robustness of the method in generating high-quality pseudo labels, and the demonstrated reduction in annotation cost through experiments. However, they raise concerns about the lack of clarity in certain sections, the insufficient demonstration of novelty, and the limited evaluation on a single dataset. The authors effectively addressed many of these concerns in the rebuttal, but further improvements are needed in notation, presentation, and writing to enhance readability. Considering the reviewers’ feedback and the authors’ response, I recommend accepting the paper.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work received mixed scores in the first round of reviews, which did not change much after the discussion period (just R2 slightly increased the recommendation to WR). Given that the reviews were diverging, I read the reviews, rebuttal and the whole paper to have a clear and personal view of the submitted work. After considering all these elements, and although this work may have some interest, I have a major concern related to the empirical validation. More concretely, this work attempts to present a novel state-of-the-art approach for segmentation in medical imaging with limited supervision (indeed, the authors state clearly that the proposed method outperforms existing scribble-supervised approaches, and achieves better performance than several supervised methods). Nevertheless, this statement is factually wrong and misleading. Please note that the compared methods are not recent approaches (only one WSL from 2022 and one SSL from 2020; the rest were published before 2019), and there exists a huge body of literature that the authors have disregarded in their work, as well as in the claimed contributions (see for example [a-c] for WSL, whereas for SSL there are more than 50 works in MICCAI, TMI and MedIA alone since 2020). Thus, while I do not typically penalize a work based solely on performance (at the end of the day it is ok if SOTA cannot be achieved), I feel that the authors have ignored important efforts from the community towards learning algorithms with limited supervision. Therefore, I recommend the rejection of this work, and strongly encourage the authors to consider adding relevant recent works to their comparisons (instead of old methods from 2006 to 2019) in order to better position their work with respect to the existing literature.

    [a] Chen et al. C-CAM: Causal CAM for Weakly Supervised Semantic Segmentation on Medical Image. CVPR’22

    [b] Qian et al. Transformer based multiple instance learning for weakly supervised histopathology image segmentation. MICCAI’22

    [c] Luo et al. Scribble-supervised medical image segmentation via dual-branch network and dynamically mixed pseudo labels supervision. MICCAI’22


