
Authors

Qin Zhou, Peng Liu, Guoyan Zheng

Abstract

Partially Supervised Multi-Organ Segmentation (PSMOS) has attracted increasing attention. However, faced with challenges from a lack of sufficiently labeled data and cross-site data discrepancy, PSMOS remains largely an unsolved problem. In this paper, to take full advantage of the unlabeled data, we propose to incorporate voxel-to-organ affinity in the embedding space into a consistency learning framework, ensuring consistency in both the label space and the latent feature space. Furthermore, to mitigate the cross-site data discrepancy, we propose to propagate the organ-specific feature centers and inter-organ affinity relationships across different sites, calibrating the multi-site feature distribution from a statistical perspective. Extensive experiments demonstrate that our method generates favorable results compared with other state-of-the-art methods, especially on hard organs with relatively smaller sizes.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43895-0_63

SharedIt: https://rdcu.be/dnwzv

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes to learn a segmentation model from one fully labeled dataset and multiple partially labeled datasets. The authors propose to utilize partial labels via an affinity-aware consistency loss that encourages consistent predictions from two models. A cross-site loss is also used to mitigate the domain gap across all datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The motivation of using consistency loss and mitigating domain gap is reasonable. The overall method is well designed.

    2. They provide ablation study to show the effectiveness of the two losses. They also provide evidence to show CSFA can mitigate gaps of all the five datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The model is built on a teacher-student framework. The authors propose an affinity-aware consistency loss to train models with partially labeled data. It would be better if they could show experiments that use a regular consistency loss (e.g., a pixel-wise MSE loss).

    2. In equation (2), f_i represents the i-th pixel in the feature maps. Does this mean the features from the last layer are used in the model? How do we know the labels y_i for each pixel in the feature map, especially when feature maps are usually at a low resolution?

    3. I wonder whether the authors use the same datasets as previous papers. If not, it should be specified in the implementation details. The authors should also specify that all baselines are reproduced on these data.

    4. In Table 3, some models use nnUNet as the backbone and their performance is close to that of the proposed method. It would be more convincing if the authors implemented their method on nnUNet as well.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method is not easy to implement based on the paper alone. The authors state they will release the code in the future.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Same as weakness.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper proposes a reasonable method to address the problem in partial label learning. They validate each component and provide sufficient evidence to support their motivation.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper

    The paper presents a deep-network-based method for multi-organ segmentation under a partially supervised setting. In particular, it proposes affinity-aware consistency learning (ACL) and cross-site feature alignment (CSFA) modules to overcome the shortcomings of existing methods: a shortage of voxel-level labels and cross-site appearance variations. The ACL module uses prototypes to promote voxel-to-organ affinity, and the CSFA module calibrates feature differences across sites. The effectiveness of these two modules is evaluated through experiments on five organ segmentation datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The problem of multiple organ segmentation under a partially supervised setting is of clinical value.

    The writing is clear in general and easy to understand.

    Overall, the experimental results show some improvements, and the improvements on small organs are more pronounced.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The novelty of the proposed ACL and CSFA modules is not very significant, and their effect on the final segmentation is marginal, bringing an overall Dice improvement of only 0.6.

    The five abdominal CT datasets are not fully specified.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It is difficult to fully reproduce the work as its own dataset split is used.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    Please specify the datasets.

    It is not clear why the proposed method works better for small organs. Please explain and discuss.

    The results lack a statistical significance analysis.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper investigates a clinically valuable problem: multi-organ segmentation under a partially supervised setting. The proposed method has some novelty, but it is not significant and it brings only marginal performance improvements. Some experimental details and discussion are lacking.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The paper proposed a new partially-supervised organ segmentation algorithm based on voxel-to-organ affinity and cross-site affinity. The paper also shows promising results, where the proposed method surpassed other compared methods under a fair setting.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A novel partially-supervised learning-based organ segmentation method was proposed. The results were promising. The paper is clearly structured.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    My major concerns are twofold: 1. some details are missing; 2. there are too many symbols, and some usages of these symbols could be confusing. Please see the following comments for more details.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The idea of the paper is clear and not hard to follow. The authors also state clearly in the reproducibility statement that the code will be released. Hence, the reproducibility of this paper should be good.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    1. In Fig. 1, it is hard to understand the interactions between the feature distributions and between the affinity matrices in the CSFA part.
    2. The symbol f is used for the fully labeled data, the feature extractor, and sometimes features. Considering the many symbols used in the paper, some mixed usage is fine with me. But the symbols $Z_{c}^{f}$ and $Z_{c}^{l}$ in equations (6) and (7) confuse me: what are the f and l here?
    3. How are the prototypes initialized (e.g., random initialization)? For the organs with masks, are the prototypes computed based on the ground truth or the predictions?
    4. Why is the affinity loss chosen here instead of other consistency-based regularizations, e.g., pixel-wise consistency?
    5. The terms ‘pixel’ and ‘voxel’ are used interchangeably in the paper, which could be a bit confusing. Some terms, like ‘instance-to-prototype’, ‘voxel-to-organ’, and ‘pixel-to-prototype’, also lack proper explanations.
    6. Why is equation (9) called the ‘compactness’ loss?
    7. For the experiments, are the results all averaged over the five testing sets?
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper overall proposed a novel method and showed promising results. However, more details and clarity are needed to improve the current version.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper introduces a partially supervised approach for multiple organ segmentation, utilizing affinity-aware consistency learning (ACL) and cross-site feature alignment (CSFA) modules. The ACL module promotes voxel-to-organ affinity, while the CSFA module addresses cross-site appearance variations. The paper demonstrates improvements over existing methods and provides evidence of the effectiveness of the proposed modules. However, there are some weaknesses, including missing details, symbol confusion, and limited statistical significance. The rebuttal should focus on clarifying the usage of symbols throughout the paper and providing clear explanations of their meanings. It should also specify the datasets used, provide more details about the figures and experimental setup, and include a comparison with nn-UNet for better evaluation and comparison. Additionally, a more comprehensive discussion is needed to explain the better performance on small organs and address any concerns regarding the statistical significance of the results. Furthermore, the rebuttal should clarify the initialization of prototypes and whether they are based on ground truth or predictions. It should also provide justifications for the selection of losses, particularly the affinity loss over other consistency-based regularizations.




Author Feedback

We thank meta-reviewer (MR) and all reviewers for their comments.

MR,R1,R3: affinity consistency (AC) vs pixel-wise consistency (PWC) PWC (e.g., an MSE loss) is more likely to be dominated by erroneous predictions. Through pixel-to-prototype (P2P) matching, AC transforms the matching results into normalized scores, stabilizing the training process. Besides, AC provides cross-class context information rather than focusing only on pixel-wise information. We report the results of pixel-wise MSE (PWM) by substituting affinity consistency with PWM in our method: the average Dice over organs is 56.0%, a severe performance drop compared with ours.
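The contrast between the two consistency schemes can be sketched as follows. This is a minimal numpy illustration, not the authors' implementation; the function names, the temperature value, and the use of cross-entropy between affinity maps are all assumptions for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def affinity(features, prototypes, tau=0.1):
    # Pixel-to-prototype affinity: softmax over cosine similarities,
    # turning raw matching scores into normalized per-pixel distributions.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return softmax(f @ p.T / tau)          # shape (N, C)

def affinity_consistency(f_student, f_teacher, prototypes):
    # Cross-entropy between teacher and student affinity maps
    # (a hypothetical concrete form of the AC loss).
    a_s = affinity(f_student, prototypes)
    a_t = affinity(f_teacher, prototypes)
    return -np.mean(np.sum(a_t * np.log(a_s + 1e-8), axis=1))

def pixelwise_mse(pred_student, pred_teacher):
    # Plain pixel-wise consistency, for comparison.
    return np.mean((pred_student - pred_teacher) ** 2)
```

Because the affinity rows are softmax-normalized, a single erroneous prediction cannot dominate the loss the way a large raw residual can in the MSE term.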

MR,R1: f_i in Eq. 2 The features f_i in Eq. 2 are extracted from the last layer before the segmentation head, and they have the same size as the input image.

MR,R1,R2: details about the datasets Datasets are described in the caption of Table 2. D0 - D4 respectively refer to the MALBCVWC, Decathlon-Spleen, KiTS, Decathlon-Liver and Decathlon-Pancreas datasets. Throughout our experiments, all methods are evaluated on the same dataset splits for a fair comparison.

MR,R1,R2: Results using nnUNet We conduct experiments using nnUNet, and the average Dice over organs is 92.5%, surpassing the second best (PaNN using nnUNet) by +1.2%.

MR,R2: better performance on small organs In segmentation, compared to large organs, small organs contribute less to the loss term due to their relatively small sizes. Consequently, the training process is likely to be dominated by large organs. The situation is even worse under the partially labeled setting, with many small organs left unlabeled. Through the ACL and CSFA modules, our method can effectively take advantage of the unlabeled data of small organs, thus boosting the performance on them.

MR,R2: statistical significance We conducted paired t-tests to compare ours with the SOTA methods. The obtained p-values are respectively 2e-8 (PIPO), 2e-5 (DoDNet), 2e-4 (Marginal Loss), and 0.037 (PaNN); all are smaller than 0.05, indicating statistically significant differences.
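For reference, a paired t-test over per-case Dice scores can be computed as below. This is a generic sketch with made-up example scores, not the authors' data; in practice `scipy.stats.ttest_rel` returns the same statistic together with the p-value.

```python
import numpy as np

def paired_t_statistic(scores_a, scores_b):
    # Paired t statistic: mean of per-case differences divided by
    # the standard error of those differences.
    d = np.asarray(scores_a) - np.asarray(scores_b)
    n = d.size
    return d.mean() / (d.std(ddof=1) / np.sqrt(n))
```

The pairing matters: each test case yields one score per method, and the test is applied to the per-case differences rather than to the two pooled score sets.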

MR,R3: clarifying CSFA module in Fig. 1 In Fig.1, the left side of the CSFA module refers to the organ-specific distribution alignment, where hollow shapes refer to the features of unlabeled organs in the PLDs, while solid ones refer to labeled organs. In L_a^{l,n} (Eq.9), only labeled organs (solid ones) are utilized for calculating loss. The right side refers to the affinity-structure aware loss, where each term in the affinity matrix denotes the inter-organ cosine similarity (Eq.10 and Eq.11).
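The affinity-structure side of CSFA can be illustrated with a small sketch: build a C x C matrix of inter-organ cosine similarities from each site's prototypes, then penalize the difference between sites. The mean-squared penalty and function names here are hypothetical; the paper's exact loss is given in its Eq. 10 and Eq. 11.

```python
import numpy as np

def inter_organ_affinity(prototypes):
    # C x C matrix of inter-organ cosine similarities.
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return p @ p.T

def affinity_structure_loss(protos_fld, protos_pld):
    # Align the affinity structure of a partially labeled site (PLD)
    # with that of the fully labeled one (FLD); squared-difference
    # form assumed for illustration.
    a_f = inter_organ_affinity(protos_fld)
    a_p = inter_organ_affinity(protos_pld)
    return np.mean((a_f - a_p) ** 2)
```

Matching these matrices transfers the inter-organ relationships, rather than the raw features, across sites.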

MR,R3: clarification on the symbol f and l in Eq.6 and Eq.7 The normalization terms Z_{c}^{f} and Z_{c}^{l} refer to the number of reliable pixels (RP) with softmax-normalized predictions bigger than a predefined threshold. As the number of RP may differ, we use Z_{c}^{f} and Z_{c}^{l} to distinguish between them (f for the feature space, and l for the label space).

MR,R3: details on prototypes Prototypes are initialized as all zeros, and updated with a moving average after each iteration. For labeled organs (both in the FLD and the PLDs), the prototypes are updated based on the ground truth (Eq.2), and for unlabeled ones, they are calculated based on predictions (Eq.3).
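A minimal sketch of this update scheme, assuming an exponential moving average with a hypothetical momentum of 0.99 (the paper's exact update rule is in its Eq. 2 and Eq. 3):

```python
import numpy as np

def update_prototypes(prototypes, features, labels, num_classes, momentum=0.99):
    # EMA update of per-organ feature centers; prototypes start as zeros.
    # features: (N, D) voxel features; labels: (N,) organ indices, taken
    # from ground truth for labeled organs and from predictions otherwise.
    for c in range(num_classes):
        mask = labels == c
        if not mask.any():
            continue                        # organ absent in this batch
        batch_center = features[mask].mean(axis=0)
        if not prototypes[c].any():         # first time this organ is seen
            prototypes[c] = batch_center
        else:
            prototypes[c] = momentum * prototypes[c] + (1 - momentum) * batch_center
    return prototypes
```

Seeding a still-zero prototype with the first batch mean avoids the long warm-up an EMA from zero would otherwise need.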

MR,R3: compactness loss (Eq.9) The objective of Eq.(9) is to pursue a unified feature space where pixels of the same class stay closer, whether they come from the FLD or the PLDs. Thus the features of the same class will form a compact cluster in the feature space (FS). Therefore the ‘compactness’ loss here refers to the intra-class compactness in the FS.
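In its simplest form, such an intra-class compactness term pulls each voxel feature toward the prototype of its class. The squared-distance form below is an illustrative assumption, not necessarily the paper's Eq. 9:

```python
import numpy as np

def compactness_loss(features, labels, prototypes):
    # Mean squared distance from each voxel feature to the prototype
    # of its class; small values mean tight per-class clusters.
    return np.mean(np.sum((features - prototypes[labels]) ** 2, axis=1))
```

Minimizing it drives features of the same organ, whether they come from the FLD or a PLD, toward a common center.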

MR,R3: details on the calculation of results The test sets are partially labeled (e.g., only one organ has ground-truth annotations), and different images differ in which organs are labeled. For a straightforward comparison, for each organ we average the results over the images containing that organ across the five testing sets, and report the organ-specific performance.
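The described averaging can be sketched as below; the data layout (one dict of organ-to-Dice per image, with unannotated organs simply absent) is an assumption for illustration.

```python
from collections import defaultdict

def organ_wise_average(results):
    # results: list of dicts mapping organ name -> Dice for one image.
    # Each organ is averaged only over images that contain ground truth
    # for it, so partially labeled test sets are handled naturally.
    sums, counts = defaultdict(float), defaultdict(int)
    for per_image in results:
        for organ, dice in per_image.items():
            sums[organ] += dice
            counts[organ] += 1
    return {organ: sums[organ] / counts[organ] for organ in sums}
```

This keeps images without a given organ's annotation from dragging that organ's average toward zero.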




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The reviewers point out issues including missing details about the figures and experimental setup, symbol confusion, and limited statistical significance. The rebuttal successfully addresses these concerns by providing p-values, clarifying the mathematical symbols and experimental details such as the initialization of prototypes, discussing the effects on small organs, and including a comparison using nnUNet. Based on the reviews and the rebuttal, the meta-reviewer recommends accepting this paper.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal addressed the most critical concerns raised by the reviewers; most of them are about clarification, such as missing details and confusing symbols. The authors also conducted paired t-tests to validate the significance of the differences. Given the contributions of the paper, I recommend acceptance.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes a method for multi-organ segmentation with partial label learning. Despite the rebuttal, there are concerns about the novelty of the proposed method and the degree of performance improvement it offers. The writing and presentation also require a major revision to address methodological and experimental details. Hence, this is not yet ready for MICCAI. The recommendation is to reject.


