
Authors

Han Yang, Lu Shen, Mengke Zhang, Qiuli Wang

Abstract

Since radiologists have different training and clinical experience, they may provide different segmentation annotations for the same lung nodule. Conventional studies choose a single annotation as the learning target by default, but this discards the valuable consensus and disagreement information embedded in the multiple annotations. This paper proposes an Uncertainty-Guided Segmentation Network (UGS-Net), which learns rich visual features from the regions that may cause segmentation uncertainty and thereby contributes to a better segmentation result. With an Uncertainty-Aware Module, the network can produce a Multi-Confidence Mask (MCM) that points out regions with different levels of segmentation uncertainty. Moreover, this paper introduces a Feature-Aware Attention Module to enhance the learning of nodule boundaries and density differences. Experimental results show that our method can predict nodule regions with different uncertainty levels and achieves superior performance on the LIDC-IDRI dataset.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16443-9_5

SharedIt: https://rdcu.be/cVRyb

Link to the code repository

https://github.com/yanghan-yh/UGS-Net

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes to adapt an FCN architecture to the segmentation of lung nodules while taking uncertainty into account. The uncertainty stems from the fact that different radiologists contour the same region differently, owing to the subjectivity of the manual segmentation process. The authors argue that these differences are not random and model them in their proposed UGS-Net, an enhanced U-Net augmented with an Uncertainty-Aware Module and a Feature-Aware Attention Module. The content of the paper is technically sound, and the results appear superior to those of the competing methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The method is technically sound.
    • The observation that areas of uncertainty and disagreement between radiologists correspond to different tissue densities than the areas of agreement is clever.
    • The claim that disagreements are not random and can therefore be modelled is interesting.
    • The authors provide ablation studies.
    • The number of approaches used for comparison is sufficient.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • It is not clear how the Gabor and Otsu operators are formulated and how they affect backpropagation.
    • It looks like the network is trained end-to-end with the loss L = LBCE(S, GTS) + LUAM. Is that correct?
    • The learning objective LUAM = LBCE(∪(GT)′, ∪(GT)) + LBCE(∩(GT)′, ∩(GT)) + LBCE(GTS′, GTS) contains terms that are in tension with each other, so the optimisation algorithm may settle on a balance that accounts for all the different GTs without necessarily modelling the effect of the different ground truths. Of course, since the authors seem to use separate prediction heads for the three components of LUAM, this effect might be only mild. In the supplementary materials the authors show ablation studies which seem to confirm that the presented version of the network is indeed better. They mention two other versions, V1 and V2, without really specifying their respective configurations. I suggest at least reporting ablation results for 1) an FCN trained with LUAM without separate prediction heads, 2) FCN + UAM only, and 3) FCN + UAM + FAAM.
    • It is unclear to me how certain experiments were performed. For example, the standard U-Net used for comparison is trained on single ground truths, yet it achieves different results on single annotations and on the intersection and union of annotations, and these results are usually inferior to UGS-Net's. But was the standard U-Net trained on unions and intersections when performing the experiments relating to unions and intersections, respectively?
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Some details are in my opinion missing. Reproducing the paper would require quite some effort.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Radical changes are not necessary. It is important to state explicitly and thoroughly why the method is better than a standard FCN (e.g., the approaches used for comparison). Without going into too much detail about the dataset you have selected, you want to show that your approach yields better results. So, presumably, you have a dataset with multiple annotations for each lung nodule, and you propose experiments on union, intersection, and single-GT annotations. When testing performance on the union, you need to train the approaches used for comparison (which do not have the luxury of creating union masks as a by-product of their execution) on unions; the same goes for intersections, etc. A revision of the ablation studies in the supplementary material is suggested.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I believe there is merit to this idea and the results seem to confirm that. Clarifications about the experimental evaluation are appreciated though.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    This paper proposed a method for lung nodule segmentation on the LIDC-IDRI dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is in a good format and is easy to understand.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • As described in Sec. 1, line 5, the LIDC-IDRI dataset provides multiple annotations. If a method conducts experiments on this dataset and only chooses a single annotation as the learning target, the method is not a conventional method or a traditional method, it is a wrong method. The motivation of this paper is confusing;
    • Figures in this paper are relatively small and are hard to see.
    • Nodules smaller than 3 mm in the LIDC-IDRI dataset are commonly excluded from experiments. It seems that the authors are not sufficiently familiar with the dataset;
    • There are many grammar and spelling errors. For example, in the Abstract, line 13, “LIDC-IDRI” should be “the LIDC-IDRI”.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Please refer to 5. for more information.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please refer to 5. for more information.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    In this manuscript, the authors propose an uncertainty-guided segmentation network that fully utilizes all the annotations. A multi-confidence mask is proposed to obtain the uncertainty levels by including the intersection and union of all annotations. Then, an uncertainty-aware module and a feature-aware attention module are applied to learn the uncertainty in ambiguous regions, especially on the boundaries. Experiments on the LIDC dataset show state-of-the-art results, and the ablation study demonstrates each component's effectiveness.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well organized and easy to read.

    2. The paper has interestingly found that uncertainty is associated with certain HU distributions in ambiguous regions. As the HU value relates to the tissue’s density, the authors take advantage of the uncertainty in the HU value and guide the segmentation network to learn from the uncertainty to generate better results on the edges and ambiguous regions.

    3. The attention module that integrates the multiple annotations is novel. The proposed network uses all annotations in training and includes the intersection and union to enrich the attention on the uncertain regions. As the state-of-the-art methods focus more on automatically learning the variance with a VAE, this paper provides an alternative method that focuses on the uncertain region.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. “UGS-Net’s input is the lung nodule CT images, and its final learning target keeps consistent with the current mainstream methods, which is a single annotation GTs.” Given that multiple annotations are provided, how is the annotation chosen that is used in the uncertainty-aware module and the final output?
    2. Given that the intersection and union are used, did the authors consider calculating the average of the GTs, since the intensity of the average mask may encode the probability of agreement on the nodule region?
    3. No comparison with other uncertainty-based methods (refs. [8-10]).
    4. In Table 1, how is the baseline U-Net implemented with multiple annotations?

    Minor:

    5. Some fonts in Fig. 1(B), Fig. 2, and Fig. 4(B) are too small to read.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors claim they will release the source code, and the paper provides enough details to reproduce the work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Please provide more details for the questions mentioned in the limitations. The state-of-the-art comparisons are conducted on only a single annotation. Please also compare with the uncertainty-based state-of-the-art methods (references [8-10] mentioned in this paper).

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper provides a novel solution to integrate all annotations to improve semantic segmentation accuracy. Instead of using VAE to encode the variance, the authors provide an alternative way to implement the uncertainty for annotation via a self-attention model - which is novel. The results show the effectiveness of the proposed methods.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Two reviewers here were positive and one negative. I agree with the positive reviewers that this is an interesting and clever approach to deal with inter-rater differences in segmentation.

    However, even the positive reviewers brought up important questions as to experiments (subpar baselines, no comparisons against uncertainty-based alternatives) and additional concerns about the ablation validity and details on how the baselines were evaluated. Please try to address these valid concerns in the rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6




Author Feedback

We thank all reviewers for their careful reviews, which were mostly positive about our paper. The main suggestions and our responses are summarized as follows:

  1. How were the baselines evaluated? (All REVIEWERs) Response: We ran 5-fold cross-validation on the standard U-Net, R2U-Net, Attention U-Net, Attention U-Net + UAM (V1), V1 + self-attention block (V2), and UGS-Net. The performance figures of the other models were taken from other studies. V1, V2, and UGS-Net were trained with our settings; the other methods (which do not use multiple annotations) were trained with the traditional strategy, in which each nodule has a single annotation. The baseline results may appear subpar because: (a) 5-fold cross-validation eliminated the effect of the data split, and (b) nodules with only a single annotation were excluded.

  2. All reviewers suggested improving the descriptions of the experimental settings, especially REVIEWER #1. Response: (1) To REVIEWERs #1 and #3: the standard U-Net and the other methods that do not use multiple annotations were trained on a single annotation. Table 1 shows the performance differences between our method and traditional methods trained with the traditional strategy. Table 2 shows that the traditional training strategy does not give standard methods the ability to focus on uncertain regions. (2) To REVIEWER #1: we will provide the configurations of the different model versions mentioned in the ablation study. (3) To REVIEWER #1: the Gabor and Otsu operators are applied directly to certain CNN feature maps; backpropagation does not update any Gabor or Otsu parameters, but the gradients do affect the neurons in the network, and the network is trained end-to-end. (4) To REVIEWER #2: we will add more accurate information about the dataset.
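As a reading aid, the end-to-end objective discussed in this exchange, L = LBCE(S, GTS) + LUAM with LUAM combining union, intersection, and single-annotation BCE terms, can be sketched in plain Python. This is a minimal illustrative sketch, not the authors' implementation: the function names are our own, flat probability lists stand in for mask tensors, and the single target GTS is taken as the first annotation in the list, as the authors state elsewhere in this rebuttal.

```python
import math

def bce(pred, target, eps=1e-7):
    """Mean pixel-wise binary cross-entropy over flat probability lists."""
    return -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
        for p, t in zip(pred, target)
    ) / len(pred)

def uam_loss(pred_union, pred_inter, pred_single, gt_masks):
    """LUAM = LBCE(U(GT)', U(GT)) + LBCE(inter(GT)', inter(GT)) + LBCE(GTS', GTS).

    gt_masks is a list of binary annotation masks, one per radiologist;
    the union and intersection targets are derived from them.
    """
    gt_union = [max(px) for px in zip(*gt_masks)]   # union: any rater marked the pixel
    gt_inter = [min(px) for px in zip(*gt_masks)]   # intersection: all raters marked it
    gt_single = gt_masks[0]                         # GTS: first annotation in the list
    return (bce(pred_union, gt_union)
            + bce(pred_inter, gt_inter)
            + bce(pred_single, gt_single))
```

In this sketch each of the three predictions would come from its own head, which is consistent with the reviewer's observation that separate prediction heads soften the tension between the three terms.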

  3. “If a method conducts experiments on this dataset and only chooses a single annotation as the learning target, the method is not a conventional method or a traditional method, it is a wrong method. The motivation of this paper is confusing.” (REVIEWER #2) Response: We did not comment on whether other methods are wrong if they choose a single annotation. REVIEWER #3 gave a better description of this paper: other methods focus more on automatically learning the variance with a VAE, while this paper provides an alternative method that focuses on the uncertain region. We will strengthen the description of our method to remove possible misunderstandings.

  4. No comparison with other uncertainty-based methods. (REVIEWER #3) Response: Other methods focus more on automatically learning the variance in a latent space [10,11]. These models usually produce different annotations via latent variance and are evaluated with metrics like the Generalized Energy Distance. Our model, by contrast, produces a stable MCM that reflects the uncertain region and provides a stable suggested segmentation, which can be evaluated with Dice, IoU, and NSD. Because these methods are evaluated with different metrics, we did not compare our method with other uncertainty-based methods. We will look for a way to evaluate these methods under the same metric in future work.

  5. How is the annotation chosen that is used in the uncertainty-aware module and the final output? (REVIEWER #3) Response: LIDC-IDRI does not provide information about which annotation is better, so the target annotation for the final output is the first annotation in the annotation list. We will update the description of this part.

  6. Did the author consider calculating the average of the GTs … ? (REVIEWER #3) Response: We did try calculating the average of the GTs. However, each nodule with multiple annotations has only 2-4 annotations, so the averaged GT would be skewed by an extreme annotation and, in some situations, would cut low-density regions into smaller fragments. We cannot decide which regions are more likely to be nodule tissue, so we keep the union, the intersection, and a single annotation as the learning targets.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I agree with R1 and R3 that the authors’ approach of modelling uncertainty based on mask intersections and unions and the corresponding HU distributions is an interesting idea. I found R2 overly severe here, without enough concrete points.

    Even after the rebuttal, I still found certain aspects of the experiments confusing (particularly as to the baselines) and general clarity not as good as I would have liked. Even so, the interesting take on this problem and the improvements in results make me lean toward accept for this work.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The article proposes to adapt an FCN architecture to the segmentation of pulmonary nodules while taking uncertainty into account. The main contribution concerns uncertainty; however, there is no comparison with other uncertainty-based methods. It is therefore difficult to confirm the superiority of the proposed method within this type of approach. I think this work should be completed in the future. My recommendation is “reject”.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The key strength of this work is its uncertainty aware module that takes into account variations in annotation from different experts when segmenting lung nodules. This idea is seemingly novel, which all of the reviewers agree on.

    Concerns that were mentioned were related to experimental setup details and a lack of comparisons, as well as the worry that baselines were not adequate. The rebuttal addressed most of these issues convincingly. The lack of comparison to other uncertainty based methods was indicated as potential future work with the argument that it is hard to compare the chosen strategy with existing strategies that don’t necessarily take into account multiple annotations. I tend to agree with this argument and think that the merits of the presented idea as well as the extensive evaluation outweigh this lack of comparison.

    Overall, this seems to be a valuable contribution for the MICCAI community.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    11


