
Authors

Zdravko Marinov, Rainer Stiefelhagen, Jens Kleesiek

Abstract

Interactive segmentation reduces the annotation time of medical images and allows annotators to iteratively refine labels with corrective interactions, such as clicks. While existing interactive models transform clicks into user guidance signals, which are combined with images to form (image, guidance) pairs, the question of how to best represent the guidance has not been fully explored. To address this, we conduct a comparative study of existing guidance signals by training interactive models with different signals and parameter settings to identify crucial parameters for the model’s design. Based on our findings, we design a guidance signal that retains the benefits of other signals while addressing their limitations. We propose an adaptive Gaussian heatmap guidance signal that utilizes the geodesic distance transform to dynamically adapt the radius of each heatmap when encoding clicks. We conduct our study on the MSD Spleen and the AutoPET datasets to explore the segmentation of both anatomy (spleen) and pathology (tumor lesions). Our results show that choosing the guidance signal is crucial for interactive segmentation as we improve the performance by 14% Dice with our adaptive heatmaps on the challenging AutoPET dataset when compared to non-interactive models. This brings interactive models one step closer to deployment in clinical workflows. We will make our code publicly available.

Link to paper

DOI: https://doi.org/10.1007/978-3-031-43898-1_61

SharedIt: https://rdcu.be/dnwBV

Link to the code repository

https://github.com/Zrrr1997/Guiding-The-Guidance

Link to the dataset(s)

http://medicaldecathlon.com/

https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=93258287


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper addresses the limited prior research on guidance signals for interactive models in medical image segmentation by conducting a comparative analysis of five existing guidance signals and a novel one. The contributions of the paper are three-fold: a comparison of existing guidance signals through extensive experiments, with suggested default values for essential parameters; the introduction of evaluation metrics for comparing guidance signals; and the proposal of novel adaptive Gaussian heatmaps, which adjust the radius of each heatmap dynamically for every new click and outperform the other five signals on most metrics.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    a) Rich and fair experiments: The paper conducts experiments on two different medical image datasets and compares five existing guidance signals while varying several hyperparameters. This approach ensures that the results are robust and significant. b) Comprehensive evaluation metrics: The paper introduces five evaluation metrics that consider multiple aspects of the guidance signals, including performance, efficiency, and the ability to improve with new clicks. This provides a systematic framework for comparing guidance signals and allows for a more comprehensive understanding of their strengths and weaknesses. c) Interpretable and improved method: The paper proposes a novel adaptive Gaussian heatmap method that uses geodesic distance values to adjust the radius of heatmaps adaptively. This method has good interpretability and improves performance. d) Well-structured paper: The paper is well-organized, with clear section headings and a logical flow of information. This makes it easy to follow and understand the main contributions of the research.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    a) The generality of the experimental results is limited. The motivation of discussing guidance signals as independent of models or interaction rules should be reconsidered. Though fair for the current experimental design, different guidance signals may require different models or interaction rules to achieve optimal performance, e.g., interior margin point clicks [1] or first-click attention [2]. Moreover, when a new framework or network is proposed for interactive segmentation in the future, the generality of results obtained on baseline models should be considered. b) The experimental settings for Fig. 1(b) and (e) are not clear. It is unclear whether each chart is for a single signal or for all five signals, since there is only one chart per dataset and per hyperparameter, yet five signals are discussed. c) The † notation in Table 2 of the supplementary material is confusing, making it difficult to assess the significance of the proposed signal’s improvements over the other five. d) The proposed adaptive Gaussian heatmap has significant limitations. Efficiency is crucial in interactive models for practical applications [3], yet the adaptive Gaussian heatmap sacrifices too much efficiency for a slight improvement in accuracy and a significant improvement in interpretability.

    [1] Luo, Xiangde, et al. “MIDeepSeg: Minimally interactive segmentation of unseen objects from medical images using deep learning.” Medical Image Analysis 72 (2021): 102102.
    [2] Lin, Z., et al. “Interactive Image Segmentation with First Click Attention.” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020.
    [3] Kirillov, Alexander, et al. “Segment Anything.” arXiv preprint arXiv:2304.02643 (2023).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have indicated their willingness to make code associated with their work publicly available, and have provided detailed description of their datasets and experimental procedures in the submitted manuscript. As a result, this paper demonstrates a high degree of reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    a) The paper would benefit from a more comprehensive scope that explores guidance signals beyond the baseline experiment designs, such as interactive rules that incorporate user prior information. b) The proposed adaptive Gaussian heatmap needs to be optimized for better computational efficiency to enhance its practicality. c) The experiment settings for the results presented in Fig.1 (b) and (e) should be clarified. d) The notation of † in Table 2 of the supplementary material should be revised to indicate where the proposed method outperforms the other methods significantly.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The major factors that led me to my overall score for this paper were primarily based on the generality and significance of the comparative results presented in the paper. Additionally, I was concerned about the proposed adaptive Gaussian heatmap since it significantly sacrifices efficiency to achieve a slight improvement in accuracy and a significant improvement in interpretability. Overall, while the paper had some strengths, such as its comprehensive evaluation metrics and its interpretability, these weaknesses impacted my final score.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    5

  • [Post rebuttal] Please justify your decision

    The authors addressed my inquiry regarding the efficiency of the proposed adaptive Gaussian heatmap. Considering their explanation and taking into account the interesting nature of this study on interactive segmentation, I maintain my suggestion of a weak acceptance. However, I would encourage the authors to clarify the experiment settings presented in Figure 1 and to further explore a comparative study based on practical usage scenarios.



Review #2

  • Please describe the contribution of the paper

    Deep learning-based interactive segmentation methods often work by encoding user clicks into a heatmap which is then fed, additionally to the input image, to the segmentation network. This paper studies different ways of generating such heatmaps (for instance drawing a disk, or computing a geodesic distance map) and investigates their impact on the segmentation accuracy. To this end, the authors introduce 5 metrics that can be used to compare them (initial Dice, final Dice, efficiency, consistent improvement, GT overlap) and propose a new adaptive heatmap.
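    To make the compared encodings concrete, here is a minimal 2D sketch of three of the signal families this review refers to (binary disk, Gaussian heatmap, inverted Euclidean distance map). This is illustrative only: the paper operates on 3D volumes and defines its signals formally (Eq. 1–3), and the function name and parameters below are hypothetical.

    ```python
    import numpy as np

    def click_guidance(shape, click, signal="heatmap", radius=5.0):
        """Encode a single user click (y, x) into a 2D guidance map.

        'disk'    -> binary disk of the given radius around the click
        'heatmap' -> Gaussian bump with sigma = radius
        'edt'     -> inverted, normalized Euclidean distance to the click
        Illustrative sketch only, not the paper's exact 3D formulation.
        """
        ys, xs = np.indices(shape)
        d = np.sqrt((ys - click[0]) ** 2 + (xs - click[1]) ** 2)
        if signal == "disk":
            return (d <= radius).astype(float)
        if signal == "heatmap":
            return np.exp(-d ** 2 / (2 * radius ** 2))
        if signal == "edt":
            return 1.0 - d / d.max()  # 1 at the click, 0 at the farthest voxel
        raise ValueError(signal)

    g = click_guidance((32, 32), (16, 16), "heatmap", radius=4.0)
    print(float(g[16, 16]))  # peak of 1.0 at the click location
    ```

    All three maps peak at the click; they differ in how quickly the guidance decays, which is exactly what the paper's comparison varies.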

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper reports a thorough comparison of different strategies to take into account user clicks to train an interactive segmentation network.
    • Two different clinical applications were considered for the validation.
    • This paper also attempts at standardizing the evaluation of interactive segmentation algorithms by introducing 5 different metrics.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The contributions of this paper seem very incremental, both conceptually and in terms of performance.
    • The proposed guidance does improve the Dice coefficient a bit but is also one of the worst in terms of efficiency, which I believe is a major problem when designing an interactive workflow.
    • I also have some concerns about the guidances shown in Figure 1 (see Section 8 of this review)
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors used publicly available datasets and plan on releasing their source code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html
    • I find Figure 1a quite confusing. I do not really understand the first row. On the second row, the comparison of the heatmaps also raises questions:
      • The first three really look identical so I would not expect much difference in terms of accuracy.
      • Also, “GDT” and “adaptive heatmap” seem to have a much larger scale than the others, which makes me wonder if the comparison is fair and makes sense.
      • The adaptive maps don’t look like what I would have expected. Since all the points are inside the spleen, I would have expected bright areas inside the spleen that stop at the boundary (the contrast is very strong, so it should be easy for the adaptive kernel to fit the border). This is, however, not what we see (see the top point of GDT, or the left point of the adaptive heatmaps). From my perspective, this casts some doubt on the correctness of the implementation.
    • For the spleen, the red line showing p=0 (Figure 1e) seems to indicate that the interactions only improve the Dice score by < 0.005. Can you comment on that?
    • The differences in Dice are sometimes so small that I wonder whether they are significant. Statistical tests (for instance Wilcoxon signed-rank) should be performed.
    • (H2) I am not sure to understand why the top distance values are completely discarded. Within an exponential their contribution is almost null, isn’t it?
    • (H4) Is this a binary decision between 0 interactions and N? Are there no training sample with only N/2 interactions?
    • (M4) Shouldn’t there be a threshold for a minimal improvement to be considered? By definition a click would always improve the Dice, shouldn’t it? Unless the guidance leaks to background pixels?
    • How exactly were the seeds selected? Completely randomly or close to the structure boundary?
    • Were the simulated seeds fixed for all methods during comparison? If not, there could be some randomness involved in the comparison.
    • Figure 1c: It would be more readable if the y-axis scales were consistent across the graphs
    • The name “adaptive Gaussian heatmap” of the proposed method is a bit confusing because it does not contain the concept of geodesic distance (like the other two baselines) although it is based on it.
    • Since it is a core component of the method, the geodesic distance transform could have been explicitly defined (instead of citing a paper)
    • The definitions of the distances are not consistent (sometimes they are inverted or not normalized). There is a note acknowledging this, but it would be easier to just change their definition.

    Typos:

    • “which propose an user guidance” ->which proposes a user guidance
    • “preciser” -> more precise
    • “3.1 Hyperparemeters : Results” -> Hyperparameters
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    3

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The technical contributions are very incremental with respect to [5] (basically adding some non-trained parameters into the equation of a distance function). The impact in terms of performance seems relatively marginal.

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    The rebuttal clarified some issues (for instance the general concerns on efficiency), but not all of them. Overall the explanations still lack a bit of insight/analysis: some results seem surprising but we are just referred to the numbers.

    Figure 1 is overall still very confusing to me.

    • I am still not sure I understand row a.
    • My comment on the different scales of the heatmaps was not addressed.
    • I don’t understand why the adaptive heatmaps do not stop better at the boundaries. The rebuttal says
    • In Figure 1e, MSD spleen seems to converge towards 0.95 or slightly above, but the authors report 96.87 in the table and their rebuttal. Why this difference?

    All in all, I will slightly raise my recommendation to acknowledge the authors’ rebuttal. However I still lean towards rejection due to the very incremental nature of the contribution, both in terms of methodology and impact on the results.



Review #3

  • Please describe the contribution of the paper

    The authors study the existing user guidance signals (Euclidean distance transform, geodesic distance transform, exponentialized geodesic distance) that are commonly used in interactive segmentation models, compare the performance of these signals under different parameter settings and performance metrics, and, based on the findings of this comparative study, propose an adaptive guidance signal.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strengths of this paper are: (1) a performance comparison of existing user guidance signals such as the Euclidean distance transform, geodesic distance transform, and exponentialized geodesic distance, exploring four different hyperparameters: (a) the radius σ of disks and heatmaps, (b) truncation values of distance-based signals, (c) methods for combining guidance signals, and (d) the probability of interaction, i.e., deciding for each volume whether to add N clicks or not.

    (2) proposing new adaptive Gaussian heatmaps that utilize the geodesic distance transform as a pseudo signal. This pseudo signal helps impose a large sigma value in homogeneous regions and a small, precise sigma value near edges. The proposed adaptive Gaussian heatmaps achieve superior performance compared to the existing signals.
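    The idea in (2) — large sigma in homogeneous regions, small sigma near edges — can be sketched as a mapping from the mean local geodesic distance around a click to a heatmap radius. The linear mapping and window size below are hypothetical stand-ins for the paper's actual formula (its Eq. 6), and the toy geodesic map is fabricated for illustration.

    ```python
    import numpy as np

    def adaptive_sigma(gdt, click, window=3, sigma_min=1.0, sigma_max=8.0):
        """Map the mean geodesic distance around a click (y, x) to a radius.

        Small local GDT values (homogeneous region) -> large sigma;
        large local GDT values (near an edge)       -> small sigma.
        Hypothetical linear mapping, not the paper's exact Eq. 6.
        """
        y, x = click
        patch = gdt[max(0, y - window):y + window + 1,
                    max(0, x - window):x + window + 1]
        m = patch.mean() / (gdt.max() + 1e-8)  # normalized local mean in [0, 1]
        return sigma_max - (sigma_max - sigma_min) * m

    # toy "geodesic" map: flat region on the left, rising cost across an edge
    gdt = np.zeros((16, 16))
    gdt[:, 8:] = np.linspace(0, 10, 8)
    print(adaptive_sigma(gdt, (8, 2)))   # homogeneous region -> sigma_max (8.0)
    print(adaptive_sigma(gdt, (8, 12)))  # near the edge -> smaller sigma
    ```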

    (3) comparing performance on five different metrics: the final Dice score, i.e., the mean Dice score after N = 10 clicks per volume; the initial Dice score; efficiency, i.e., an inverted time measurement (1 − T) in seconds; consistent improvement; and ground-truth overlap.

    (4) the comparative experiments yield insights into the strengths and weaknesses of existing guidance signals. The authors conclude that smaller radii, small thresholds, more iterations with interactions, and traditional concatenation tend to perform better for existing signals, whereas the tendency to use overly large radii near edges often penalizes them.
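    Most of the metrics summarized in (3) can be sketched as functions of the Dice trajectory over clicks. The names below follow this review's summary rather than the paper's exact definitions, ground-truth overlap is omitted since it requires the guidance maps themselves, and the thresholds are illustrative.

    ```python
    def click_curve_metrics(dice_per_click, seconds_per_click):
        """Summarize an interactive run from the Dice after each click.

        dice_per_click[0] is the Dice with no interaction; each later
        entry is the Dice after one more corrective click. Sketch only;
        the paper's formal metric definitions (M1-M5) may differ.
        """
        initial = dice_per_click[0]
        final = dice_per_click[-1]
        # consistent improvement: fraction of clicks that did not hurt the Dice
        steps = list(zip(dice_per_click, dice_per_click[1:]))
        consistent = sum(b >= a for a, b in steps) / len(steps)
        # efficiency: inverted mean per-click time, as in the review's (1 - T)
        efficiency = 1.0 - sum(seconds_per_click) / len(seconds_per_click)
        return {"initial": initial, "final": final,
                "consistent": consistent, "efficiency": efficiency}

    m = click_curve_metrics([0.60, 0.70, 0.68, 0.75], [0.2, 0.3, 0.25])
    print(m["final"])       # 0.75
    print(m["consistent"])  # 2 of 3 clicks improved the Dice
    ```

    Separating initial from final Dice is what lets the study distinguish a signal that starts strong from one that refines well, which a single end-of-run Dice would conflate.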

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Novelty is limited.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The experiments should be reproducible as the authors used existing signals and existing frameworks and clearly specified the parameter settings and performance metric used. The authors also mentioned that code will be made public.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2023/en/REVIEWER-GUIDELINES.html

    The authors mention that they introduce five evaluation metrics to assess the different guidance signals. However, the final Dice score appears to be the Dice score that is generally used for segmentation tasks. Elaborating on the difference between the two would be helpful.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty is limited. But the merits of the paper cannot be ruled out as the authors are able to achieve a superior performance with their proposed heatmap.

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Strengths: 1) The motivation is strong, as interactive segmentation is clinically important. The authors also extensively compare existing approaches before introducing their own method, the adaptive Gaussian heatmap. 2) The adaptive Gaussian heatmap is systematic and easy to interpret.

    Weaknesses: 1) The adaptive Gaussian heatmap also sacrifices computational efficiency. Please discuss this and provide potential solutions. 2) Please explain Fig. 1 in detail. 3) As pointed out by Reviewer 3, the Dice coefficient is the only value used to report the improvement over the five baseline methods, although this work uses five different evaluation metrics. Please add the values of the other four metrics.




Author Feedback

We thank all reviewers for their constructive comments. Here we address the raised concerns.

R1, R2: Adaptive heatmaps sacrifice too much efficiency. A: We agree efficiency is crucial, which is why we include it as a metric. While adaptive heatmaps require the most computation, as they combine two signals (GDT and heatmaps), they still need <1s per new click, which is competitive with related work ([5]: 1s, BIFSeg: 0.7s, DeepIGeoS: 3s, f-BRS: 1s). It is nevertheless possible to improve the efficiency (see next answer).

Meta: Please provide alternatives for efficiency. A: Adaptive heatmaps can be sped up by computing the geodesic map only once, for the first click, and fixing it for the remaining clicks, as the geodesic transform is the bottleneck. This yields an efficiency similar to the fastest signal (disks) without sacrificing performance.

R1: Generality of experiments: signals may need a certain model or interaction rules. A: We found large initial clicks in central regions and precise boundary clicks to be crucial properties and validated this via our adaptive heatmaps. These intuitive properties are model-agnostic and can serve as design principles for guidance signals in different models and interaction rules without limiting the generality of our experiments.

R1, R2: Adaptive heatmaps only slightly improve performance. A: Improvements are significant (p<0.05) on 4 metrics and 2 datasets. This is best seen in our M4 metric as a large gap to other signals: 0.73→0.81 (MSD), 0.68→0.78 (AutoPET).

R2: The technical contributions are incremental with respect to [5]. A: Our method differs conceptually from [5], as we use iterative refinement, whereas [5] initializes internal margin points without iterative clicks. Our M4 metric specifically focuses on iterative improvement, setting us apart from [5].

R2: In Fig. 1e, MSD interactions only improve DSC by <0.005. A: The exact DSC improvement is in Tab. 3 (94.90→96.87), which is 1.97.

R2: Differences in DSC are small; statistical tests are needed. A: We do perform t-tests (supp. Tab. 2) and achieve significant improvements on 4 metrics and 2 datasets compared to the second-best signal. We will move this from the supplementary material to the main paper for visibility.

R3: Limited novelty. A: We go beyond a novel guidance signal. We examine factors that make signals effective (H1–H4) and how to evaluate them (M1–M5). Uncovering weaknesses in existing signals informs the design of our adaptive heatmaps, providing fresh insights for designing guidance signals in future work.

R3: How does the Final Dice differ from standard Dice? A: Final Dice requires N user clicks, as the model iteratively refines its prediction, whereas standard Dice is non-interactive. We compare to non-interactive models to show how N clicks improve Dice substantially (64.89→79.89 on AutoPET), leading to higher clinical relevance.

R2: Heatmaps, disks, and EDT look the same, so I expect the same DSC. A: They are visually similar but differ numerically (Eq. 1–Eq. 3); our results and related work [5] confirm this leads to significant DSC differences.

R2: Adaptive maps should stop at boundaries. A: Adaptive maps are Gaussian spherical maps and cannot align with edges perfectly, as geodesics can, but they are smaller near edges since they are guided by geodesics.

R2: How are seeds simulated? A: Initial seeds are random but the same for all models; consecutive seeds are sampled from over- and undersegmented regions based on the model’s error, as in [11, 16].

Meta, R1: Explain Fig. 1. A: (a) 1st row: select clicks → compute mean local GDT values for each click → map mean GDT to radii using Eq. 6 (depicted as a curve plot) → transform clicks into heatmaps with the mapped radii; colors in the last images encode how voxels are mapped to radii. (a) 2nd row: examples of each signal. (b) Aggregated results for sigma and theta for all 5 signals. (c) Results for sigma. (d) Results for theta. (e) Aggregated results for input adapter and interaction probability for all 5 signals.

Meta: Please add the values of the other 4 metrics. A: The values are in supp. Tab. 2, and we will move them to the main paper for visibility.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Major concerns are not satisfactorily resolved in the rebuttal, especially the improvement on the four other metrics besides DSC.



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal has provided some responses, especially regarding the efficiency issues, but has not fully addressed the reviewers’ concerns. Overall, the contribution of this paper is quite incremental and the technical novelty is low compared with existing works. In addition, the presentation of the paper requires further improvement. It appears that the paper needs more revision before it is ready for publication.



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I think interactive segmentation is a very important topic in many medical image analysis applications. The authors provide a comprehensive evaluation of several guidance strategies. Weaknesses of the work are the seeming lack of methodological contribution, some concerns about computational efficiency, confusion in the presentation of the main figure, and inconsistent presentation of the quantitative results. After reading the rebuttal, I find that many of the mentioned concerns were adequately addressed, especially regarding missing information (which was in the supplementary material) and computational efficiency. The lack of a large methodological contribution is balanced by the relevance of the topic and the comparison. I am convinced that the MICCAI community will find this work interesting.


