
Authors

Luyue Shi, Xuanye Zhang, Yunbi Liu, Xiaoguang Han

Abstract

Interactive segmentation is of great importance in clinical practice for correcting and refining automated segmentations by involving additional user hints, e.g., scribbles and clicks. Interactive segmentation methods for 2D medical images are currently well studied, while few works address 3D medical volumetric data. Given a 3D volumetric image, user interaction can only be performed on a few slices, so the key issue is how to propagate this information over the entire volume for spatially consistent segmentation. In this paper, we propose a novel hybrid propagation network for interactive segmentation of 3D medical images. Our method consists of two key designs: a slice propagation network (denoted as SPN) that transfers user hints to adjacent slices to guide segmentation slice by slice, and a volume propagation network (denoted as VPN) that propagates user hints over the entire volume in a global manner and provides spatially consistent features to boost slice segmentation. Specifically, the SPN adopts a memory-augmented network that utilizes the information of already segmented slices (memory slices) to propagate interaction information. To use the interaction information propagated by the VPN, a feature-enhanced memory module is designed, in which volume segmentation information from the latent space of the VPN is introduced into the memory module of the SPN. In this way, interactive segmentation can leverage the advantages of both volume and slice propagation, thus improving the volume segmentation results. We perform experiments on two commonly used 3D medical datasets, and the experimental results indicate that our method outperforms state-of-the-art methods. Our code is available at https://github.com/luyueshi/Hybrid-Propagation.
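
To make the described pipeline concrete, below is a minimal, hedged sketch of one interaction round as we read the abstract: the VPN produces global volumetric features and a coarse 3D mask from the user hints, and the SPN sweeps outward from the annotated slice, conditioning each 2D prediction on the previous slice's mask and on the VPN features for the current slice. The module definitions, channel sizes, and fusion step here are toy placeholders, not the authors' architecture; see the linked repository for the actual implementation.

```python
import torch
import torch.nn as nn

class TinyVPN(nn.Module):
    """Toy stand-in for the volume propagation network (a 3D network in the paper)."""
    def __init__(self, c=8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(2, c, 3, padding=1), nn.ReLU(),
            nn.Conv3d(c, c, 3, padding=1), nn.ReLU())
        self.head = nn.Conv3d(c, 1, 1)

    def forward(self, vol, hint_vol):
        feat = self.body(torch.cat([vol, hint_vol], dim=1))   # global volumetric features
        return feat, torch.sigmoid(self.head(feat))           # features + coarse 3D mask

class TinySPN(nn.Module):
    """Toy stand-in for the slice propagation network (DeepLab + memory module in the paper)."""
    def __init__(self, c=8):
        super().__init__()
        # input: image slice (1) + mask of the previously segmented slice (1) + VPN features (c)
        self.body = nn.Sequential(
            nn.Conv2d(1 + 1 + c, c, 3, padding=1), nn.ReLU(),
            nn.Conv2d(c, 1, 1))

    def forward(self, slc, prev_mask, vpn_feat_slice):
        x = torch.cat([slc, prev_mask, vpn_feat_slice], dim=1)
        return torch.sigmoid(self.body(x))

@torch.no_grad()
def hybrid_propagate(volume, scribble_mask, annotated_z, vpn, spn):
    """One interaction round: VPN supplies global context, SPN sweeps away from the annotated slice."""
    feat, coarse = vpn(volume[None, None], scribble_mask[None, None])
    depth = volume.shape[0]
    seg = torch.zeros_like(volume)
    seg[annotated_z] = scribble_mask[annotated_z]              # keep the user-annotated slice as-is
    order = list(range(annotated_z + 1, depth)) + list(range(annotated_z - 1, -1, -1))
    for z in order:
        prev = seg[z - 1] if z > annotated_z else seg[z + 1]   # nearest already-segmented slice
        pred = spn(volume[z][None, None], prev[None, None], feat[:, :, z])
        seg[z] = (pred[0, 0] > 0.5).float()
    return seg, coarse[0, 0]

# Toy usage on a random (D, H, W) volume with a fake scribble hint on slice 8.
vol = torch.rand(16, 64, 64)
scrib = torch.zeros_like(vol)
scrib[8, 20:40, 20:40] = 1.0
seg, coarse = hybrid_propagate(vol, scrib, annotated_z=8, vpn=TinyVPN(), spn=TinySPN())
print(seg.shape, coarse.shape)
```

The networks above are untrained and their outputs are meaningless; the sketch only illustrates the control flow of combining a global 3D pass with a slice-by-slice 2D sweep.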

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16440-8_64

SharedIt: https://rdcu.be/cVRwR

Link to the code repository

https://github.com/luyueshi/Hybrid-Propagation

Link to the dataset(s)

http://medicaldecathlon.com/

https://kits19.grand-challenge.org/


Reviews

Review #1

  • Please describe the contribution of the paper

    This is a well-written and well-organized paper that presents a hybrid (two-network) approach for interactive, semi-automatic segmentation of 3D medical images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The approach leverages a slice propagation network to propagate information from user scribbles on one slice to adjacent ones using memory modules. A volume propagation network is used to perform volumetric segmentation and generate features for the memory modules. This is an interesting approach that is novel to my knowledge and is broadly applicable. The approach is tested on segmentation of multiple organs and compared to multiple methods. Quantitative results are impressive. The method appears to generalize well to a variety of applications.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The validation study is designed to automatically simulate user interaction scribbles. It’s not clear if the method would perform as well with a human scribbler.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The methods are detailed and reproducible

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Adding a small user study showing better accuracy with less interaction time/clicks with this method compared to others would move this paper from good to great.

    Another way to expand the validation would be to include other image modalities in the analysis.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is a good paper that is of interest to the MICCAI community but has a minor weakness.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #2

  • Please describe the contribution of the paper

    The paper proposes a method for interactive segmentation of 3D medical data. The method uses two modules: 1) a slice propagation network (SPN) for transferring user interactions to adjacent slices, and 2) a volume propagation network (VPN) for transferring user hints over the entire volume. The main motivation for using the SPN is to avoid having to run a 3D model on the whole volume, which would require higher memory usage. The proposed method improves upon existing interactive segmentation methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The interactive segmentation of medical images is an important but under-studied problem.
    • The proposed method improves performance over existing interactive segmentation methods on multiple datasets.
    • The paper is well written and relatively easy to follow.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • It is unclear how many interactions are used for the results in Table 1. Is the same method used for generating user interactions for all methods?
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It would probably be difficult to reproduce the method as it includes multiple modules, the implementation details of which are not completely clear, e.g.:

    • It is unclear what “Semantic segmentation loss” means.
    • It is unclear which model weights are referred to by “the pre-trained weights on video object segmentation is used for initialization”.
    • The details of scribble generation are missing.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    In Fig. 1, a space character is missing in some labels.

    The language needs some improvement:

    • “refine the volume segmentation result a time”
    • “both two networks for slice and volume propagation are included in our method”
    • “This two networks collaborate with each other for segmentation.”

    In Fig. 3, including the performance of one or two baseline methods would have been nice.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The problem is somewhat under-explored, and the proposed method is evaluated on 4 datasets, outperforming the baseline methods on all of them.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Somewhat Confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    This paper introduces a method for interactive segmentation of 3D medical scans. The pipeline relies on two networks working iteratively on the input volume, the user scribbles, and the previous segmentation: one in 3D working on a crop of the volume, and one in 2D that propagates the segmentation from one slice to another. The former is based on a 3D U-Net, while the latter is based on the DeepLab architecture with a memory module that takes as input both the 2D slice and the corresponding slice of the 3D feature map of the first network. The method is evaluated on four segmentation tasks from public CT databases: kidney, tumour, lung and colon.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • This paper addresses a relevant (yet not particularly researched) problem, namely interactive segmentation. This is particularly important nowadays to help clinicians annotate images more efficiently.
    • The method is compared on several datasets and to several baseline approaches.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • My main concern is that I am not sure such a complex approach can easily be deployed in clinical use. It is an interactive method, but it still (1) requires network training and is therefore specific to the anatomies it has been trained for, and (2) requires some computational power on the user end to run those networks. In order to be used in practice, I feel an interactive method should be swift and lean. Here, there is no mention of the computational time of each prediction, of the overall time it takes an actual user (not simulated scribbles) to segment the organ of interest in a satisfactory way, or of a study on generalization to new structures.
    • The complexity of the method makes it difficult to reproduce; for instance, it uses both a 2D and a 3D network, and the training has 4 different steps in which various parts of the network are trained.
    • The method could have been better explained. I don’t understand why one part is in 3D and the other in 2D. Figure 1 is overall a bit confusing: the order of the different operations is not clear.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Training/evaluation code, as well as pretrained models and data, are specified as “available” but no link has been provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Overview

    • I am not sure I understand why we need both a 3D and a 2D network.
    • How does the network know when to stop the propagation (at the “end of the organ”)?
    • How is the 3D crop first defined? How does the network know the extent in the out-of-slice direction?
    • Is the volume network trained at a fixed resolution (in pixels or in mm), or do you train it with varying input sizes?
    • Figure 1: The slices shown are upside down (the CT table is at the top of the images); it’s not very important, but it does look odd.
    • Figure 1: What do the yellow arrows above/below the memory modules mean?

    The experiment numbers are not very detailed.

    • Reporting the mean is not enough: we need more details (at least standard deviations or quartiles, or better, full box plots or violin plots)
    • No statistical tests were performed
    • Dice coefficients are only one part of the picture; complementary metrics such as the Hausdorff distance should also be reported
    • How many interactions were necessary to obtain the results from Table 1?
    • Failure cases/areas of improvement are not discussed (and no future work is mentioned in the conclusion)

    Misc

    • “Given a 3D volumetric image, the user interaction can only be performed on one of all slices,” I disagree with this statement; several commercial and open-source software packages (Slicer, ITK-SNAP) offer 3D brushes.

    • I appreciate the comparison with several other baseline methods, but it seems to me that in the context of interactive segmentation, the current standard is not necessarily deep learning-based approaches. It would have been even nicer to compare with other standard approaches like watershed, grabcut [https://docs.opencv.org/3.4/d8/d83/tutorial_py_grabcut.html], random walker [http://vision.cse.psu.edu/people/chenpingY/paper/grady2006random.pdf], etc.
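
    For reference, a baseline of the kind the reviewer mentions can be run with OpenCV's GrabCut in a few lines. This is an illustrative sketch on a synthetic 2D slice with an arbitrary bounding-box initialization, not a tuned comparison or anything used in the paper.

```python
import cv2
import numpy as np

# Synthetic grayscale "slice" with a bright blob standing in for the organ of interest.
slice_2d = np.zeros((128, 128), np.uint8)
cv2.circle(slice_2d, (64, 64), 25, 200, -1)
slice_2d = cv2.GaussianBlur(slice_2d, (7, 7), 0)

img = cv2.cvtColor(slice_2d, cv2.COLOR_GRAY2BGR)   # GrabCut expects a 3-channel 8-bit image
mask = np.zeros(img.shape[:2], np.uint8)
rect = (30, 30, 70, 70)                            # user-provided bounding box (x, y, w, h)
bgd_model = np.zeros((1, 65), np.float64)
fgd_model = np.zeros((1, 65), np.float64)

cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)
segmentation = np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD)).astype(np.uint8)
print("foreground pixels:", int(segmentation.sum()))
```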

    Figure 2

    • Which cases are shown here? Median ones? Good ones? Please indicate how they were selected
    • Some images have particularly low quality (the first two crops); please regenerate them
    • A 3D visualization of the different segmentations would also be nice

    • Does Figure 3 contain average numbers over the whole dataset or just one example?

    There are a number of typos:

    • “This process will stop when reach a stop slice,”
    • “Note that the VPN is not work in the first iteration”
    • “both two networks for slice and volume propagation are included”
    • “This two networks”
    • “to crop and generate volume patch input of our VPN”
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper has merits but I am a bit concerned with the complexity/reproducibility of the method, as well as its applicability. Since I also had a number of remarks on the clarity and experiments parts of the paper, I am leaning towards rejection.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    4

  • [Post rebuttal] Please justify your decision

    The authors replied to some of my questions, sometimes in a satisfactory manner, sometimes too concisely (but this is probably due to the limited space). Q2) For instance, on the annotation time: it surely depends on the anatomy, so it’s hard to assess whether 4.5 minutes is a good time or not. Q4) I understand that the requirements are not extravagant, but still, an RTX 2080 is not something most computers in the clinic have, and it is more restrictive than what non-DL-based interactive segmentation algorithms require. Q5) I am also not very convinced by the answer, especially regarding why 3D approaches are not suitable.

    My main criticisms however remain: lack of comparison to non-DL standard approaches, the necessity to train one model per anatomy, lack of detail in the experimental section (statistical tests, metric distributions), and the complexity of the pipeline. Therefore I will keep my recommendation.




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This work presents a hybrid network for interactive segmentation that allows interacting with the framework using multiple slices. Current techniques only allow interaction/refinement in a single slice; the paper therefore addresses a relevant but overlooked problem. The paper is well written and easy to follow.

    The framework is presented as a general framework. However, the experimental section does not provide a sufficiently large set of experiments to confirm the generic nature of the framework. The authors are invited to provide more details about the generalizability of their framework. As one of the key aspects of the framework is its interactivity, the authors should provide more details about this criterion in the experimental section, as highlighted by two reviewers.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    5




Author Feedback

We appreciate the ACs and Reviewers for their constructive comments. We are delighted by the many positive comments: our work is considered to address an important but under-studied problem (R2, R3), to propose an interesting and novel approach (R1), to achieve improved/impressive performance (R1, R2), to use multiple datasets (R1, R2, R3), to be broadly applicable (R1), to require lower memory usage (R2), and to be well written (R1, R2). In the following, we address the main concerns of the reviewers as summarized by the AC. We will release our code and models upon acceptance.

Q1: Generalizability (AC)
(1) Cross-dataset: In the experimental section, we evaluate our method on four segmentation tasks from multiple public CT datasets (kidney, tumor, lung, and colon) and provide more visual results in the supplementary material. As shown in Table 1, the experimental results suggest that our method is superior to the other methods on multiple datasets, given the same simulated scribbles and 6 interaction iterations.
(2) Backbone: We replaced the network backbones of SPN and VPN with UNet and VNet. At the 6th interaction on the colon dataset, the DSC values of our method with the new backbones and with the backbones in the paper are 87.0 and 87.3, respectively. This comparable performance suggests that our framework generalizes to different backbone networks.
(3) Modality: DeepIGeoS, also a DL-based interactive segmentation method, has validated its effectiveness on MRI data. We expect our method to also perform well on other modalities, but due to the time limitation we plan to provide results on new modalities together with the code later.

Q2: User study on interactivity (R1, R3, AC)
1. Experiment setting: a) Testees: 5 radiologists. b) Data: KiTS19-Tumor, 5 CTs per person. c) To-be-annotated slices / total slices: 27.4/50. d) Tools: a. Label Me; b. Premiere (traditional 3D method); c. baseline model (learning-based 3D method); d. ours (learning-based 3D method). e) Tool training for each testee: 20 mins. f) Interaction stops when the user is satisfied with the result.
2. Experiment results: for each tool, we report the average annotation time and the number of interacted slices: a. 25.1 mins, 27.4 slices; b. 16.7 mins, 21.8 slices; c. 7.3 mins, 7.3 slices; d. 4.5 mins, 4.7 slices.
3. Human interaction vs. simulated scribbles: the Dice of the manually annotated results obtained in the experiments is 91.36. With generated scribbles, the Dice on the same data after 4 and 5 interacted slices is 91.19 and 92.07, respectively, which supports the feasibility of simulating scribbles.

Q3: Scribble generation (R1, R2)
We generate simulated scribbles following Caelles et al.'s work (the 2018 DAVIS challenge). By comparing the prediction with the ground truth, we find the regions where the prediction is wrong. The generated scribbles are defined as a simplified skeleton of these regions, obtained by simple binary operations.

Q4: Practicality (R3)
(1) Computational power: We are doing this work in collaboration with a hospital, and our model is already in clinical trials. In the trial, with our model deployed on a GPU machine with an RTX 2080, the inference time per interaction is about 3.5 s and the average memory demand is less than 5 GB. Our method therefore has potential for clinical use in terms of the required device and inference time.
(2) Data specificity: Requiring model training for specific data is indeed a common limitation of all learning-based methods. In the trial, we deploy our model to a server in the hospital, and the model can be trained and updated as users interact.

Q5: Necessity of the hybrid design (R3)
(1) 3D-only: requires users to provide additional 3D bounding boxes (time-consuming) for cropping patches containing the target of interest in order to alleviate computation.
(2) 2D-only: cannot leverage the 3D information of the input volume.
(3) Our hybrid network uses the 2D-based SPN to automatically generate the bounding box for the 3D network and utilizes the 3D information learned by the 3D VPN to assist the final prediction.
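
To illustrate the scribble-simulation strategy described in Q3 (find where the prediction disagrees with the ground truth, then reduce each error region to a thin skeleton via simple binary operations), here is a minimal sketch for a single 2D slice. The area threshold, erosion step, and positive/negative split are illustrative choices, not the authors' exact recipe.

```python
import numpy as np
from scipy import ndimage
from skimage.morphology import skeletonize

def simulate_scribbles(pred, gt, min_area=20):
    """Return (positive_scribble, negative_scribble) for one 2D slice.

    pred, gt: binary numpy arrays of the same shape.
    Positive scribbles mark false-negative regions (missed foreground),
    negative scribbles mark false-positive regions (spurious foreground).
    """
    false_neg = np.logical_and(gt, np.logical_not(pred))   # missed foreground
    false_pos = np.logical_and(pred, np.logical_not(gt))   # spurious foreground

    def region_skeleton(err):
        # keep only reasonably large error blobs, then thin each to a 1-px skeleton
        labels, n = ndimage.label(err)
        keep = np.zeros_like(err, dtype=bool)
        for i in range(1, n + 1):
            blob = labels == i
            if blob.sum() >= min_area:
                keep |= blob
        # a light erosion pulls the skeleton away from the region boundary
        keep = ndimage.binary_erosion(keep, iterations=1)
        return skeletonize(keep)

    return region_skeleton(false_neg), region_skeleton(false_pos)

# Toy example: a circular ground truth vs. a shifted prediction.
yy, xx = np.mgrid[:128, :128]
gt = (yy - 64) ** 2 + (xx - 64) ** 2 < 30 ** 2
pred = (yy - 64) ** 2 + (xx - 74) ** 2 < 30 ** 2
pos, neg = simulate_scribbles(pred, gt)
print(pos.sum(), neg.sum())
```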




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have addressed to a large extent the remarks from the reviewers. While it is unfortunate that the work does not provide a comparison to non-DL approaches, this can still be OK for a conference work. The authors are advised to include the remarks of the reviewers in a final version of the paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes an interesting interactive 3D volume segmentation method based on the 2D and 3D combined approach. The idea seems novel and its efficacy is demonstrated via experiments. The authors addressed several unclear points in the paper during rebuttal. Based on the quality of the work and positive feedback from the reviewers, I recommend accepting this paper to MICCAI 2022.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This is an interesting work, addressing an important problem of generating 3D annotation. The rebuttal addressed the comments related to generalization and practical use of the proposed method. The rebuttal also included extra experimental results comparing with public annotation tools such as Label Me. Overall, it is a solid piece of work.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    7


