
Authors

An Wang, Mobarakol Islam, Mengya Xu, Hongliang Ren

Abstract

Data diversity and volume are crucial to the success of training deep learning models, yet in medical imaging the difficulty and cost of data collection and annotation are especially high. In robotic surgery specifically, data scarcity and imbalance have heavily affected model accuracy and limited the design and deployment of deep learning-based surgical applications such as surgical instrument segmentation. Considering this, we rethink the surgical instrument segmentation task and propose a one-to-many data generation solution that gets rid of the complicated and expensive process of data collection and annotation from robotic surgery. In our method, we use only a single surgical background tissue image and a few open-source instrument images as seed images, and apply multiple augmentation and blending techniques to synthesize large numbers of image variations. In addition, we introduce chained augmentation mixing during training to further enhance data diversity. The proposed approach is evaluated on the real datasets of the EndoVis-2018 and EndoVis-2017 surgical scene segmentation challenges. Our empirical analysis suggests that, without the high cost of data collection and annotation, we can achieve decent surgical instrument segmentation performance. Moreover, we observe that our method can handle novel instrument prediction in the deployment domain. We hope our inspiring results will encourage researchers to emphasize data-centric methods for overcoming demanding deep learning limitations beyond data shortage, such as class imbalance, domain adaptation, and incremental learning. Our code is available at https://github.com/lofrienger/Single_SurgicalScene_For_Segmentation.
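
The synthesis step the abstract describes is essentially mask-aware image compositing. Below is a minimal sketch of the idea, an illustration under assumed function names and a simple paste-based blend, not the authors' actual pipeline (which chains multiple augmentation and blending techniques):

```python
import random
from PIL import Image, ImageEnhance

def synthesize(background: Image.Image, tool_rgba: Image.Image, seed=None):
    """Blend one augmented tool cutout (RGBA) onto one augmented background (RGB),
    returning the composite image and its pixel-exact segmentation mask.
    Illustrative sketch only; augmentation choices and ranges are assumptions."""
    rng = random.Random(seed)

    # Augment background and foreground independently (a small subset of
    # what the paper's pipeline applies).
    bg = ImageEnhance.Brightness(background).enhance(rng.uniform(0.7, 1.3))
    fg = tool_rgba.rotate(rng.uniform(-90, 90), expand=True)
    scale = rng.uniform(0.5, 1.0)
    fg = fg.resize((max(1, int(fg.width * scale)), max(1, int(fg.height * scale))))

    # Random placement; the alpha channel of the cutout doubles as the
    # ground-truth mask, so no manual annotation is needed.
    x = rng.randint(0, max(0, bg.width - fg.width))
    y = rng.randint(0, max(0, bg.height - fg.height))
    canvas = bg.copy()
    canvas.paste(fg, (x, y), mask=fg)

    mask = Image.new("L", bg.size, 0)
    mask.paste(fg.getchannel("A").point(lambda a: 255 if a > 0 else 0), (x, y))
    return canvas, mask
```

Repeating this with randomized parameters turns one background and a handful of tool cutouts into an arbitrarily large labelled training set, which is the "one-to-many" property claimed above.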

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16449-1_34

SharedIt: https://rdcu.be/cVRW9

Link to the code repository

https://github.com/lofrienger/Single_SurgicalScene_For_Segmentation

Link to the dataset(s)

N/A


Reviews

Review #2

  • Please describe the contribution of the paper

    Use of simulated data to train a segmentation network. Without the high cost of data collection and annotation, the authors claim to have achieved decent surgical instrument segmentation performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The clarity and simplicity of the approach.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The dataset sizes for Syn-A, Syn-B, and Syn-C are different.
    • The segmentation performance is not very good for the purely synthetic case. The authors do not emphasize the Table 2 results properly; that is the most interesting part of the paper.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Likely reproducible

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    It is a nice paper to read and follow.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Not very novel, but decent results.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #3

  • Please describe the contribution of the paper

    The proposed work aims to reduce the data collection effort required for the problem of segmenting instruments in images acquired during surgical procedures. The main idea is to use a single background tissue image and a few instrument images, and to apply multiple augmentation and blending techniques to synthesize new data that can be used for training. The approach relies on chained augmentation mixing during training and is tested on publicly available datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The use of a limited number of annotated images to train a segmentation pipeline is quite relevant to the medical imaging community.

    The paper is well written and well organized.

    Publicly available datasets are used in the experiments.

    The code is provided.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It is not clear what the technical novelty of the paper is.

    The approach is very similar to AugMix, and there is no comparison with other existing approaches used to create synthetic datasets (e.g., GAN-based methods); a sketch of the AugMix scheme in question appears after this list.

    The results in the experimental section are not very convincing. The approach seems to produce good results on the simulated images; however, when it is tested on the unseen target domain (EndoVis-2017), the improvements drop significantly.

    Fig 3b is not discussed in the paper.
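
    For context on the comparison drawn above: AugMix (Hendrycks et al., ICLR 2020) samples several random chains of augmentations of one image, mixes the chains with Dirichlet weights, and blends the result with the original via a Beta-sampled weight. A minimal sketch of that published recipe, assuming an RGB PIL input; the operation list here is an illustrative placeholder, not AugMix's full op set:

    ```python
    import numpy as np
    from PIL import Image, ImageOps, ImageEnhance

    # Placeholder op set; the real AugMix uses a larger list chosen to
    # avoid overlap with test-time corruptions.
    OPS = [
        ImageOps.autocontrast,
        ImageOps.equalize,
        lambda im: im.rotate(np.random.uniform(-30, 30)),
        lambda im: ImageEnhance.Sharpness(im).enhance(np.random.uniform(0.5, 1.5)),
    ]

    def augmix(image: Image.Image, k: int = 3, depth: int = 3, alpha: float = 1.0):
        """Blend k randomly chained augmentations of `image` with the original."""
        ws = np.random.dirichlet([alpha] * k)   # per-chain mixing weights
        m = np.random.beta(alpha, alpha)        # original-vs-mix blending weight
        mix = np.zeros(np.array(image).shape, dtype=np.float32)
        for w in ws:
            chained = image
            for _ in range(np.random.randint(1, depth + 1)):
                chained = np.random.choice(OPS)(chained)
            mix += w * np.array(chained, dtype=np.float32)
        out = m * np.array(image, dtype=np.float32) + (1 - m) * mix
        return Image.fromarray(out.astype(np.uint8))
    ```

    The paper's "chained augmentation mixing" is in this spirit; per the rebuttal below, the authors position their contribution as the data-centric one-to-many synthesis rather than the mixing rule itself.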

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have provided the code and the approach can be reproduced.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    The authors should provide some comparison with other data augmentation approaches.

    They should also highlight the novelty, especially in comparison to AugMix.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It seems that the results presented in this work are very preliminary. In particular, the lack of comparison with existing approaches makes it difficult to evaluate the contribution of this work.

  • Number of papers in your stack

    4

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    Not Answered

  • [Post rebuttal] Please justify your decision

    Not Answered



Review #4

  • Please describe the contribution of the paper

    The paper presents a data augmentation strategy for surgical tool segmentation in which a background video/image is used and a foreground surgical tool image is then superimposed on top of it, allowing a gold-standard multiple-instance foreground mask to be readily available and elements of the data distribution to be controlled. The two images can also be augmented separately to increase variety. The method is then evaluated by training a U-Net on the EndoVis-2018 dataset, once with the proposed synthetic data (using a massively reduced amount of data, i.e. 2-3 foreground stills per instrument) and once with the same background augmentations applied to the full data (i.e. augmentation without “simulation”), and garners surprisingly good Dice given the very few annotations provided, with good generalisation to EndoVis-2017. A second experiment in which a previously unseen instrument is added to the simulation database shows a further Dice improvement of ~1.6% generally.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strength of the paper is its simplicity: it proposes an intelligent method of “augmentation by simulation” similar in spirit to using driving video games to help train self-driving cars. Although the results given in Table 1 look uninspiring at a glance, knowing how the last three rows are generated is actually quite impressive.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weakness in the paper is that the results are not separated/collected in a way that would clearly show the magnitude of improvement caused by the method. The authors have done some work to provide contextualization, but much more could be done to really make the method shine. See the Constructive Comments for particular suggestions.

    The paper is also missing some brief discussion of where the method may be conceptually lacking (e.g. instrument shadows / interaction with anatomy, keeping instruments from crossing/obstructing each other in multi-tool scenes, etc.), which could help contextualize where it fits in with more data-hungry methods such as GANs, which could theoretically handle these things.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper is conceptually highly reproducible, regardless if code/data is provided or not. The authors should be commended for this as it means it can be applied more broadly by the community and is not dependent on their particular implementation.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Given that this is a MICCAI paper with limited space, some of the comments are about removing content that is not strictly necessary.

    • A small section (~1/3 pg) is dedicated to blending, which is really not needed given how common the technique is in film generally.
    • Eq 1 in particular doesn’t help explain the method and Eq 3 is largely a copy of Eq 2, meaning that all three could be more readily expressed as Eq 3 with a sentence defining H_i and \Theta.

    Other small comments:

    • The number of foreground stills is given in Section 3.1 but not the number of background stills. The latter is stated in the introduction, but briefly restating it in Section 3.1 would be good to remind the reader of the quantity of annotated data used.
    • For Fig 3b, it would be nice to have the Dice for the vessel sealer vs the other instruments shown separately. As it stands, a ~1.6% improvement seems small, but one imagines that this may be because of the prevalence of the instrument in the Testing dataset.

    And one big comment: I stated that Table 1, despite how it looks superficially, is actually a strength, but there is, I feel, a way to make it even better. Firstly, it lacks reference bounds. It would be improved by a row that shows an a priori reasonable lower bound on performance. (For example, this could be training with the same few stills but without simulation, which would show just how much the simulation-based data augmentation contributes, as training a U-Net with only a dozen or so images is probably going to fail miserably without the nuanced data augmentation procedures suggested in the paper.) It would also be improved by a row indicating an a priori reasonable upper bound, such as using the same large database but with on-the-fly simulation. This latter part is important to the optics of the paper, as it would show that the method presented does outperform the argument “just get more data.”

    Typos:

    • “minimal human efforts” → “minimal human effort”
  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    7

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The major factors influencing me are the analogy of the method to things seen outside of the community (i.e. self-driving cars and video games), which makes it more widely interesting to a broad MICCAI audience; the simplicity of the method; and the attention taken to perform ablation studies to begin to quantify its performance more robustly. Yes, there are some weaknesses, but nothing to distract from those more core strengths.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    1

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    8

  • [Post rebuttal] Please justify your decision

    Not Answered




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper presents a data augmentation approach for surgical instrument segmentation that uses a single background tissue image and a few instrument images, applying multiple augmentations and blending to synthesize new data and improve segmentation performance. The paper is well-written and easy to follow. However, it is not clear what the technical novelty is. Moreover, there is no comparison with existing relevant data augmentation techniques. Also, the approach does not seem to generalise very well on unseen data. Additionally, a relevant paper, ‘Colleoni, Emanuele, Philip Edwards, and Danail Stoyanov. “Synthetic and real inputs for tool segmentation in robotic surgery.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2020.’, is missing. The literature review should be updated to include the relevant papers. How is their method different from the existing methods? Why is a comparison with the existing methods not included?

    I invite the authors for the rebuttal to address the main concerns raised by the reviewers.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    9




Author Feedback

We thank the reviewers (R) and the area chair (AC) for the critical assessment and insightful comments. In particular, we feel greatly encouraged by the recognition from R4 with a “confident and strong acceptance.” Despite the mostly positive feedback, we clarify the major critiques below:

1: Technical novelty is not clear (R3).

We believe that the value of research work cannot be evaluated solely by its technical novelty. In this work, we rethink the problem from a data-centric point of view and demonstrate the sufficiency of a single background tissue image and a few (two or three per tool) foreground instrument images for surgical dataset generation. In the medical field, data sources are naturally scarce, so data-hungry approaches such as GANs often fail, whereas our low-cost, data-efficient approach still works. Without the costly human effort of real data collection and annotation, we create a collection of synthetic surgery datasets for instrument segmentation that achieve very decent (R2) and impressive (R4) performance. Our approach could be adopted broadly by the MICCAI community for its simplicity (R2, R4) and because its reproducibility is highly independent of the data or the implementation (R4). Other techniques can easily be plugged into our framework to promote new solutions for various data limitation problems, such as novel instrument appearance (class-incremental learning in the 1st ablation study), synthetic-real joint training (2nd ablation study), imbalanced datasets, and domain shift. To the best of our knowledge, our approach, using a massively reduced amount of source data (R4), is pioneering work on synthetic dataset generation in the MICCAI community.

2: Comparison with existing relevant data augmentation techniques (R3).

We agree with the reviewer and conducted additional experiments against an existing data augmentation technique, ColorJitter. Our approach still outperforms it significantly, with DSC gains of 5.33% and 4.29% on EndoVis-2018 and EndoVis-2017, respectively. We will add this in the final version of the paper.
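
The ColorJitter baseline named here is the standard torchvision transform. A minimal way to apply it as a stand-alone photometric augmentation baseline might look like the following; the parameter values are assumptions for illustration, not those used in the rebuttal experiments:

```python
import torchvision.transforms as T

# ColorJitter perturbs brightness/contrast/saturation/hue only; unlike the
# proposed pipeline it creates no new spatial compositions of tools.
color_jitter = T.Compose([
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    T.ToTensor(),
])

augmented = color_jitter(pil_image)  # pil_image: any PIL training frame
```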

3: Generalization on the unseen data (R3).

As shown in Table 1, similar to the results on EndoVis-2018, the performance remains decent and acceptable on the unseen EndoVis-2017 dataset. Thus, our approach generalizes well (R4) considering the extremely minimal source data.

4: The literature review should be updated to include the relevant papers (AC).

The mentioned work recorded dVRK kinematic data as the data source to synthesize a new dataset. In contrast, our approach utilizes only a few images as the data source, without costly data collection and processing. Moreover, that work can only synthesize one instrument, the Large Needle Driver, whereas our approach can efficiently synthesize all types of instruments. We will include this work in the paper. In addition, one related work [7] has already been discussed and compared.

In addition,

R2: Q1: Different dataset sizes. A1: Our datasets are enlarged when more practical cases are considered (A to B) or when one more source image is added for each instrument (B to C). For example, several instruments may coexist in one image in real cases; hence, Synthetic-B extends Synthetic-A by adding images that contain two different instruments, while in Synthetic-A only one instrument appears per image. Q2: Emphasize the most interesting Table 2. A2: We will update the paper to highlight the impressive performance of our approach in synthetic-real joint training.

R3: Q1: Comparison with GANs. A1: Our data-efficient approach is not data-hungry and avoids complicated adversarial training. Q2: Discuss Figure 3b. A2: The figure shows the overall performance gain with our approach; we will discuss it further.

R4: We appreciate the reviewer’s enthusiasm for our work on limited-source dataset generation. All the constructive suggestions will be considered and incorporated into the final manuscript.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have thoroughly clarified all major comments raised during the review phase, including adding a comparison with existing data augmentation techniques. The proposed approach is simple and requires only a small amount of data to achieve better performance than SOTA.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    3



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I believe the authors did a good job in addressing the concerns of the reviewers specifically in terms of clarifying the novelty and comparing to previous work and thus suggest acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    NR



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper proposes a data-efficient framework to generate high-quality synthetic datasets, through various augmentation and blending techniques, for the task of surgical instrument segmentation. The proposed approach is validated on the EndoVis-2018 and EndoVis-2017 surgical scene segmentation datasets and displays promising results and generalization ability. The paper is well-written and well-motivated and the topic is very relevant to the community. Main comments of the reviewers around segmentation performance/results, generalization, and dataset size have been addressed in the rebuttal.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1/n (best paper in your stack) and n/n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    4


