
Authors

Jihun Yoon, SeulGi Hong, Seungbum Hong, Jiwon Lee, Soyeon Shin, Bokyung Park, Nakjun Sung, Hayeong Yu, Sungjae Kim, SungHyun Park, Woo Jin Hyung, Min-Kook Choi

Abstract

Previous image synthesis research for surgical vision had limited results for real-world applications: it relied on simple simulators that included only a few organs and surgical tools, and on outdated segmentation models to evaluate image quality. Furthermore, none of this research released complete datasets to the public to enable open research. Therefore, we release a new dataset to encourage further study and provide novel methods with extensive experiments for surgical scene segmentation using semantic image synthesis with a more complex virtual surgery environment. First, we created three cross-validation sets of real image data, considering demographic and clinical information, from 40 cases of real surgical videos of gastrectomy with the da Vinci Surgical System (dVSS). Second, we created a virtual surgery environment in the Unity engine with five organs from real patient CT data and 22 da Vinci surgical instruments from actual measurements. Third, we converted this environment photo-realistically with representative semantic image synthesis models, SEAN and SPADE. Lastly, we evaluated it with various state-of-the-art instance and semantic segmentation models. We substantially improved our segmentation models with the help of synthetic training data. More methods, statistics, and visualizations are available at https://sisvse.github.io/.

Link to paper

DOI: https://link.springer.com/chapter/10.1007/978-3-031-16449-1_53

SharedIt: https://rdcu.be/cVRXr

Link to the code repository

https://sisvse.github.io/

Link to the dataset(s)

https://sisvse.github.io/


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper performs extensive sets of experiments for surgical scene segmentation with real and synthetic images. Synthetic images are generated from an advanced surgery scene simulator. A large-scale segmentation dataset is also released.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Extensive experiments on instance and semantic segmentation with different models and different combinations of real/synthetic data;
    • More sophisticated surgery simulator with more organs and tools;
    • The first large-scale dataset for surgical scene segmentation, also tackling class imbalance, is released.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I don’t have any complaints.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It looks reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    Plots could be more interpretable than the tables but I guess it might be a bit tricky to convert them into plots.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    6

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The merits of the paper include a more complex surgery simulation, synthetic images generated from it, and a real dataset with more classes of organs and surgical tools. Extensive experiments with different combinations of datasets (e.g., real, synthetic, real + synthetic) were performed, comparing against several previous methods for instance and semantic segmentation. Moreover, the new public dataset, which also addresses the class imbalance problem, makes the paper quite strong.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    2

  • Reviewer confidence

    Confident but not absolutely certain

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #2

  • Please describe the contribution of the paper
    1. This paper proposes a surgical image synthesis pipeline, which contains a complex virtual surgery environment, class-balanced frame sampling, domain randomization, and semantic image synthesis.

    2. The authors contribute a large-scale surgical image segmentation dataset with both real and synthetic images, which can be used for visual object recognition and image-to-image translation research for gastrectomy with the dVSS.

    3. The effects of synthetic data between tasks, models, and data are analyzed with extensive experiments.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This work demonstrates the feasibility and effectiveness of surgical image synthesis based on complex virtual surgery simulation, which is valuable and inspiring for the surgical vision field.

    2. Systematic experiments are carried out with various configurations and different models.

    3. The dataset will be released to the public and can be used for both semantic and instance segmentation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The paper is technical and experimental. The scientific insights and methodological innovation are kind of lacking.

    2. The paper is wordy and a little disorganized. Better figures and a clearer layout are expected. Section 3 could be extended with more information.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The dataset will be released to the public on their website.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html
    1. The first figure on https://sisvse.github.io/ is much more informative than Fig. 1 in the paper.

    2. A table or paragraph introducing the main features of the dataset could be added, for example, total image numbers, organ types, instrument types, min/max instrument number in a frame, etc. Thus, the readers can understand the value of your contribution.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    5

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work is practical and solid. A novel large dataset is built and will be released to the public. My major concerns include:

    1. What are the most important insights from surgical synthetic data generation and application? The authors should emphasize this. This paper lacks theoretical content, so experimental insights are required.

    2. In Table 2, the improvements given by synthetic data do not seem significant. In Table 3, the relative mIoU increments are very significant. How do the authors explain the difference between the benefits for overall performance and class-wise performance?

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    3

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A



Review #3

  • Please describe the contribution of the paper

    The authors propose a framework for generating synthetic data for minimally invasive surgical scene segmentation. The scene is rendered in the Unity engine as a segmentation mask, from which photorealistic images are generated using two different GAN-based semantic image synthesis methods. All related datasets will be published. A comparison of state-of-the-art segmentation models is performed.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The datasets and synthetic data generation pipeline could be valuable for the community. The presented approach for advanced training of surgical scene segmentation models is interesting and promising and the implementation and considerations of the authors for dataset generation are thorough.

    The authors compare state-of-the-art models (ResNeSt, Swin Transformer,…) on the proposed datasets and perform an extensive evaluation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It’s not clear where the authors see the novelty of the presented work. Is it the datasets, the synthetic generation method, or both?

    The authors use out-of-the-box methods for semantic image synthesis and segmentation. The novelty of the work is limited to the dataset and the synthetic pipeline, which could be valuable for further research and other clinical applications. Therefore, the authors should publish the synthetic data generation method, which could be transferred to other applications.

    For models that already perform very well, the generated synthetic data does not improve the results.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors promise to publish the real and synthetic datasets, as well as the baseline segmentation models, for reproducibility. However, a fully functioning published version of the synthetic data generation pipeline would be valuable for the community to transfer the method to other domains and clinical applications.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://conferences.miccai.org/2022/en/REVIEWER-GUIDELINES.html

    What do X and O mean in Table 1? A better notation could improve the clarity of the table.

    Why do the authors choose to do class-balanced frame sampling? Surgical scene segmentation is an imbalanced problem by nature, and data imbalance can be handled by the framework (focal loss, data augmentation, …). Class-balanced frame sampling throws away a significant part of the data that could be used to improve the results. This should at least be discussed in the paper.

    The authors should not only publish the datasets but also the synthetic data generation pipeline, which could be transferred to other (medical / surgical) applications.

    Were the medical professionals trained before manually creating the virtual synthetic data in Unity? How do the authors make sure that the simulated execution resembles reality?

    The tables on page 7 are cluttered; it might be helpful to present only the most relevant results in the paper.

  • Rate the paper on a scale of 1-8, 8 being the strongest (8-5: accept; 4-1: reject). Spreading the score helps create a distribution for decision-making

    4

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty and contribution of the presented work lie mainly in the published dataset, which might not be as competitive in comparison with other MICCAI submissions.

  • Number of papers in your stack

    5

  • What is the ranking of this paper in your review stack?

    4

  • Reviewer confidence

    Very confident

  • [Post rebuttal] After reading the author’s rebuttal, state your overall opinion of the paper if it has been changed

    N/A

  • [Post rebuttal] Please justify your decision

    N/A




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors propose a synthetic data generation framework for generating synthetic data for minimally invasive surgical scene segmentation. The scene is then rendered in the Unity engine as a segmentation mask from which photorealistic images are generated using image synthesis GAN-based methods. Furthermore, a comparison of state-of-the-art segmentation models (ResNeSt, Swin Transformer, etc.) is performed.

    All reviewers agree on the importance of the work, and particularly on the value of the datasets being made public. For the paper to be accepted, the authors should address the reviewers' concerns, in particular by framing the contributions of the paper more clearly and adding details about the dataset.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers). If this paper is among the bottom 30% of your stack, feel free to use NR (not ranked).

    6




Author Feedback

We sincerely appreciate all the reviewers’ efforts in reviewing our work. We use the notation ‘C#-#’ for each reviewer and comment.

[C2-1] Most important insights from surgical synthetic data generation and application: Thank you for this fundamental question. We found two interesting results in the performance analysis, as mentioned in the Instance segmentation and Class-wise performance sections.

1) Synthetic data improves low-performing classes. The class-wise performances in Table 3 show significant improvements for all tasks, both instance and semantic segmentation. Looking at the relative performances, we can see that our method significantly improves low-performing classes. The overall performance improvement is less significant because high-performing classes lose some performance. However, there is more good than harm: although some classes lose performance, their scores remain very high for applications. We will study further how to improve low-performing classes with the synthetic dataset without degrading high-performing classes.

2) Synthetic data is very effective for Mask AP. Another interesting result is that our method is most effective for Mask AP (instance segmentation). HTC is an instance segmentation model that utilizes a semantic segmentation mask and is based on Cascade Mask R-CNN (CMR). A comparison between HTC and CMR shows how helpful synthetic masks are for instance segmentation. On the other hand, the semantic segmentation results show smaller improvements than the instance segmentation results. We assume that the performance of the semantic models on real image data is already too high to improve. We will also publish subset datasets for future research and study further with various amounts of real image data.
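The effect described in point 1) of [C2-1] can be illustrated with a small arithmetic sketch. The class names and scores below are made-up placeholders, not values from Table 2 or Table 3; they only show how a large relative gain on a weak class and a small loss on a strong class can average out to a modest overall change.

```python
def relative_change(before, after):
    """Per-class relative change in percent: (after - before) / before * 100."""
    return {c: (after[c] - before[c]) / before[c] * 100.0 for c in before}


def mean_score(scores):
    """Unweighted mean over classes, as in a mean-IoU style metric."""
    return sum(scores.values()) / len(scores)


# Hypothetical per-class scores (illustrative only, NOT from the paper).
real_only = {"rare_tool": 20.0, "common_organ": 90.0}
real_plus_synthetic = {"rare_tool": 30.0, "common_organ": 86.0}

per_class = relative_change(real_only, real_plus_synthetic)
overall_before = mean_score(real_only)
overall_after = mean_score(real_plus_synthetic)

for name, pct in per_class.items():
    print(f"{name}: {pct:+.1f}% relative change")
print(f"overall mean: {overall_before:.1f} -> {overall_after:.1f}")
```

In this toy case the weak class gains 50% in relative terms while the overall mean moves only from 55 to 58, which is the pattern the authors describe qualitatively.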

[C3-1] Novelties of the presented work between the dataset and the synthetic generation method: Thank you for this key question. The novelty of our work lies more in publishing the dataset than in the synthetic generation method. As mentioned in Table 1 and on page 2, our dataset is the first large-scale surgical vision dataset for semantic image synthesis, including new real images. We agree that our work is more experimental than theoretical. However, our empirical findings and the implementation details of the synthesis models, which we could not include because of the page limit, could compensate for this weakness. We will publish the synthesis models and the details on the dataset website or in the main manuscript. These will also address your suggestion of a fully functioning synthetic data generation pipeline for other clinical applications.

[C3-2] Purpose of class-balanced frame sampling: Thank you for this good question. We agree that surgical scene segmentation is an imbalanced problem by nature; the class-imbalance problem originates from scarce data for some classes and a relative imbalance between classes. The primary purpose of our sampling is to alleviate the problem by suppressing many redundant classes, and a secondary purpose is to reduce unnecessary data labeling costs. The baseline results in Tables 2 and 3 show that the sampling does not harm performance while achieving our goal. Moreover, the number of rarely seen classes is left in its natural state. As Appendix Table 2 shows, the class-imbalance problem still remains, so our dataset can still be used for class-imbalance research with various losses or data augmentations. We will make this clear in the main manuscript for the camera-ready version.
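One generic way to realize the kind of sampling described in [C3-2] is a per-class cap, sketched below. This is an assumption for illustration, not the authors' actual procedure: the function name `class_balanced_sample`, the `cap` parameter, and the frame and class names are all hypothetical. A frame is kept only while at least one of its classes is still under the cap, so frequent classes are suppressed while classes that appear in no more than `cap` frames keep every frame.

```python
from collections import Counter


def class_balanced_sample(frames, cap):
    """Greedy per-class capping (illustrative sketch, not the paper's method).

    frames: iterable of (frame_id, set_of_class_names) in visiting order.
    cap:    target maximum number of kept frames per class; a frequent class
            may exceed it when it co-occurs with a class still under the cap.
    Returns the kept frame ids and the per-class counts among kept frames.
    """
    counts = Counter()
    kept = []
    for frame_id, classes in frames:
        # Keep the frame if it still contributes to an under-represented class.
        if any(counts[c] < cap for c in classes):
            kept.append(frame_id)
            counts.update(classes)
    return kept, counts


# Hypothetical annotations: "liver" is frequent, "stapler" is rare.
frames = [
    ("f0", {"liver"}),
    ("f1", {"liver"}),
    ("f2", {"liver"}),
    ("f3", {"liver", "stapler"}),
    ("f4", {"liver"}),
]
kept, counts = class_balanced_sample(frames, cap=2)
print(kept)
print(dict(counts))
```

Redundant frames of the frequent class (`f2`, `f4`) are dropped and never need labeling, while the single frame containing the rare class is retained. The resulting counts are still imbalanced, consistent with the authors' remark that imbalance remains after sampling.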

[C3-3] Performance improvement by synthetic dataset: We kindly refer you to our answer to comment [C2-1] above.

[C3-4] Resemblance of simulated surgeries to reality for the synthetic data: Thank you for this good question. Simulating real surgery is still very difficult, and we agree that our simulated surgeries can differ from reality. However, our simulated synthetic dataset already shows performance improvements and potential despite the gap. We will continue working toward a more sophisticated simulator in the future.


